Back to Blog
AI Engineering

Faizan Mohammed on LangChain: Building RAG Pipelines, AI Agents & Production LLM Apps on GitHub

If you landed here searching for "Faizan Mohammed LangChain GitHub", you're in the right place. I'm Muhammad Faizan — AI engineer, certified ethical hacker, and full-stack developer. This post walks through my LangChain projects, the architecture decisions I've made in production, and why LangChain remains my go-to framework for building LLM-powered applications.

Who Is Faizan Mohammed (Faizzyhon)?

I'm Muhammad Faizan, known online as Faizzyhon — an AI engineer and cybersecurity specialist based in Pakistan. My GitHub repositories span LangChain-powered chatbots, RAG (Retrieval-Augmented Generation) pipelines, deep learning projects, and full-stack SaaS applications. I hold certifications in ethical hacking and specialise in building intelligent systems that combine security-first design with practical AI capabilities.

My LangChain work sits at the intersection of language model engineering, API design, and production deployment. The projects I've built include a context-aware conversational assistant, document retrieval pipelines, and custom agent toolchains — all available to explore through this site and my public repositories.

🤝 Looking to hire a LangChain developer?

I'm available for LangChain consulting, RAG pipeline builds, and AI agent development. Typical engagements run 2–12 weeks.

Get in Touch →

Why LangChain? A Developer's Honest Take

LangChain emerged as the dominant Python framework for LLM application development because it solves the two hardest problems in production AI: chaining language model calls with deterministic logic, and grounding model responses in real data through retrieval. Before LangChain, building a production chatbot that could reliably reference your own documents required significant custom plumbing. LangChain abstractes that plumbing into composable primitives.

That said, LangChain isn't magic. The quality of your application still depends on prompt engineering, your choice of embedding model, your vector store's retrieval configuration, and how you structure the conversational memory. I've found the framework most valuable when you treat its abstractions as a starting point rather than a ceiling — understanding what's happening under the hood makes debugging exponentially faster.

My LangChain AI Conversational Assistant — Architecture Breakdown

The flagship project on my GitHub is an AI Conversational Assistant built with LangChain, deployed via a Flask API backend with a React frontend. Here's how the architecture works:

1. Document Ingestion Pipeline

Source documents (PDFs, Markdown files, web scraped content) are loaded using LangChain's DocumentLoader classes. Each document is split into overlapping chunks using RecursiveCharacterTextSplitter — chunk size 512 tokens, overlap 64 — to preserve context across chunk boundaries. This overlap is critical: without it, a question whose answer spans a chunk boundary returns incomplete context.

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=64,
    separators=["\n\n", "\n", ". ", " ", ""]
)
chunks = splitter.split_documents(raw_docs)

2. Embedding & Vector Store

Chunks are embedded using OpenAI's text-embedding-3-small model (1536 dimensions) and stored in a FAISS index locally for development, with a Pinecone index for production. FAISS gives sub-millisecond retrieval on datasets up to a few million vectors without infrastructure overhead. For larger corpora or multi-tenant applications, Pinecone's managed service becomes worth the cost.

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_documents(chunks, embeddings)
vectorstore.save_local("faiss_index")

3. Retrieval Chain with Conversational Memory

The core of the assistant is a ConversationalRetrievalChain that combines a retriever (top-4 similarity search) with a ConversationBufferWindowMemory (last 8 turns). This gives the model access to both relevant document context and recent conversation history simultaneously — which is what distinguishes a useful assistant from a one-shot Q&A system.

from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferWindowMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.2)
memory = ConversationBufferWindowMemory(
    memory_key="chat_history",
    return_messages=True,
    k=8
)
chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
    memory=memory,
    verbose=False
)

4. Flask API & React Frontend

The chain is served via a Flask API with streaming support using Server-Sent Events (SSE). Streaming is non-negotiable for production chatbots — users expect to see tokens appear progressively rather than wait 3–8 seconds for a complete response. The React frontend connects to the SSE endpoint and renders tokens incrementally using a simple state reducer.

RAG Pipeline Patterns I Use in Production

After building multiple RAG systems, a few patterns consistently produce better results than the naive similarity-search approach:

Hybrid Search (BM25 + Dense Retrieval)

Pure dense retrieval (vector similarity) struggles with exact keyword matches — product SKUs, names, error codes. BM25 (sparse lexical search) handles these well but fails on semantic similarity. A hybrid retriever that scores both and normalises results using Reciprocal Rank Fusion (RRF) consistently outperforms either alone. LangChain's EnsembleRetriever makes this straightforward to implement.

HyDE (Hypothetical Document Embeddings)

For queries where the question is phrased very differently from how documents are written, I use HyDE: generate a hypothetical answer to the question, then embed that answer and use it as the retrieval query. This dramatically improves recall on technical documentation where user questions use natural language but documents use domain terminology.

Re-Ranking with a Cross-Encoder

Bi-encoder retrieval (standard FAISS similarity search) is fast but imprecise. A cross-encoder re-ranker (e.g., cross-encoder/ms-marco-MiniLM-L-6-v2 via the sentence-transformers library) re-scores the top-20 retrieved chunks and returns the best 4. The latency increase is ~100ms but precision improves significantly, especially on complex queries.

LangChain Agents: When Chains Aren't Enough

RAG chains work well for Q&A over static documents. For tasks that require planning, tool use, or multi-step reasoning, LangChain Agents are the right abstraction. An agent uses the LLM to decide which tool to call next based on previous observations — enabling behaviours like "search the web, check the database, then format a report" in a single interaction.

I've built agents that combine:

  • Custom Python tools — functions decorated with @tool that execute code, query databases, or call external APIs
  • DuckDuckGo search — for current information beyond the training cutoff
  • Structured output parsing — forcing the agent to return validated Pydantic models rather than raw text
  • Human-in-the-loop — pause points where the agent requests confirmation before executing irreversible actions

LangGraph (LangChain's newer graph-based agent framework) gives even more control over multi-agent workflows and is now my default for complex agentic systems.

Other AI Projects on My GitHub

Brain Tumor Detection (TensorFlow + OpenCV)

A convolutional neural network trained to detect and classify brain tumours from MRI scans, benchmarked on the BraTS dataset. Built with TensorFlow 2.x, using transfer learning from a ResNet-50 backbone with custom classification heads. Real-time inference via a lightweight Flask endpoint.

TaxLance — AI-Assisted SaaS for Freelancers

A tax and billing SaaS platform for Pakistani freelancers, integrating Stripe for payments and using LLM-powered invoice categorisation to auto-assign expense types. Built with React frontend and a Python/FastAPI backend.

Tools Built Alongside These Projects

Alongside AI development, I've shipped a suite of free browser-based developer tools — no server, no tracking, no account required. These are useful day-to-day utilities I use myself:

How to Explore My GitHub

The best way to explore my projects is to start with the main profile, browse the pinned repositories, and look at the README files for architecture diagrams and setup instructions. Most LangChain projects include a requirements.txt, environment variable documentation, and a working demo flow so you can run the system locally in under five minutes.

If you're evaluating me for a role or engagement and have specific questions about implementation decisions, I'm always happy to walk through code on a call. Reach out via the contact page or connect on LinkedIn.

What I'm Building Next with LangChain & LangGraph

Current focus areas in my AI work:

  • Agentic security tools — combining my cybersecurity background with LLM agents to automate parts of the penetration testing workflow
  • Multimodal RAG — extending document retrieval to include image content, charts, and tables using GPT-4V and Claude's vision API
  • Structured extraction pipelines — using LangChain's output parsers and Instructor library to turn unstructured documents into validated database records
  • Self-hosted LLMs — deploying Llama 3 and Mistral via Ollama for use cases where data privacy precludes cloud API calls

If this work aligns with what you're building, get in touch. I take on consulting work and longer-term engineering engagements where I can meaningfully contribute to the architecture and outcome.

More Articles