LangChain isn't the simplest way to build with LLMs. But it's the most flexible. Here's how to get from pip install to a working chain in 20 minutes — without drowning in abstractions.
What You'll Build
A chain that takes a user's question, searches a PDF document for context, and answers using that context. It's the core pattern behind most production RAG apps.
Prerequisites
- A PDF file to query (any tech report works)
Step 1: Install the Right Packages
```bash
pip install langchain langchain-openai langchain-community
pip install pypdf chromadb  # for PDF loading and vector storage
```
Common mistake: Installing just langchain gets you the core framework but no model integrations or document loaders. LangChain is modular by design. You install what you need.
Step 2: Load and Chunk a PDF
```python
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Load the PDF; each page becomes one Document
loader = PyPDFLoader("report.pdf")
docs = loader.load()

# Split pages into overlapping chunks sized for retrieval
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)
chunks = splitter.split_documents(docs)
```
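The chunk_overlap setting repeats up to 50 characters from the end of one chunk at the start of the next, which reduces context loss at boundaries. Without it, a sentence split across two chunks reaches the model only as disconnected fragments.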
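To see what overlap does at a boundary, here is a minimal sketch using plain fixed-size character windows. It is not how RecursiveCharacterTextSplitter works internally (that splitter prefers to break on separators such as paragraphs and sentences); `chunk_text` is a hypothetical helper written only for illustration.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "The model was evaluated on three benchmarks. Accuracy improved by a wide margin."
for piece in chunk_text(sample, chunk_size=40, chunk_overlap=10):
    print(repr(piece))
```

Each printed chunk starts with the last 10 characters of the previous one, so text near a boundary shows up in both neighbors.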
Step 3: Store in a Vector Database
```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)
```
Chroma is local and free. For production, swap to Pinecone, Weaviate, or Qdrant.
Step 4: Build the Retrieval Chain
```python
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)
response = qa_chain.invoke({"query": "What are the key findings?"})
print(response["result"])
print("Sources:", [d.page_content[:100] for d in response["source_documents"]])
```
The k=3 parameter controls how many chunks the retriever fetches. Too few = missing context. Too many = token waste and confusion.
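The effect of k is easier to see with a toy example. The sketch below ranks chunks by cosine similarity and keeps the top k. The two-dimensional vectors and chunk names are made up for illustration (real embeddings have hundreds or thousands of dimensions), and `top_k` is a hypothetical helper, not a LangChain or Chroma function.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int) -> list[str]:
    """Return the names of the k chunks most similar to the query."""
    ranked = sorted(
        chunk_vecs,
        key=lambda name: cosine_similarity(query_vec, chunk_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Made-up "embeddings" for three chunks
toy_index = {
    "findings": [0.9, 0.1],
    "methods": [0.6, 0.6],
    "appendix": [0.1, 0.9],
}
query = [1.0, 0.0]
print(top_k(query, toy_index, k=1))
print(top_k(query, toy_index, k=3))
```

With k=1 only the closest chunk comes back; with k=3 the weakly related appendix chunk is included too, which is the kind of noise a large k adds to the prompt.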
Step 5: Add Memory (Optional but Recommended)
```python
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

# Keep the full chat history and pass it to the chain on every call
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)
chat_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory
)
result = chat_chain.invoke({"question": "Summarize the report"})
result = chat_chain.invoke({"question": "What did you just say?"})  # remembers
```
Without memory, every question is independent. The model forgets what you asked 10 seconds ago.
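Conceptually, buffer-style memory just stores earlier turns and prepends them to the next prompt. The sketch below shows that idea in plain Python. `BufferMemory` and `build_prompt` are hypothetical names for illustration and do not reproduce LangChain's actual implementation or prompt format.

```python
class BufferMemory:
    """Hypothetical stand-in for a conversation buffer: stores every turn verbatim."""

    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def add_turn(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def as_prompt_prefix(self) -> str:
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)


def build_prompt(memory: BufferMemory, question: str) -> str:
    """Prepend the stored history (if any) to the new question."""
    history = memory.as_prompt_prefix()
    if not history:
        return f"User: {question}"
    return f"{history}\nUser: {question}"


memory_demo = BufferMemory()
print(build_prompt(memory_demo, "Summarize the report"))
memory_demo.add_turn("Summarize the report", "It covers three benchmarks.")
print(build_prompt(memory_demo, "What did you just say?"))
```

The second prompt carries the first exchange with it, which is why the follow-up question can be answered. It also shows the cost: the prompt grows with every turn.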
Step 6: Handle Errors Gracefully
Production chains fail. APIs time out. PDFs are corrupted. Build error handling from day one:
```python
try:
    response = qa_chain.invoke({"query": "What are the key findings?"})
except Exception as e:
    # Log the failure and fall back to a user-friendly message
    print(f"Chain failed: {e}")
    response = {"result": "Unable to process. Please try a different document."}
```
Always wrap chain invocations in try-except blocks. Users prefer a graceful error message over a stack trace.
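Transient failures such as timeouts often succeed on a second try, so a retry wrapper is a natural next step. The sketch below is one way to do it with only the standard library. `invoke_with_retry` and its parameters are hypothetical, and `flaky_call` is a stand-in for a real chain call.

```python
import time


def invoke_with_retry(call, payload, max_attempts=3, base_delay=1.0, fallback=None):
    """Call call(payload), retrying with exponential backoff; return fallback if every attempt fails."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call(payload)
        except Exception as exc:  # narrow this to the exceptions your client actually raises
            print(f"Attempt {attempt} failed: {exc}")
            if attempt < max_attempts:
                # Wait base_delay, then 2x, then 4x, ... before the next attempt
                time.sleep(base_delay * 2 ** (attempt - 1))
    return fallback


# Stand-in for a chain call that fails twice before succeeding
attempts = {"count": 0}


def flaky_call(payload):
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated timeout")
    return {"result": f"answer to {payload['query']}"}


print(invoke_with_retry(flaky_call, {"query": "key findings"}, base_delay=0.01))
```

With a real chain, you would pass `qa_chain.invoke` as `call` and the fallback dictionary from the example above as `fallback`.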
What to Do Next
- Deploy — Wrap in FastAPI, containerize, and ship
Step 7: Evaluate Your Chain Before Production
Before shipping, test with questions you know the answers to:
```python
test_questions = [
    "What is the main conclusion?",
    "What methodology was used?",
    "What are the limitations?"
]
for q in test_questions:
    result = qa_chain.invoke({"query": q})
    print(f"Q: {q}\nA: {result['result'][:200]}...\n")
```
If the answers are wrong or generic, retrieval is the first thing to check. Try a larger chunk_size, more chunk_overlap, or a better embedding model.
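Reading answers by eye does not scale past a handful of questions. A rough way to automate the check is to pair each question with keywords a correct answer should contain. The sketch below assumes that approach; `score_answers`, `fake_chain`, and the canned answers are hypothetical, and keyword matching is a crude proxy for correctness.

```python
def score_answers(ask, cases):
    """Return the fraction of cases whose answer contains every expected keyword (case-insensitive)."""
    passed = 0
    for question, keywords in cases:
        answer = ask(question).lower()
        if all(kw.lower() in answer for kw in keywords):
            passed += 1
        else:
            print(f"MISS: {question!r} -> {answer[:80]!r}")
    return passed / len(cases)


# Canned answers standing in for a real chain
def fake_chain(question):
    canned = {"What methodology was used?": "The authors ran a randomized survey."}
    return canned.get(question, "I don't know.")


cases = [
    ("What methodology was used?", ["survey"]),
    ("What are the limitations?", ["sample size"]),
]
print(score_answers(fake_chain, cases))
```

To run it against the chain from Step 4, pass `lambda q: qa_chain.invoke({"query": q})["result"]` as `ask`.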
What's Still Hard
Version chaos. LangChain releases breaking changes monthly. Code that works in 0.1.x fails in 0.2.x. Pin your versions in production or expect surprises.
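One low-effort way to pin is to record the versions that are installed when your chain works. The sketch below uses the standard library's `importlib.metadata` to produce `name==version` lines for a requirements file. The helper names are hypothetical, and the package list simply mirrors the install step above.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def as_requirements(versions):
    """Format the installed packages as pinned requirement lines."""
    return [f"{name}=={ver}" for name, ver in versions.items() if ver is not None]


packages = ["langchain", "langchain-openai", "langchain-community", "pypdf", "chromadb"]
print("\n".join(as_requirements(installed_versions(packages))))
```

Running `pip freeze` gives a similar result for the whole environment; this version limits the output to the packages you care about.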
Debugging chains is opaque. When a chain returns garbage, you can't step through it. You have to add `verbose=True` and read a wall of text. LangSmith helps but costs extra.
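If you compose your own steps as plain functions, a small tracing wrapper can make each stage inspectable. The sketch below records the input, output, and duration of every call. `traced` is a hypothetical helper, and the two lambda steps are stand-ins for a real retriever and LLM call.

```python
import time


def traced(name, step, log):
    """Wrap step so each call appends its name, input, output, and duration to log."""
    def wrapper(value):
        start = time.perf_counter()
        result = step(value)
        log.append({
            "step": name,
            "input": value,
            "output": result,
            "seconds": time.perf_counter() - start,
        })
        return result
    return wrapper


trace = []
retrieve = traced("retrieve", lambda q: [f"chunk about {q}"], trace)
answer = traced("answer", lambda chunks: f"Based on {len(chunks)} chunk(s)", trace)

print(answer(retrieve("limitations")))
for entry in trace:
    print(entry["step"], "->", entry["output"])
```

When an answer looks wrong, the trace shows whether the retriever returned irrelevant chunks or the model misused good ones.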
Retrieval quality is the bottleneck. A bad chunking strategy or poor embeddings will sink your app faster than a bad prompt. Most beginners obsess over the LLM when the vector search is the real problem.
Deployment complexity. A chain that works locally fails in production because of timeout limits, memory constraints, or concurrent access. Containerize early and test with realistic load.
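Timeout limits are one of these failure modes you can rehearse locally. The sketch below enforces a deadline on any callable using `concurrent.futures` from the standard library. `call_with_timeout` is a hypothetical helper and `slow_call` simulates a slow chain. One caveat: the worker thread is not killed on timeout, it just stops being waited on.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def call_with_timeout(call, payload, timeout_s, fallback):
    """Run call(payload) in a worker thread; return fallback if it exceeds timeout_s."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(call, payload)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return fallback
    finally:
        # Do not block on the worker; the thread keeps running until the call returns
        executor.shutdown(wait=False)


def slow_call(payload):
    time.sleep(0.3)  # simulate a slow model or vector store
    return {"result": "late answer"}


fallback = {"result": "Timed out. Please try again."}
print(call_with_timeout(slow_call, {"query": "key findings"}, timeout_s=0.05, fallback=fallback))
print(call_with_timeout(lambda p: {"result": "fast answer"}, {}, timeout_s=1.0, fallback=fallback))
```

Setting the deadline below your hosting platform's request limit lets you return a clean message instead of a dropped connection.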
The Bottom Line
LangChain is overkill for simple "call GPT-4" scripts. But once you need retrieval, memory, or tool use, it pays for itself. Start with a basic chain. Add complexity only when you hit limits.
The goal isn't to master every abstraction. It's to ship something that answers real questions using real documents.