LangChain isn't the simplest way to build with LLMs. But it's the most flexible. Here's how to get from pip install to a working chain in 20 minutes — without drowning in abstractions.

What You'll Build

A chain that takes a user's question, searches a PDF document for context, and answers using that context. It's the core pattern behind most production RAG apps.

Prerequisites

  • A PDF file to query (any tech report works)

Step 1: Install the Right Packages

```bash
pip install langchain langchain-openai langchain-community
pip install pypdf chromadb  # for PDF loading and vector storage
```

Common mistake: Installing just langchain gets you the core framework but no model integrations or document loaders. LangChain is modular by design. You install what you need.

Step 2: Load and Chunk a PDF

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = PyPDFLoader("report.pdf")
docs = loader.load()

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50
)
chunks = splitter.split_documents(docs)
```

The chunk_overlap prevents context loss at boundaries. Without it, a sentence that straddles a chunk boundary is cut in two, and neither half carries the full meaning on its own.
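To see what the overlap buys you, here is a minimal sketch of fixed-size chunking in plain Python. It is a simplification: RecursiveCharacterTextSplitter also tries to break on paragraph and sentence separators, and the name chunk_with_overlap is made up for illustration.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(text, chunk_size=10, chunk_overlap=3)
print(chunks)
```

Each chunk starts with the last three characters of the one before it, so anything cut at a boundary still appears whole in at least one chunk, provided it is shorter than the overlap.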

Step 3: Store in a Vector Database

```python
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings()

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory="./chroma_db"
)
```

Chroma is local and free. For production, swap to Pinecone, Weaviate, or Qdrant.

Step 4: Build the Retrieval Chain

```python
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    return_source_documents=True
)

response = qa_chain.invoke({"query": "What are the key findings?"})
print(response["result"])
print("Sources:", [d.page_content[:100] for d in response["source_documents"]])
```

The k=3 parameter controls how many chunks the retriever fetches. Too few = missing context. Too many = token waste and confusion.
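Conceptually, the retriever embeds the question and returns the k chunks whose vectors sit closest to it. The sketch below shows that ranking with hand-made three-dimensional vectors and cosine similarity. The vectors, the sample texts, and the helper names cosine and top_k are all invented for illustration; a real vector store uses model-generated embeddings, an index, and its own distance metric.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query_vec, store, k):
    """Return the k chunk texts whose vectors are most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy "embeddings": made-up 3-d vectors, not output from a real model.
store = [
    ("Revenue grew in Q3", [0.9, 0.1, 0.0]),
    ("The survey had 40 respondents", [0.1, 0.9, 0.1]),
    ("Revenue forecast for Q4", [0.8, 0.2, 0.1]),
    ("Appendix: glossary", [0.0, 0.1, 0.9]),
]
query = [1.0, 0.0, 0.0]  # pretend this embeds "How did revenue change?"

print(top_k(query, store, k=2))
```

With k=2 only the two revenue chunks come back. Raising k to 3 pulls in the survey chunk as well, which is the kind of loosely related context that wastes tokens.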

Step 5: Add Memory (Optional but Recommended)

```python
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)

chat_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=retriever,
    memory=memory
)

result = chat_chain.invoke({"question": "Summarize the report"})
result = chat_chain.invoke({"question": "What did you just say?"})  # remembers
```

Without memory, every question is independent. The model forgets what you asked 10 seconds ago.
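The idea behind buffer memory is simple: store each turn and feed the history back in with the next question. Here is a rough sketch of that idea in plain Python. BufferMemory, fake_llm, and ask are hypothetical names, the model call is a stub, and this is not how LangChain implements it internally.

```python
class BufferMemory:
    """Keeps every (question, answer) turn in order."""

    def __init__(self) -> None:
        self.turns: list[tuple[str, str]] = []

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def as_prompt_prefix(self) -> str:
        """Render the history as text to prepend to the next prompt."""
        return "".join(f"Human: {q}\nAI: {a}\n" for q, a in self.turns)


def fake_llm(prompt: str) -> str:
    """Stand-in for a model call: reports how many earlier turns it can see."""
    return f"I can see {prompt.count('Human:') - 1} earlier turn(s)."


def ask(memory: BufferMemory, question: str) -> str:
    prompt = memory.as_prompt_prefix() + f"Human: {question}\nAI:"
    answer = fake_llm(prompt)
    memory.add(question, answer)  # remember this turn for the next call
    return answer


memory = BufferMemory()
first = ask(memory, "Summarize the report")
second = ask(memory, "What did you just say?")
print(first)
print(second)
```

The buffer grows with every turn, so a long conversation eventually pushes the prompt toward the model's context limit.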

Step 6: Handle Errors Gracefully

Production chains fail. APIs time out. PDFs are corrupted. Build error handling from day one:

```python
try:
    response = qa_chain.invoke({"query": "What are the key findings?"})
except Exception as e:
    # Log the failure and fall back to a message that is safe to show users
    print(f"Chain failed: {e}")
    response = {"result": "Unable to process. Please try a different document."}
```

Always wrap chain invocations in try-except blocks. Users prefer a graceful error message over a stack trace.
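Timeouts are often transient, so it can be worth retrying a few times before showing the fallback. Below is a minimal sketch of retry with exponential backoff. invoke_with_retry and flaky_chain are hypothetical helpers; flaky_chain stands in for qa_chain.invoke so the example runs without an API key.

```python
import time


def invoke_with_retry(call, payload, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call call(payload), retrying with exponential backoff on any exception.

    Returns a fallback dict if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            return call(payload)
        except Exception as exc:  # in real code, catch narrower exception types
            print(f"Attempt {attempt + 1} failed: {exc}")
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    return {"result": "Unable to process. Please try again later."}


# Hypothetical stand-in for qa_chain.invoke that fails twice, then succeeds.
calls = {"count": 0}


def flaky_chain(payload):
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("simulated API timeout")
    return {"result": f"Answer to: {payload['query']}"}


response = invoke_with_retry(
    flaky_chain,
    {"query": "What are the key findings?"},
    sleep=lambda seconds: None,  # skip real waiting in this demo
)
print(response["result"])
```

In a real app you would pass qa_chain.invoke as call and leave sleep at its default.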

What to Do Next

  • Deploy — Wrap in FastAPI, containerize, and ship

Step 7: Evaluate Your Chain Before Production

Before shipping, test with questions you know the answers to:

```python
test_questions = [
    "What is the main conclusion?",
    "What methodology was used?",
    "What are the limitations?"
]

for q in test_questions:
    result = qa_chain.invoke({"query": q})
    print(f"Q: {q}\nA: {result['result'][:200]}...\n")
```

If the answers are wrong or generic, your chunking strategy needs adjustment. Increase chunk_size, add more chunk_overlap, or switch to a better embedding model.
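Reading answers by eye stops scaling after a handful of questions. One rough way to automate the check is to pair each question with keywords a correct answer should contain and report a pass rate. The sketch below assumes that approach. evaluate and stub_chain are hypothetical, the questions and canned answers are invented, and keyword matching is a crude proxy for correctness.

```python
def evaluate(invoke, cases):
    """Return the fraction of cases whose answer contains every expected keyword."""
    passed = 0
    for question, keywords in cases:
        answer = invoke({"query": question})["result"].lower()
        ok = all(kw.lower() in answer for kw in keywords)
        print(f"{'PASS' if ok else 'FAIL'}: {question}")
        passed += ok
    return passed / len(cases)


# Hypothetical stand-in for qa_chain.invoke, with canned answers.
def stub_chain(payload):
    canned = {
        "What methodology was used?": "The authors ran a survey of 40 engineers.",
        "What are the limitations?": "I don't know.",
    }
    return {"result": canned[payload["query"]]}


cases = [
    ("What methodology was used?", ["survey"]),
    ("What are the limitations?", ["sample size"]),
]
score = evaluate(stub_chain, cases)
print(f"Pass rate: {score:.0%}")
```

Swap stub_chain for qa_chain.invoke and rerun the same cases after every chunking or embedding change to see whether the pass rate moves.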

What's Still Hard

Version chaos. LangChain ships breaking changes often. Code that works in 0.1.x can fail in 0.2.x. Pin your versions in production or expect surprises.
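One way to pin is to record exactly what is installed in the environment where your chain works. This sketch uses the standard library's importlib.metadata to print requirements-style lines for the packages from Step 1. pinned_requirements is a hypothetical helper; running pip freeze gives you the same information for every installed package.

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(packages):
    """Return (pins, missing): 'name==version' lines and names that are not installed."""
    pins, missing = [], []
    for name in packages:
        try:
            pins.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            missing.append(name)
    return pins, missing


packages = ["langchain", "langchain-openai", "langchain-community", "pypdf", "chromadb"]
pins, missing = pinned_requirements(packages)
print("\n".join(pins))  # paste these lines into requirements.txt
if missing:
    print("Not installed:", ", ".join(missing))
```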

Debugging chains is opaque. When a chain returns garbage, you can't step through it. You have to add verbose=True and read a wall of text. LangSmith helps but costs extra.

Retrieval quality is the bottleneck. A bad chunking strategy or poor embeddings will sink your app faster than a bad prompt. Most beginners obsess over the LLM when the vector search is the real problem.

Deployment complexity. A chain that works locally fails in production because of timeout limits, memory constraints, or concurrent access. Containerize early and test with realistic load.

The Bottom Line

LangChain is overkill for simple "call GPT-4" scripts. But once you need retrieval, memory, or tool use, it pays for itself. Start with a basic chain. Add complexity only when you hit limits.

The goal isn't to master every abstraction. It's to ship something that answers real questions using real documents.