February 13, 2026
Python AI Agents LangChain LLMs

Building AI Agents with LangChain

You've learned how to call LLMs directly. You've built prompt templates and basic chains. But what if your AI needs to think, plan, and act? That's where agents come in.

In this article, we're building a real agent that can reason about problems, use tools, and course-correct when things go wrong. We'll use LangChain's ReAct framework, add web search and code execution capabilities, and trace everything through LangSmith so you can see exactly what's happening inside the black box.

By the end, you'll have a working multi-tool agent and the knowledge to deploy it safely in production.

Table of Contents
  1. What Is an AI Agent, Really?
  2. Why Agents? The Limits of Chains
  3. Agent Architecture Patterns
  4. LCEL: The Foundation
  5. Chat Models, Prompts, and Structured Output
  6. Tools and Function Calling
  7. ReAct Agents: Thought → Action → Observation
  8. Memory and Context Management
  9. Output Parsers: From Strings to Structure
  10. LangSmith: Seeing Inside the Black Box
  11. Production Deployments: Handling Reality
  12. Bringing It Together: A Complete Agent
  13. Common Agent Mistakes
  14. Key Takeaways

What Is an AI Agent, Really?

Before we write a single line of code, let's get clear on what we mean when we say "AI agent." The term gets thrown around a lot, and it's worth understanding exactly what separates an agent from a regular LLM call or a simple chain.

An AI agent is a system that takes a goal, autonomously decides what steps to take to achieve it, executes those steps using available tools or actions, observes the results, and adapts its plan accordingly. That last part, adapting, is what makes it fundamentally different from everything you've built so far. A chain runs a predetermined sequence. An agent invents its own sequence on the fly, based on what it discovers at each step.

Think about the difference between following a recipe and being asked to cook dinner with whatever is in the fridge. The recipe is a chain: fixed steps, fixed order. The improvised meal is agentic reasoning: you open the fridge, assess what's there, decide what to make, discover you're missing an ingredient, substitute something else, and adjust your technique based on what's actually happening on the stove.

This capability unlocks an entirely new class of applications. Research assistants that actually browse sources and synthesize findings rather than hallucinating from training data. Code debugging agents that read error messages, identify the cause, write a fix, run the tests, and iterate until they pass. Customer support bots that can look up account information, check order status, process refunds, and escalate to a human when the situation calls for it, all in a single conversation. Data analysis agents that can write SQL, execute it against a live database, interpret the results, generate charts, and produce a written summary without you having to specify every step.

The key insight is that agents are not just "smarter LLMs." They are LLMs equipped with a feedback loop and a set of tools. The LLM provides the reasoning; the tools provide the ability to affect and observe the real world. Put those two things together and you get something that can genuinely solve problems rather than just describe solutions.

LangChain is currently the most mature framework for building agents in Python. It provides standardized abstractions for tools, memory, and execution, plus a rich ecosystem of pre-built integrations. Let's see how it all fits together.

Why Agents? The Limits of Chains

A chain is a fixed sequence: prompt → LLM → parser → output. It's rigid. Useful, but rigid.

An agent is different. It's a loop: the LLM looks at a problem, decides what tool to use, calls the tool, observes the result, and decides what to do next. If the tool didn't give what it needed, it tries again.

Think about how you'd search for "best Python async libraries in 2026":

  1. You'd search Google.
  2. Look at results.
  3. Click a few links.
  4. Maybe refine your search.
  5. Synthesize what you found.
  6. Summarize.

That's agentic reasoning. A chain would just format a prompt and send it once. An agent acts in a loop until it solves the problem.

This matters for:

  • Research tasks where you need to verify facts across sources
  • Complex problem-solving where the path isn't predetermined
  • Handling errors gracefully when a tool fails
  • Multi-step reasoning where each step depends on the last

Chains are deterministic pipelines. Agents are exploratory reasoning loops. Let's build one.
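
To make the distinction concrete, here's the difference in miniature as plain Python, with a hypothetical stand-in for the model (no LangChain, no API calls):

```python
# A chain runs once, left to right. An agent loops: decide, act, observe, repeat.
# "decide" is a stand-in for the LLM's choice of next action.

def run_chain(steps, value):
    """A chain: fixed steps, fixed order."""
    for step in steps:
        value = step(value)
    return value

def run_agent(decide, tools, goal, max_steps=10):
    """An agent: keeps acting until the decider says it's done."""
    observations = []
    for _ in range(max_steps):
        action = decide(goal, observations)          # the LLM's job in a real agent
        if action["tool"] == "finish":
            return action["input"]
        result = tools[action["tool"]](action["input"])
        observations.append(result)                  # feed the result back into the loop
    return "Gave up: hit max_steps"

# Stub decider: search once, then wrap up with what it observed
def decide(goal, observations):
    if not observations:
        return {"tool": "search", "input": goal}
    return {"tool": "finish", "input": f"summary of: {observations[-1]}"}

tools = {"search": lambda q: f"results for '{q}'"}
print(run_agent(decide, tools, "best Python async libraries"))
# -> summary of: results for 'best Python async libraries'
```

The chain's control flow is fixed in code; the agent's control flow is chosen at runtime by the decider. That's the whole difference.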

Agent Architecture Patterns

Before you write your first agent, it helps to understand the major architectural patterns in the field. Not all agents are built the same way, and choosing the right architecture for your use case makes the difference between a system that works and one that frustrates users.

The most widely used pattern is ReAct (Reasoning + Acting), which we'll implement in detail shortly. The LLM alternates between generating a thought about what to do next and taking an action by calling a tool. This cycle continues until the model is confident it has enough information to produce a final answer. ReAct is excellent for research and question-answering tasks because the model can explore multiple angles before committing to a response.

A second pattern is Plan-and-Execute, where the agent first generates a complete multi-step plan, then executes each step in sequence, checking off items as it goes. This architecture shines for complex tasks with predictable structure, like generating a full report or executing a deployment pipeline. The upside is predictability and easier auditing; the downside is that if an early step fails, the entire plan may need to be reconsidered.

Multi-Agent Orchestration is a third pattern that has become increasingly popular as agent systems grow more complex. Instead of one agent with many tools, you build a coordinator agent that delegates sub-tasks to specialized worker agents. A research coordinator might delegate to a web-search agent, a data-analysis agent, and a writing agent, then assemble the results. This pattern gives you cleaner separation of concerns and allows each sub-agent to be optimized for its specific task, but it adds coordination overhead and is harder to debug.

Finally, there is the Reflection pattern, where an agent critiques its own outputs before returning them. After generating a first draft answer, it passes that draft to a critic prompt (sometimes the same LLM, sometimes a separate one) that identifies weaknesses, then the generator revises based on the critique. This dramatically improves output quality at the cost of additional LLM calls and latency.

For most practical use cases, start with ReAct. It is the simplest, most well-understood architecture with the largest body of community knowledge and tooling support. Graduate to more complex patterns only when you have a specific need that ReAct cannot satisfy.

LCEL: The Foundation

Before we talk agents, you need to understand LangChain's Expression Language (LCEL). It's how modern LangChain chains compose.

LCEL lets you pipe components together cleanly. The elegance here is not just syntactic: LCEL chains are lazily evaluated, support streaming out of the box, and run independent branches in parallel. That means you get better performance without writing any concurrency code yourself.

python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
 
# Define components
model = ChatOpenAI(model="gpt-4", temperature=0)
prompt = ChatPromptTemplate.from_template("Explain {topic} in 2 sentences.")
parser = StrOutputParser()
 
# Compose with pipes
chain = prompt | model | parser
 
# Run it
result = chain.invoke({"topic": "agents"})
print(result)

The | operator chains components. Data flows left to right. This pattern is everywhere in modern LangChain code.
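
If the pipe feels magical, it isn't. Here is a minimal sketch of what | composition does under the hood, with plain functions standing in for the real model and parser:

```python
# Each component transforms its input and hands the result to the next.
class Runnable:
    def __init__(self, fn):
        self.fn = fn

    def invoke(self, value):
        return self.fn(value)

    def __or__(self, other):
        # a | b -> a new Runnable that runs a, then b
        return Runnable(lambda v: other.invoke(self.invoke(v)))

prompt = Runnable(lambda d: f"Explain {d['topic']} in 2 sentences.")
model = Runnable(lambda text: f"(model answer to: {text})")  # stand-in for ChatOpenAI
parser = Runnable(lambda msg: msg.strip())                   # stand-in for StrOutputParser

chain = prompt | model | parser
print(chain.invoke({"topic": "agents"}))
# -> (model answer to: Explain agents in 2 sentences.)
```

The real Runnable interface does far more (streaming, batching, async), but the composition mechanism is exactly this shape.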

Every component in an LCEL chain exposes the same interface (invoke, stream, batch, and ainvoke for async), so you can swap pieces without rewriting surrounding code. Swap ChatOpenAI for ChatAnthropic and the rest of your chain keeps working. That composability is why LCEL matters for production systems, where you will inevitably want to experiment with different models and parsers without rebuilding from scratch.

Why does this matter for agents? Because agents themselves are chains. They're a loop composition: observing state → deciding action → executing tool → observing result → repeating.

Chat Models, Prompts, and Structured Output

Agents rely on the LLM's reasoning ability. You'll typically use a strong chat model (GPT-4, Claude, etc.), and you'll give it structured information about available tools.

The system prompt is where you define the agent's persona, its operating principles, and its relationship to the tools it has access to. A well-written system prompt is often the difference between an agent that confidently solves problems and one that gets confused, loops endlessly, or uses tools unnecessarily. Spend time on this. It matters more than most developers expect.

python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage
 
# 1. Initialize a good reasoning model
model = ChatOpenAI(model="gpt-4-turbo", temperature=0)
 
# 2. Build a system prompt that teaches the model about tools
system_prompt = """You are a helpful assistant with access to tools.
You can search the web and run Python code.
 
When you need information or want to execute code:
1. Think about what tool to use
2. Call the tool with clear parameters
3. Read the result carefully
4. Decide if you need more steps or if you're done
 
Be concise. Use tools when needed, not for everything."""
 
# 3. Create a prompt template with space for past messages
prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad")
])

Notice MessagesPlaceholder? That's how the prompt makes room for lists of messages. The chat_history slot carries the conversation between turns, so the model isn't amnesiac from one user message to the next. Within a single run, the executor injects the agent's intermediate tool calls and results through the separate agent_scratchpad placeholder (which create_tool_calling_agent requires), so the model sees the full context of what it has already tried. Together, these placeholders are the mechanism that turns a stateless LLM into a stateful reasoning engine.

Tools and Function Calling

Tools are the bridge between your agent's reasoning and the real world. Without tools, an agent is just an LLM that narrates what it would do. With tools, it actually does it. Understanding how to design good tools is one of the most important skills in agent engineering, and it is frequently underestimated by developers who focus too much on prompt engineering and not enough on tool interface design.

Every tool is essentially a structured API endpoint for the LLM. The model reads the tool's name, description, and parameter schema to decide whether to call it and how to format the input. This means your docstrings and type hints are not just documentation; they are functional parts of your system that directly influence agent behavior. A poorly written docstring results in a tool that gets called at the wrong time or with malformed inputs.

When designing tools, follow three rules. First, keep each tool narrowly scoped: one tool should do one thing. A tool that can search the web, read files, and send emails will confuse the agent about when to use it. Second, make failures informative: instead of raising an exception that crashes the agent, return a descriptive error string so the model can reason about what went wrong and try a different approach. Third, design for the model, not for a human programmer: the tool name and description should describe the tool's purpose in plain language that a language model can interpret, not in technical jargon.

python
from langchain_core.tools import tool
import requests
import subprocess
 
# Tool 1: Web search
@tool
def web_search(query: str) -> str:
    """Search Google for information about a topic.
 
    Args:
        query: The search query
 
    Returns:
        Search results as a string
    """
    # In reality, use SerpAPI or similar
    url = "https://www.google.com/search"
    params = {"q": query}
    try:
        response = requests.get(url, params=params, timeout=5)
        # Parse and return snippets (simplified)
        return f"Search results for '{query}': Found relevant sources..."
    except Exception as e:
        return f"Search failed: {str(e)}"
 
# Tool 2: Run Python code
@tool
def execute_code(code: str) -> str:
    """Execute Python code and return the output.
 
    Args:
        code: Python code to execute (single expression or simple script)
 
    Returns:
        Output from the code
    """
    try:
        # Caution: eval of model-generated code is unsafe; sandbox this in production
        result = eval(code)
        return str(result)
    except Exception as e:
        # Use subprocess for more complex code
        try:
            output = subprocess.check_output(
                ["python", "-c", code],
                text=True,
                timeout=10,
                stderr=subprocess.STDOUT
            )
            return output
        except Exception as e:
            return f"Code execution failed: {str(e)}"
 
# Tool 3: File operations
@tool
def read_file(filepath: str) -> str:
    """Read a text file from disk.
 
    Args:
        filepath: Full path to the file
 
    Returns:
        File contents
    """
    try:
        with open(filepath, 'r') as f:
            return f.read()
    except FileNotFoundError:
        return f"File not found: {filepath}"
 
# Collect all tools
tools = [web_search, execute_code, read_file]

Notice how every error path returns a string rather than raising. The agent reads that string ("Search failed: connection timeout") and can decide to try again with a different query or tell the user there was a connectivity issue. That graceful failure handling is what makes the difference between agents that feel robust and agents that mysteriously stop mid-task.

Each tool:

  • Has a docstring that the agent reads to understand what it does
  • Has type hints for parameters (helps the model format inputs correctly)
  • Returns a string (the agent expects string results)
  • Can fail gracefully and tell the agent what went wrong

The agent reads these docstrings and decides when to call which tool. That's why clear documentation matters.
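
Under the hood, the @tool decorator converts each function into a schema the model reads. The exact wire format varies by provider; this sketch shows the OpenAI-style function-calling shape that web_search's docstring and type hints roughly translate into (simplified for illustration):

```python
import json

# What the model actually "sees" for web_search: a name, a description,
# and a JSON Schema for the parameters, all derived from your Python code.
web_search_schema = {
    "name": "web_search",
    "description": "Search Google for information about a topic.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The search query"}
        },
        "required": ["query"],
    },
}
print(json.dumps(web_search_schema, indent=2))
```

Seen this way, it's obvious why docstring quality matters: the description field is the only guidance the model gets about when to call the tool.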

ReAct Agents: Thought → Action → Observation

The ReAct framework (Reasoning + Acting) is the gold standard for agent design. The loop is simple:

  1. Thought: The model reasons about the problem
  2. Action: The model decides which tool to call
  3. Observation: The tool returns a result
  4. Repeat until the model says "Final Answer"

The genius of ReAct is that it forces the model to reason explicitly before each action. By generating a "Thought" step, essentially talking through its reasoning, the model is less likely to jump to a conclusion and more likely to call the right tool with the right parameters. That intermediate reasoning also gives you something to inspect when debugging: if the agent makes a wrong decision, you can trace it back to a flawed thought and adjust your system prompt or tool descriptions accordingly.

python
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain_openai import ChatOpenAI
 
# Initialize model and tools (from previous section)
model = ChatOpenAI(model="gpt-4-turbo", temperature=0)
tools = [web_search, execute_code, read_file]
 
# Create the agent
agent = create_tool_calling_agent(
    llm=model,
    tools=tools,
    prompt=prompt  # The prompt we defined earlier
)
 
# Wrap it with an executor that handles the loop
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,  # Print each step (great for debugging)
    max_iterations=10,  # Prevent infinite loops
    handle_parsing_errors=True
)
 
# Run the agent
response = executor.invoke({
    "input": "Search for best Python async libraries and summarize the top 3",
    "chat_history": []
})
 
print(response["output"])

When you run this, you'll see output like:

> Entering new AgentExecutor...

Thought: I need to search for Python async libraries to answer this question.

Action: web_search
Action Input: {"query": "best Python async libraries 2026"}

Observation: Search results for 'best Python async libraries'...

Thought: I got results. Let me process and summarize them.

Final Answer: The top 3 Python async libraries are...

Each Thought → Action → Observation cycle is one iteration. The executor keeps going until the model outputs "Final Answer" or hits max_iterations. The max_iterations guard is not optional: without it, a confused agent can loop indefinitely, burning through API credits while accomplishing nothing. Ten iterations is a sensible default for most tasks; increase it only for a specific, well-understood use case that requires more steps.

Memory and Context Management

Memory is what separates a one-shot tool from a genuine assistant. Without memory, every message is a fresh start: the agent has no idea what you discussed a moment ago, cannot build on previous answers, and cannot maintain any continuity across a conversation. With memory, you get an agent that feels coherent and useful over time.

LangChain provides several memory strategies, and choosing the right one is a meaningful architectural decision. The core tradeoff is between completeness and cost. More complete memory (keeping every message verbatim) gives the agent more context to work with, but it also means longer prompts, higher token costs, and eventually hitting the model's context window limit. Compressed or summarized memory reduces costs but risks losing details the agent needs.

For most conversational agents, buffer memory with a token or message limit is the right starting point. It is simple, predictable, and cheap. Move to summary memory when you expect conversations to be very long: customer support sessions, extended research dialogues, or multi-session workflows where a user might return after hours or days.

python
from langchain.memory import ConversationBufferMemory, ConversationSummaryMemory, ConversationTokenBufferMemory
from langchain_openai import ChatOpenAI
 
# Option 1: Buffer memory (simple, stores all messages)
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True  # Return as Message objects
)
 
# Option 2: Summary memory (compress old messages)
summary_memory = ConversationSummaryMemory(
    llm=ChatOpenAI(model="gpt-3.5-turbo"),
    memory_key="chat_history",
    return_messages=True
)
 
# Option 3: Token-limited window (keep last ~2048 tokens worth)
window_memory = ConversationTokenBufferMemory(
    llm=ChatOpenAI(model="gpt-3.5-turbo"),
    memory_key="chat_history",
    return_messages=True,
    max_token_limit=2048
)

For agents, you typically use buffer memory (simpler, all history) or window memory (recent history only, cheaper). Summary memory is slower but compresses old conversations elegantly.
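
The token-limited window can be pictured as a simple trim-from-the-back operation. This pure-Python sketch uses a crude character-based token estimate (an assumption for illustration; the real class counts tokens with the model's own tokenizer):

```python
def window_trim(messages, max_tokens=2048):
    """Keep the newest messages that fit a rough token budget."""
    kept, budget = [], max_tokens
    for msg in reversed(messages):           # walk from newest to oldest
        cost = max(1, len(msg) // 4)         # ~4 characters per token, rough
        if cost > budget:
            break                            # everything older gets dropped
        kept.append(msg)
        budget -= cost
    return list(reversed(kept))              # restore chronological order

history = ["old " * 3000, "recent question", "recent answer"]
print(window_trim(history, max_tokens=100))
# -> ['recent question', 'recent answer']
```

The tradeoff is visible in the code: the oldest messages vanish entirely, which is exactly the detail loss that summary memory is designed to soften.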

One thing that surprises many developers: the agent's intermediate reasoning steps (its thoughts and observations) are also stored during a run. This is intentional. The model needs to see its own prior steps to avoid repeating them or contradicting itself. What you manage externally (via the memory object) is the between-turn history. What the executor manages internally is the within-turn scratchpad. Keep these two distinct in your mental model.

python
from langchain.agents import AgentExecutor
 
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    memory=memory,
    verbose=True,
    max_iterations=10
)
 
# First turn
response1 = executor.invoke({
    "input": "What's the capital of France?"
})
 
# Second turn (memory is preserved)
response2 = executor.invoke({
    "input": "What's its population?"
})
# The agent can refer back: "You asked about France earlier..."

The memory automatically updates after each agent run, so multi-turn conversations work smoothly.

Output Parsers: From Strings to Structure

LLMs return text. But you often need structured data. Output parsers bridge that gap. This matters particularly for agents that feed their outputs into downstream systems (databases, APIs, UIs) where you need predictable data shapes rather than free-form prose.

The Pydantic-based parser is the most powerful option because it validates the model's output against a schema you define. Declare your output schema as a Pydantic model, inject the parser's format instructions into your prompt, and LangChain parses the response into a typed object, raising a clear error if the output doesn't match. (Wrap it in one of LangChain's fixing or retry parsers if you want automatic correction on failure.)

python
from langchain_core.output_parsers import JsonOutputParser, PydanticOutputParser
from pydantic import BaseModel, Field
 
# Define what you want back (using Pydantic)
class ResearchSummary(BaseModel):
    topic: str = Field(description="The topic researched")
    key_findings: list[str] = Field(description="3-5 key points")
    sources: list[str] = Field(description="URLs of sources used")
    confidence: str = Field(description="High/Medium/Low")
 
# Create a parser
parser = PydanticOutputParser(pydantic_object=ResearchSummary)
 
# Use it in a chain
format_instructions = parser.get_format_instructions()
 
prompt = ChatPromptTemplate.from_template("""
Research the topic and provide findings.
 
{format_instructions}
 
Topic: {topic}
""")
 
chain = prompt | model | parser
 
result = chain.invoke({
    "topic": "asyncio in Python",
    "format_instructions": format_instructions
})
 
# result is now a ResearchSummary object, not a string
print(result.key_findings)

For agents, parsers ensure tool outputs are in a format you can work with. For example, if a tool returns JSON, parse it immediately:

python
import json

@tool
def fetch_json_data(url: str) -> str:
    """Fetch JSON from a URL and return it as a compact JSON string."""
    try:
        response = requests.get(url, timeout=5)
        response.raise_for_status()
        # Re-serialize so the agent receives valid JSON text, not a Python repr
        return json.dumps(response.json())
    except (requests.RequestException, ValueError) as e:
        return f"Fetch failed: {e}"

A practical tip: when using structured output parsers with agents, keep your Pydantic models as flat as possible. Deeply nested schemas increase the probability that the model will generate malformed JSON, which means more retries and higher latency. If you find yourself needing complex nested structures, consider breaking the agent's task into smaller steps where each step produces a simpler output.
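
For example, here is the same information expressed both ways (illustrative models of my own; assumes pydantic is installed):

```python
from pydantic import BaseModel, Field

# Nested: every extra level is another place for the model to emit malformed JSON
class Finding(BaseModel):
    text: str
    source: str

class NestedSummary(BaseModel):
    topic: str
    findings: list[Finding]

# Flat: parallel lists, one level deep, easier for the model to produce reliably
class FlatSummary(BaseModel):
    topic: str
    finding_texts: list[str] = Field(description="One finding per entry")
    finding_sources: list[str] = Field(description="Source URL per finding")
```

Both schemas carry the same data; the flat one just asks less of the model's JSON discipline.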

LangSmith: Seeing Inside the Black Box

Agents are complex. You need visibility. LangSmith is LangChain's tracing and debugging platform.

Setup is simple, and it is one of the first things you should do when building any non-trivial agent. Running agents without tracing means flying blind: when something goes wrong (and it will), you have no way to understand why. You will spend hours adding print statements to instrument code that LangSmith would have traced automatically in seconds.

python
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-api-key"  # Get from langchain.com
os.environ["LANGCHAIN_PROJECT"] = "my-agents"
 
# Now run your agent normally
response = executor.invoke({
    "input": "Your question here",
    "chat_history": []
})

Every agent run is automatically traced. In the LangSmith dashboard, you'll see:

  • Full execution tree: Each thought, action, tool call, observation
  • Token usage: How many tokens each step consumed
  • Latency: Where time is spent
  • Errors: If something failed, exactly where and why
  • Input/Output: The full data at each step

This is invaluable for debugging. If an agent took a wrong turn, you can see exactly which tool call led to the wrong reasoning, and adjust your prompts or tools accordingly. More than that, LangSmith lets you build datasets from production traces: when a user asks a question your agent handles beautifully, you can save that trace as a golden example. When the agent fails, you save that trace as a regression test. Over time, you build a test suite grounded in real-world usage, which is far more valuable than synthetic benchmarks.

Production Deployments: Handling Reality

Agents are powerful, but they're not magic. In production, you need safeguards. The most common mistake we see from developers shipping their first agent is treating the happy path as the only path. Your agent will receive inputs you did not anticipate, call tools that time out, encounter rate limits on external APIs, and occasionally spiral into repetitive loops. Plan for all of this before you go live.

python
import logging
from langchain.agents import AgentExecutor
from langchain_openai import ChatOpenAI

logger = logging.getLogger(__name__)

executor = AgentExecutor(
    agent=agent,
    tools=tools,
    memory=memory,
    verbose=False,  # Don't spam logs in production
    max_iterations=10,  # Prevent infinite loops
    handle_parsing_errors=True,  # Gracefully handle bad output
    return_intermediate_steps=True,  # Useful for auditing
    max_execution_time=30  # Time out the whole run after 30 seconds
)

# Always wrap agent calls in error handling
def answer(user_input: str) -> str:
    try:
        # memory is attached to the executor, so it supplies chat_history itself
        response = executor.invoke({"input": user_input})
        return response["output"]
    except Exception as e:
        # Log the error, return a fallback
        logger.error(f"Agent failed: {e}")
        return "I encountered an issue. Please try again."

Key production considerations:

  1. Rate limiting: Tools like web_search hit external APIs. Add backoff and retry logic.
  2. Timeouts: Set both per-tool and per-agent timeouts so nothing hangs forever.
  3. Cost control: Track token usage. Agents can be chatty; set budgets.
  4. Safety: If tools have side effects (file writes, API calls), require explicit user approval or sandbox them.
  5. Monitoring: Use LangSmith traces + application logs to catch issues early.
  6. Fallbacks: Define what happens if an agent can't solve the problem.
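
For cost control (point 3), even a crude budget guard beats nothing. This is a sketch: the four-characters-per-token estimate and the limit are assumptions, and real counts should come from the model's usage metadata.

```python
class TokenBudget:
    """Crude per-conversation spend guard."""
    def __init__(self, limit=50_000):
        self.limit = limit
        self.used = 0

    def charge(self, text: str) -> None:
        self.used += max(1, len(text) // 4)   # ~4 characters per token, rough
        if self.used > self.limit:
            raise RuntimeError("Token budget exceeded for this conversation")

budget = TokenBudget(limit=100)
budget.charge("a short prompt")           # fine
try:
    budget.charge("x" * 10_000)           # blows the budget
except RuntimeError as e:
    print(e)
```

Call charge() on every prompt and tool output before it goes to the model, and treat the exception as your cue to summarize, truncate, or end the conversation.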

Here's a production-hardened example:

python
import logging
from langchain.agents import AgentExecutor
from tenacity import retry, stop_after_attempt, wait_exponential

logger = logging.getLogger(__name__)

@retry(
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
def run_agent(user_input: str) -> str:
    """Run the agent with retries and error handling."""
    try:
        # max_execution_time on the executor enforces the 30-second timeout
        response = executor.invoke({"input": user_input})
        return response["output"]
    except TimeoutError:
        return "Request timed out. Please try a simpler question."
    except ValueError as e:
        # Tool output parsing error
        logger.warning(f"Tool output parsing error: {e}")
        # Re-raise so tenacity retries; the executor itself has
        # handle_parsing_errors=True for in-loop recovery
        raise

The tenacity retry decorator shown here is a pattern worth applying broadly in agent systems. LLM API calls and external tool calls are inherently unreliable: transient network errors, rate limits, and model provider outages are all facts of life. Rather than letting these failures surface as user-facing errors, wrap your agent execution in exponential backoff retry logic. Three attempts with exponential delay is a sensible default that handles most transient failures without introducing excessive latency.

Bringing It Together: A Complete Agent

Let's build one complete, working example from scratch. Everything we've discussed (tools, memory, prompts, error handling) comes together here. Read through this as a reference implementation you can adapt directly for your own projects.

python
import os
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.tools import tool
from langchain.agents import create_tool_calling_agent, AgentExecutor
from langchain.memory import ConversationBufferMemory
import requests
from datetime import datetime
 
# Setup
os.environ["OPENAI_API_KEY"] = "your-key"
 
@tool
def search_web(query: str) -> str:
    """Search the web for current information."""
    # Real implementation would use SerpAPI or DuckDuckGo
    return f"Search results for: {query} (simulated)"
 
@tool
def get_current_date() -> str:
    """Get today's date."""
    return datetime.now().strftime("%Y-%m-%d")
 
@tool
def calculate(expression: str) -> str:
    """Evaluate a math expression."""
    try:
        # Demo only: sandbox this or use a restricted math parser for real users
        result = eval(expression)
        return str(result)
    except Exception:
        return "Invalid expression"
 
# Build agent
model = ChatOpenAI(model="gpt-4-turbo", temperature=0)
tools = [search_web, get_current_date, calculate]
 
system_prompt = """You are a helpful assistant with web search, date, and calculator tools.
Use tools when helpful. Be concise and direct."""
 
prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
    MessagesPlaceholder(variable_name="agent_scratchpad")
])
 
agent = create_tool_calling_agent(llm=model, tools=tools, prompt=prompt)
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
 
executor = AgentExecutor(
    agent=agent,
    tools=tools,
    memory=memory,
    verbose=True,
    max_iterations=10,
    handle_parsing_errors=True
)
 
# Run it
if __name__ == "__main__":
    response = executor.invoke({
        "input": "What's 2024 + 37 and what's today's date?"
    })  # memory supplies chat_history automatically
    print(response["output"])

Run this and you'll see the agent:

  1. Recognize it needs the calculator tool
  2. Call it with 2024 + 37
  3. Recognize it needs the date tool
  4. Call it
  5. Synthesize both results

It's reasoning + acting, all in one loop. What makes this example instructive is not its complexity (it's actually quite simple) but its completeness. Every production agent you build will follow this same structure: tools with clear docstrings, a system prompt that explains the agent's role, a memory object to maintain conversation state, and an executor wrapped in error handling. The sophistication comes from the quality of your tools and prompts, not from exotic framework features.

Common Agent Mistakes

Even experienced developers make predictable mistakes when building agents for the first time. Knowing these pitfalls in advance saves you significant debugging time and helps you build systems that are reliable rather than impressive in demos but fragile in production.

The most common mistake is writing vague or overlapping tool descriptions. When two tools have similar descriptions, the agent will inconsistently choose between them or alternate randomly. Audit your tool docstrings before every deployment: each tool should have a unique, specific description that makes it obvious when to use it and when not to. If you cannot clearly articulate the difference between two tools in one sentence, they probably need to be redesigned.

A second frequent mistake is ignoring the token budget. Agents accumulate tokens rapidly: the initial prompt, the system message, the conversation history, intermediate reasoning steps, tool outputs, and the final answer all count against the context window. Developers often test with short conversations and short tool outputs, then discover in production that real usage blows through the context limit. Profile your token usage with LangSmith early and often. Set explicit limits on tool output length, truncate long search results, paginate file reads, and summarize API responses before feeding them back to the agent.
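
A concrete guard for tool output length (the 2,000-character cap is an arbitrary assumption; tune it for your model's context window):

```python
def truncate_output(text: str, max_chars: int = 2000) -> str:
    """Cap a tool result before it re-enters the agent's prompt."""
    if len(text) <= max_chars:
        return text
    dropped = len(text) - max_chars
    # Tell the model something was cut, so it can ask for more if needed
    return text[:max_chars] + f"\n[truncated {dropped} characters]"

print(truncate_output("short result"))
# -> short result
```

Apply it inside each tool, right before the return statement, so no single observation can dominate the context window.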

Over-relying on the agent's reasoning for tasks that should be hardcoded is a subtler but costly mistake. If you know the answer should always be retrieved from a specific database table, write a tool that queries that table directly rather than hoping the agent will figure out the right SQL to generate. Agents are best used for the parts of a task that are genuinely variable and require judgment. For the parts that are deterministic, write deterministic code. Mixing the two appropriately is what separates a well-engineered agent system from a fragile one.
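The difference is easy to see in code. If order counts always come from one table, bake the query into the tool so the agent decides *when* to call it but never *what* SQL to run (sqlite and the table/column names here are purely illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for your real database
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "shipped"), (2, "pending"), (3, "shipped")])

def count_orders_by_status(status: str) -> int:
    """Count orders with the given status.

    Deterministic tool: the SQL is fixed and parameterized; only the
    status argument varies. The agent never generates SQL itself.
    """
    row = conn.execute(
        "SELECT COUNT(*) FROM orders WHERE status = ?", (status,)
    ).fetchone()
    return row[0]

print(count_orders_by_status("shipped"))  # 2
```

Compare this with handing the agent a generic `run_sql` tool: the deterministic version cannot hallucinate a table name, cannot be prompt-injected into a destructive query, and is trivially testable.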

Skipping structured output parsing is the fourth common mistake. When your agent's output needs to be consumed by another system, relying on free-form text creates a parsing problem downstream. Define Pydantic schemas for your agent's outputs from the start. It costs almost nothing to add and saves enormous grief when you need to process hundreds of agent responses consistently.
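In LangChain you would typically pair a Pydantic model with structured-output support; this stdlib-only dataclass sketch (field names are made up for illustration) shows the underlying idea, which is a contract that fails loudly instead of a parser that guesses:

```python
import json
from dataclasses import dataclass, fields

@dataclass
class AgentAnswer:
    answer: str
    confidence: float
    sources: list

def parse_agent_output(raw: str) -> AgentAnswer:
    """Parse the agent's JSON output into a typed object, failing loudly."""
    data = json.loads(raw)
    expected = {f.name for f in fields(AgentAnswer)}
    missing = expected - data.keys()
    if missing:
        raise ValueError(f"agent output missing fields: {sorted(missing)}")
    return AgentAnswer(**{k: data[k] for k in expected})

raw = '{"answer": "2061", "confidence": 0.9, "sources": ["calculator"]}'
result = parse_agent_output(raw)
print(result.answer, result.confidence)
```

Downstream code now works with `result.answer` and `result.confidence` instead of regex-scraping free-form text, and malformed outputs surface immediately as exceptions you can retry on.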

Finally, many developers fail to implement proper conversation state management for multi-session use cases. If your agent needs to remember something a user said last week, in-memory buffer storage is not going to work. You need a persistent memory store: Redis, a vector database, or a simple SQL table. Design this architecture before you write a single line of agent code; retrofitting persistent memory into an existing agent is much harder than building it in from the start.
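Any persistent backend works; here is a minimal sqlite sketch of the "simple SQL table" option (schema and helper names are illustrative; LangChain also ships ready-made integrations such as `SQLChatMessageHistory`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path or server DB in production
conn.execute("""CREATE TABLE IF NOT EXISTS messages (
    session_id TEXT,
    role       TEXT,
    content    TEXT,
    ts         DATETIME DEFAULT CURRENT_TIMESTAMP)""")

def save_message(session_id: str, role: str, content: str) -> None:
    """Append one message to a session's durable history."""
    conn.execute(
        "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
        (session_id, role, content))

def load_history(session_id: str) -> list[tuple[str, str]]:
    """Return (role, content) pairs for a session, oldest first."""
    return conn.execute(
        "SELECT role, content FROM messages WHERE session_id = ? ORDER BY rowid",
        (session_id,)).fetchall()

save_message("user-42", "human", "My name is Ada.")
save_message("user-42", "ai", "Nice to meet you, Ada.")
print(load_history("user-42"))
```

On each new session you would load this history (possibly summarized or windowed) into the agent's memory object, so "last week's" context survives process restarts.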

Pitfall 1: Vague tool descriptions
If your @tool docstring is unclear, the agent won't know when to use it.

Fix: Write clear, specific docstrings. Tell the agent what the tool does and when to use it.

Pitfall 2: Tools that are too broad
A tool that does 10 different things confuses the agent.

Fix: Keep tools focused. One tool = one job.

Pitfall 3: Infinite loops
The agent keeps trying the same thing over and over.

Fix: Set max_iterations and handle_parsing_errors=True. Log intermediate steps so you can see what went wrong.
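In LangChain these are constructor arguments on the executor (`max_iterations`, `handle_parsing_errors=True`); the guard logic underneath looks roughly like this hand-rolled sketch:

```python
def run_with_guard(step_fn, max_iterations: int = 5):
    """Run an agent-style loop with an iteration cap and repeat detection.

    step_fn() returns ('final', answer) when done, or ('action', description)
    for one more tool call.
    """
    seen_actions = []
    for _ in range(max_iterations):
        kind, payload = step_fn()
        if kind == "final":
            return payload
        if seen_actions[-2:] == [payload, payload]:  # same action 3x running
            return f"Stopped: agent is looping on {payload!r}"
        seen_actions.append(payload)
    return "Stopped: hit max_iterations without a final answer"

# A stuck agent that repeats the same tool call forever gets cut off:
stuck = lambda: ("action", "calculator('1+1')")
print(run_with_guard(stuck))
```

Logging `seen_actions` (or, with LangChain, reading the trace in LangSmith) is what tells you *why* the agent looped, which is usually a tool returning an unhelpful observation.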

Pitfall 4: Ignoring token costs
Agents are chatty. Memory grows. Tokens add up fast.

Fix: Monitor token usage in LangSmith. Use window memory. Clean up old conversations.

Pitfall 5: Running agents for simple tasks
A simple prompt is faster and cheaper than an agent loop.

Fix: Only use agents when they solve a genuinely complex, multi-step problem.

Key Takeaways

  • Agents are loops, not fixed pipelines. They think, act, observe, and repeat.
  • ReAct (Reasoning + Acting) is the standard framework. It's simple and effective.
  • Tools are how agents interact with the world. Write clear, focused tools.
  • Memory keeps conversation context. Use buffer or window memory.
  • LangSmith gives you visibility into agent behavior. Use it.
  • Production agents need timeouts, error handling, cost controls, and monitoring.
  • LCEL makes it all composable and readable.

Agents are one of the most exciting developments in applied AI. They turn LLMs from text predictors into problem-solvers. Start small (a web search agent, a calculator agent, a file-reading agent) and get comfortable with the ReAct loop before you build anything more ambitious. The loop itself is simple. The skill is in designing tools that give the agent genuinely useful capabilities, writing system prompts that guide its reasoning without over-constraining it, and building the infrastructure to monitor, debug, and iterate once it's in users' hands.

The developers shipping the most impressive agent systems right now are not the ones using the most complex architectures. They are the ones who deeply understand the ReAct loop, write excellent tool interfaces, trace every run in LangSmith, and iterate rapidly based on what they observe. That is a learnable, repeatable process, and you now have everything you need to start.

Next up: we're diving into Retrieval-Augmented Generation (RAG), where agents meet vector databases, and you learn to build AI that's grounded in your actual data rather than just its training knowledge. RAG is the natural complement to what you've built here: agents give you the ability to act; RAG gives you the ability to know. Together, they form the foundation of most serious production AI applications.
