SQLAlchemy ORM: Models, Sessions, and Queries

If you've spent any real time building Python applications that talk to a database, you already know the pain. Raw SQL queries scattered across your codebase, dictionaries where you wish you had objects, and that creeping dread every time you need to change a column name. Every. Single. String. Has. To. Change. SQLAlchemy exists to solve exactly this problem, and it does so without sacrificing the power and flexibility that makes Python such a joy to work with.
This article is your complete guide to SQLAlchemy's ORM layer in the 2.0 era. We're not just covering the mechanics, we're covering the why behind each design decision, the performance traps that catch even experienced developers off guard, and the production patterns that separate hobby projects from applications that actually hold up under load. Whether you're building a small script that needs to remember state between runs or architecting a multi-tenant SaaS product, the patterns here will serve you well.
By the end of this article, you'll know how to define models that your IDE actually understands, manage database sessions without leaking resources, write queries that compose naturally, handle relationships without triggering the dreaded N+1 problem, and migrate your schema safely in production. You'll also walk away with a clear mental model of what SQLAlchemy is doing under the hood, which matters the moment something goes wrong and you need to debug it.
We're covering SQLAlchemy 2.0 throughout. If you've seen older tutorials using query() methods and legacy patterns, set those aside. The modern approach is cleaner, more type-safe, and honestly, more fun to write. Let's get into it.
Table of Contents
- Why ORMs Over Raw SQL
- Why SQLAlchemy? The ORM Advantage
- Core vs. ORM: Know the Difference
- Setting Up: Declarative Models with DeclarativeBase
- Handling Optional Columns
- Creating the Engine: Your Database Connection
- Engine Configuration Best Practices
- Sessions: The Gateway to Data
- Understanding the Session Lifecycle
- Session Lifecycle
- CRUD Operations: The Fundamentals
- CREATE: Adding New Records
- READ: Querying Data
- UPDATE: Modifying Records
- DELETE: Removing Records
- Advanced Queries: Filtering, Ordering, Limiting
- Filtering with Multiple Conditions
- Ordering and Limiting
- Pagination
- Counting and Aggregation
- Query Optimization
- Relationships: One-to-Many and Many-to-Many
- One-to-Many: User → Posts
- Many-to-Many: Tags and Posts
- Avoiding the N+1 Problem
- Transactions and Rollback
- Alembic: Database Migrations
- Installing and Setting Up Alembic
- Configuring Alembic
- Creating a Migration
- Running Migrations
- Common SQLAlchemy Mistakes
- Practical Example: A Simple Blog
- PostgreSQL and MySQL Differences
- PostgreSQL
- MySQL
- Performance Tips
- 1. Use Connection Pooling
- 2. Eager Load Related Objects
- 3. Use Bulk Operations for Mass Changes
- 4. Index Strategic Columns
- 5. Use `echo=False` in Production
- Advanced Techniques: Hybrid Properties and Validators
- Hybrid Properties
- Column Validators
- Event Listeners: Reacting to Changes
- Testing with SQLAlchemy
- Common Pitfalls and Solutions
- Pitfall 1: Forgetting to Commit
- Pitfall 2: Accessing Detached Objects
- Pitfall 3: Circular Imports
- Pitfall 4: N+1 Queries
- Pitfall 5: Transaction Isolation
- Debugging Tools
- Enable SQL Echo
- Use the Python REPL
- Check Generated SQL
- Real-World Production Patterns
- Connection Pooling
- Async SQLAlchemy
- Dependency Injection
- Wrap-Up
Why ORMs Over Raw SQL
Before we write a single line of SQLAlchemy code, let's settle the debate that always comes up: why bother with an ORM at all when you could just write SQL?
The honest answer is that raw SQL is fine for simple, stable queries. But the moment your application grows (more tables, more relationships, more developers on the team), raw SQL starts to work against you. You end up maintaining two parallel representations of your data: the Python objects you work with in code and the SQL strings you write to interact with the database. Every schema change means hunting down every query that references the changed columns. Every database column becomes a positional index in a tuple that you have to remember to decode correctly.
ORMs solve this by making your database schema part of your Python code. Your tables become classes, your columns become typed attributes, and your queries become method calls that your IDE can autocomplete and your type checker can validate. The database becomes a persistence layer that serves your Python objects, not a separate system you have to translate to and from.
Beyond ergonomics, there's a practical portability argument. SQLAlchemy abstracts over database dialects, meaning the same ORM code works with SQLite during development, PostgreSQL in production, and MySQL if your client insists. You don't have to know that PostgreSQL uses RETURNING while MySQL doesn't; SQLAlchemy handles it. This isn't a hypothetical benefit either: it's genuinely common to prototype with SQLite locally and deploy to PostgreSQL, and having a single codebase that works with both is worth real developer hours.
The security argument is often overlooked but critical: ORMs parameterize queries by default. SQL injection, one of the most common and damaging web vulnerabilities, happens when user input gets interpolated directly into SQL strings. With an ORM, you're constructing query objects, not strings, and parameterization is automatic. You can still shoot yourself in the foot if you try hard enough, but the default path is the safe path.
None of this means ORMs are always the right tool. Complex analytical queries, heavy aggregations, and database-specific features sometimes call for dropping down to raw SQL. SQLAlchemy is built for this: you can mix ORM queries with raw SQL when needed. But for the bread-and-butter CRUD operations that make up the bulk of most applications, the ORM layer is a clear win.
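When you do need raw SQL, the escape hatch is `text()`, and parameters stay bound even there. A sketch against a throwaway in-memory SQLite database (the table and column names are invented for the demo):

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")

with engine.begin() as conn:  # begin() commits automatically on success
    conn.execute(text("CREATE TABLE scores (name TEXT, score INTEGER)"))
    # A list of dicts executes the statement once per dict ("executemany")
    conn.execute(
        text("INSERT INTO scores (name, score) VALUES (:name, :score)"),
        [{"name": "alice", "score": 90}, {"name": "bob", "score": 75}],
    )
    total = conn.execute(text("SELECT sum(score) FROM scores")).scalar()

print(total)  # 165
```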
Why SQLAlchemy? The ORM Advantage
With raw SQL, you're juggling strings:
```python
# Raw SQL - you're on your own
cursor.execute("SELECT * FROM users WHERE age > ? AND status = ?", (18, 'active'))
results = cursor.fetchall()
for row in results:
    user_id, name, age, status = row[0], row[1], row[2], row[3]
    print(f"{name} is {age}")
```

Notice what's happening here: you're manually unpacking tuples by index position, and there's nothing stopping you from swapping row[1] and row[2] by accident. Your IDE can't help you because it has no idea what the query returns. With SQLAlchemy's ORM, you work with Python objects:
```python
# SQLAlchemy ORM - Python-native (2.0 style)
stmt = select(User).where(User.age > 18, User.status == "active")
users = session.scalars(stmt).all()
for user in users:
    print(f"{user.name} is {user.age}")
```

The benefits stack up:
- Type safety: Your IDE knows `user.name` is a string. Autocomplete works.
- Database agnostic: Switch from SQLite to PostgreSQL without rewriting queries.
- Relationships: Define one-to-many or many-to-many relationships once; query them naturally.
- Migrations: Track schema changes over time with Alembic.
- Security: Built-in protection against SQL injection.
SQLAlchemy 2.0 took this further by making the modern syntax the default. The newer select() function is more flexible and composable than the old query() method. We'll use that throughout.
Core vs. ORM: Know the Difference
SQLAlchemy has two layers:
- Core: Low-level SQL expression language. Write SQL-like queries in Python but with more flexibility and type checking.
- ORM: Higher-level object mapping. Define Python classes; they map to database tables automatically.
You'll use the ORM layer most of the time. But understand that under the hood, the ORM uses Core. This matters when you need to drop down and write complex queries: you have that escape hatch.
For this article, we're ORM-focused. The ORM is where the magic happens for typical application development.
Setting Up: Declarative Models with DeclarativeBase
Let's start with the foundation: defining database models.
In SQLAlchemy 2.0, you inherit from DeclarativeBase. This single base class handles all the boilerplate. The DeclarativeBase approach is a significant improvement over older versions: it gives you full type annotation support, which means your type checker can validate your model definitions at write time, not just at runtime.
```python
from sqlalchemy import create_engine, String, Integer, DateTime, ForeignKey
from sqlalchemy.orm import DeclarativeBase, Session, mapped_column, Mapped
from datetime import datetime
from typing import Optional

class Base(DeclarativeBase):
    """Base class for all models."""
    pass

class User(Base):
    __tablename__ = "users"

    id: Mapped[int] = mapped_column(primary_key=True)
    username: Mapped[str] = mapped_column(String(50), unique=True)
    email: Mapped[str] = mapped_column(String(100), unique=True)
    age: Mapped[int]
    created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow)
    is_active: Mapped[bool] = mapped_column(default=True)

class Post(Base):
    __tablename__ = "posts"

    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(200))
    content: Mapped[str]
    user_id: Mapped[int] = mapped_column(ForeignKey("users.id"))
    created_at: Mapped[datetime] = mapped_column(DateTime, default=datetime.utcnow)
```

What's happening here?
- `Base(DeclarativeBase)`: Your custom base class. Every model inherits from it.
- `__tablename__`: Maps the class to a database table. `User` → `users` table.
- `Mapped[type]` annotations: Type hints that SQLAlchemy interprets. `Mapped[int]` = integer column, `Mapped[str]` = string column.
- `mapped_column()`: Defines column properties. Parameters like `primary_key=True`, `unique=True`, and `default` set constraints.
- `ForeignKey("users.id")`: Creates a relationship. `Post.user_id` references `User.id`.
The Mapped type hint is the 2.0 way. It tells SQLAlchemy exactly what you expect, and your type checker (mypy, pyright, etc.) will catch mistakes.
Handling Optional Columns
Not every column is required. Use Optional:
```python
from typing import Optional

class User(Base):
    __tablename__ = "users"

    id: Mapped[int] = mapped_column(primary_key=True)
    username: Mapped[str]
    bio: Mapped[Optional[str]] = mapped_column(default=None)  # Optional column
    age: Mapped[Optional[int]] = mapped_column(default=None)
```

The Optional[str] tells SQLAlchemy this column allows NULL. The default=None is the Python default when you create a User object without setting it. This distinction matters: Optional controls the database schema (nullable vs. NOT NULL), while default controls what Python assigns when you omit the attribute.
Creating the Engine: Your Database Connection
An engine is the starting point. It's the factory for database connections. Think of it as a configuration object that knows which database to talk to and how many connections to maintain; it doesn't establish a connection until you actually need one.
```python
from sqlalchemy import create_engine

# SQLite in-memory (testing)
engine = create_engine("sqlite:///:memory:")

# SQLite file (development)
engine = create_engine("sqlite:///app.db")

# PostgreSQL (production)
engine = create_engine("postgresql://user:password@localhost:5432/myapp")

# MySQL
engine = create_engine("mysql+pymysql://user:password@localhost/myapp")
```

Connection string format: `dialect+driver://user:password@host:port/database`
Once you have an engine, create the tables. This call inspects all the model classes you've defined and issues the corresponding CREATE TABLE statements:
```python
Base.metadata.create_all(engine)
```

This reads your model definitions and creates corresponding tables in the database. If a table already exists, SQLAlchemy skips it (this is why migrations with Alembic matter in production: you want versioned schema changes, not auto-creation). The create_all call is safe to run repeatedly: it's idempotent because it checks for table existence before acting.
Engine Configuration Best Practices
Real applications need more control. The defaults are fine for getting started, but production systems need explicit pool configuration and appropriate logging levels:
```python
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool, QueuePool

# Production PostgreSQL
engine = create_engine(
    "postgresql://user:password@localhost/myapp",
    # Connection pooling
    poolclass=QueuePool,
    pool_size=10,      # How many connections to keep ready
    max_overflow=20,   # How many extra connections if pool is full
    # Performance
    echo=False,        # Set True to see SQL statements (verbose!)
)

# SQLite doesn't support connection pooling well; disable it
engine = create_engine(
    "sqlite:///app.db",
    poolclass=NullPool,  # No pooling for SQLite
)

# Log SQL statements (useful for debugging)
import logging
logging.basicConfig()
logging.getLogger('sqlalchemy.engine').setLevel(logging.INFO)
```

The pool manages connections. In production, you don't want to create a new connection for every query; that's slow. The pool keeps connections alive and reuses them. The pool_pre_ping=True option (not shown above but highly recommended) tests each connection before use, automatically discarding stale connections that the database has closed.
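A sketch of the pool_pre_ping option mentioned above, paired with pool_recycle (the SQLite URL is a stand-in; with a real server database these options guard against connections dropped by firewalls or server-side timeouts):

```python
from sqlalchemy import create_engine, text

engine = create_engine(
    "sqlite:///:memory:",  # stand-in URL; use your real DSN in production
    pool_pre_ping=True,    # test each connection on checkout, replace dead ones
    pool_recycle=1800,     # proactively recycle connections older than 30 minutes
)

with engine.connect() as conn:
    result = conn.execute(text("SELECT 1")).scalar()

print(result)  # 1
```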
Sessions: The Gateway to Data
A session is your connection to the database. It's where you execute queries and manage transactions. You can think of a session as a staging area: changes you make to objects in Python don't immediately hit the database. They accumulate in the session until you call commit(), at which point everything flushes as a single transaction.
```python
from sqlalchemy.orm import Session

# Simple approach
with Session(engine) as session:
    # Your database work here
    pass
# Session closes automatically when exiting the `with` block
```

The with statement is critical. It ensures the session closes, releasing its connection back to the pool (any uncommitted changes are rolled back, not saved). Never skip this. Leaving sessions open is one of the most common production bugs: connection pools have finite size, and leaked sessions exhaust the pool until new requests start hanging.
For more control, use sessionmaker:
```python
from sqlalchemy.orm import sessionmaker

# Create a session factory
SessionLocal = sessionmaker(engine)

# Use it
with SessionLocal() as session:
    # Your work
    pass

# Or without context manager (but remember to close!)
session = SessionLocal()
try:
    # Your work
    session.commit()
finally:
    session.close()
```

The context manager approach is cleaner and safer. Always prefer it. The sessionmaker factory is particularly useful in web frameworks where you want to configure session behavior once (for example, expire_on_commit or autoflush) and reuse that configuration across all requests.
Understanding the Session Lifecycle
A session goes through stages:
- Transient: Object exists in Python but isn't tracked by the session.
- Pending: Object is added to the session but not yet in the database.
- Persistent: Object is in the database and tracked by the session.
- Detached: Object was tracked but the session closed.
Let's see this in action:
```python
with Session(engine) as session:
    # Create a transient object
    new_user = User(username="alice", email="alice@example.com", age=25)
    print("State: transient (not in session yet)")

    # Add to session (pending)
    session.add(new_user)
    print("State: pending (added but not committed)")

    # Commit (persistent)
    session.commit()
    print("State: persistent (in database)")
    print(f"User ID: {new_user.id}")  # ID populated by database

# After the `with` block, session is closed
print("State: detached (session closed)")
```

When you commit, the session sends all pending changes to the database in a transaction. If anything fails, the whole transaction rolls back. Notice that after the commit, the new_user object already has its id populated: SQLAlchemy refreshes the object with the database-generated values automatically.
Session Lifecycle
Understanding the session lifecycle in depth will save you from debugging puzzles that can otherwise eat hours. The four states (transient, pending, persistent, detached) aren't just conceptual labels; they determine what operations are valid and what will trigger database queries.
When an object is transient, SQLAlchemy knows nothing about it. It's just a regular Python object. No tracking, no identity map, no connection to any database row. The moment you call session.add(obj), it moves to pending: the session now tracks it, but no SQL has executed yet. This is deliberate. SQLAlchemy batches your additions so it can issue efficient SQL when the time comes.
The transition to persistent happens at flush time, either when you explicitly call session.flush(), when you call session.commit(), or automatically before queries that need the latest state. During a flush, SQLAlchemy figures out the correct order to issue INSERT/UPDATE/DELETE statements based on foreign key dependencies and issues them all within the active transaction.
Detached objects are where many bugs hide. After a session closes, any objects that were persistent become detached. They still exist in Python, and their currently loaded attributes are still accessible. But if you try to access a lazy-loaded relationship on a detached object, SQLAlchemy will throw a DetachedInstanceError because there's no session to execute the query through. The fix is always the same: either keep the session open while you need the data, or eager-load the relationships before the session closes.
The session also maintains an identity map, a dictionary that maps each database row's primary key to the Python object representing it. This means if you query for the same user twice in the same session, you get back the same Python object both times. No duplicate objects, no confusion about which copy is "real." This is one of the ORM's most useful (and often misunderstood) behaviors.
CRUD Operations: The Fundamentals
CRUD = Create, Read, Update, Delete. Let's work through each.
CREATE: Adding New Records
```python
with Session(engine) as session:
    # Create a new user
    user = User(username="bob", email="bob@example.com", age=30)

    # Add to session (not yet in database)
    session.add(user)

    # Commit to save
    session.commit()

    # After commit, the ID is populated
    print(f"Created user with ID: {user.id}")
```

You can add multiple at once. The add_all method accepts any iterable of model instances, and SQLAlchemy will batch the inserts efficiently rather than issuing one INSERT per object:
```python
with Session(engine) as session:
    users = [
        User(username="charlie", email="charlie@example.com", age=28),
        User(username="diana", email="diana@example.com", age=32),
    ]
    session.add_all(users)
    session.commit()
    print(f"Created {len(users)} users")
```

READ: Querying Data
This is where the ORM shines. Let's use the modern select() function. The select() approach in 2.0 is composable in a way that the old query() method wasn't: you can build up a statement in pieces, pass it around, and execute it when ready, which makes it much easier to build query helpers and reusable filters.
```python
from sqlalchemy import select

# Get a single user by ID
with Session(engine) as session:
    stmt = select(User).where(User.id == 1)
    user = session.scalar(stmt)  # scalar() returns one result or None
    print(f"{user.username}: {user.email}")

# Get all users
with Session(engine) as session:
    stmt = select(User)
    users = session.scalars(stmt).all()  # scalars() returns a result; .all() gets every row
    for user in users:
        print(f"{user.username} ({user.age})")

# Get with filtering
with Session(engine) as session:
    stmt = select(User).where(User.age > 25)
    older_users = session.scalars(stmt).all()
    print(f"Found {len(older_users)} users over 25")

# Multiple conditions
with Session(engine) as session:
    stmt = select(User).where(
        (User.age > 20) & (User.is_active == True)
    )
    active_users = session.scalars(stmt).all()
```

Key methods:
- `scalar()`: Execute and return exactly one result (or `None`).
- `scalars()`: Execute and return a result set. Call `.all()` to get all, `.first()` to get the first, or iterate directly.
- `execute()`: Lower-level. Returns rows; mostly for complex queries.
UPDATE: Modifying Records
```python
with Session(engine) as session:
    # Fetch the user
    stmt = select(User).where(User.username == "bob")
    user = session.scalar(stmt)

    # Modify
    user.age = 31
    user.is_active = True

    # Commit
    session.commit()
    print(f"Updated {user.username}")
```

SQLAlchemy tracks changes automatically. When you modify an attribute on a persistent object, the session marks it as dirty. On commit, it executes an UPDATE statement, and it's smart enough to only include the columns that actually changed, not every column on the row.
Bulk updates (without fetching all records):
```python
from sqlalchemy import update

with Session(engine) as session:
    stmt = update(User).where(User.age < 30).values(is_active=False)
    result = session.execute(stmt)
    session.commit()
    print(f"Deactivated {result.rowcount} users under 30")
```

This is more efficient: it's a single UPDATE statement, not a loop. When you need to update thousands of rows, the fetch-modify-commit pattern would issue one UPDATE per row. The bulk approach issues one UPDATE total. Use bulk operations whenever you're changing records at scale.
DELETE: Removing Records
```python
with Session(engine) as session:
    # Fetch and delete
    stmt = select(User).where(User.username == "alice")
    user = session.scalar(stmt)
    session.delete(user)
    session.commit()
    print(f"Deleted {user.username}")
```

Bulk delete (without fetching):
```python
from sqlalchemy import delete

with Session(engine) as session:
    stmt = delete(User).where(User.age > 60)
    result = session.execute(stmt)
    session.commit()
    print(f"Deleted {result.rowcount} users over 60")
```

The same efficiency argument applies as with bulk updates. For mass deletions, skip the fetch entirely and go straight to a bulk DELETE statement.
Advanced Queries: Filtering, Ordering, Limiting
Real queries are rarely simple. Let's build more complex ones.
Filtering with Multiple Conditions
```python
from sqlalchemy import and_, or_, not_

with Session(engine) as session:
    # AND: age > 25 AND is_active = True
    stmt = select(User).where(
        (User.age > 25) & (User.is_active == True)
    )

    # OR: username starts with 'a' OR age < 20
    stmt = select(User).where(
        (User.username.like('a%')) | (User.age < 20)
    )

    # NOT: not active
    stmt = select(User).where(not_(User.is_active))
    results = session.scalars(stmt).all()
```

Use `&` (AND), `|` (OR), and `not_()` for logic. Python's `and`, `or`, and `not` don't work with SQLAlchemy expressions; they operate on the truthiness of the expression objects themselves, not the SQL logic you intend. This is a common stumbling block, so keep the symbolic operators in mind.
Ordering and Limiting
```python
from sqlalchemy import desc, asc

with Session(engine) as session:
    # Order by age ascending, limit to 5
    stmt = select(User).order_by(User.age).limit(5)
    youngest = session.scalars(stmt).all()

    # Order by age descending (oldest first)
    stmt = select(User).order_by(desc(User.age)).limit(5)
    oldest = session.scalars(stmt).all()

    # Multiple order columns
    stmt = select(User).order_by(
        User.is_active.desc(),
        User.created_at.desc()
    )
    results = session.scalars(stmt).all()
```

Pagination
Pagination is crucial for large datasets. Use offset() and limit(). Without pagination, a query against a table with a million rows returns a million Python objects, which will exhaust memory and take forever. Always paginate when the dataset size is unbounded:
```python
def get_users_paginated(page: int, page_size: int = 10):
    """Fetch a page of users."""
    with Session(engine) as session:
        stmt = (
            select(User)
            .order_by(User.id)
            .offset((page - 1) * page_size)
            .limit(page_size)
        )
        return session.scalars(stmt).all()

# Get page 1, 10 per page
users_page_1 = get_users_paginated(1, 10)
users_page_2 = get_users_paginated(2, 10)
```

Counting and Aggregation
```python
from sqlalchemy import func, select

with Session(engine) as session:
    # Count all users
    stmt = select(func.count(User.id))
    total = session.scalar(stmt)
    print(f"Total users: {total}")

    # Count active users
    stmt = select(func.count(User.id)).where(User.is_active == True)
    active_count = session.scalar(stmt)

    # Average age
    stmt = select(func.avg(User.age))
    avg_age = session.scalar(stmt)

    # Min/max
    stmt = select(func.min(User.age), func.max(User.age))
    min_age, max_age = session.execute(stmt).one()
```

The func object gives you SQL aggregation functions. It's a proxy that maps attribute access to SQL function calls: func.count(), func.avg(), func.max(), func.coalesce(), and anything else your database supports all work the same way.
Query Optimization
Writing correct queries is only half the battle. Writing fast queries is where the real skill lives, and SQLAlchemy gives you excellent tools to understand and improve your query performance.
The first thing to do when a query feels slow is to see exactly what SQL it's generating. Set `echo=True` on your engine during development, or print the statement object directly with `print(stmt)`; SQLAlchemy will show you the compiled SQL. If you see a query selecting every column when you only need two, use `select(User.username, User.email)` instead of `select(User)` to retrieve only what you need.
Indexes are your most powerful tool. Any column you filter or sort on frequently should be indexed. In SQLAlchemy, you declare this at the model level:
```python
class User(Base):
    __tablename__ = "users"

    id: Mapped[int] = mapped_column(primary_key=True)
    username: Mapped[str] = mapped_column(unique=True, index=True)
    email: Mapped[str] = mapped_column(unique=True, index=True)
    created_at: Mapped[datetime] = mapped_column(index=True)
```

For composite indexes spanning multiple columns, like when you frequently filter by (user_id, created_at) together, use __table_args__:
```python
from sqlalchemy import Index

class Post(Base):
    __tablename__ = "posts"
    __table_args__ = (
        Index('ix_posts_user_created', 'user_id', 'created_at'),
    )

    id: Mapped[int] = mapped_column(primary_key=True)
    user_id: Mapped[int] = mapped_column(ForeignKey("users.id"))
    created_at: Mapped[datetime] = mapped_column(default=datetime.utcnow)
    title: Mapped[str]
```

For analytical queries (counting, averaging, grouping), avoid loading full ORM objects. Fetching a thousand User objects just to count them wastes memory and time. Use func.count() with a targeted select() that returns a scalar, not a list of objects. Reserve full object loading for when you actually need to work with the object's attributes or relationships.
Finally, the EXPLAIN command (or EXPLAIN ANALYZE in PostgreSQL) shows you the database's execution plan for a query. You can run it through SQLAlchemy by executing raw SQL for the explain output. When a query is unexpectedly slow despite indexing, the execution plan tells you exactly what the database is doing: whether it's performing a full table scan, using the wrong index, or joining in an inefficient order.
Relationships: One-to-Many and Many-to-Many
Relationships are where the ORM really pays off. Defining them correctly once in your model means you can navigate between related objects naturally in code, without manually joining tables in every query.
One-to-Many: User → Posts
A user has many posts. A post belongs to one user. The relationship() call tells SQLAlchemy how to connect the two models:
```python
from sqlalchemy.orm import relationship

class User(Base):
    __tablename__ = "users"

    id: Mapped[int] = mapped_column(primary_key=True)
    username: Mapped[str]
    email: Mapped[str]

    # Relationship: User → Posts
    posts: Mapped[list["Post"]] = relationship(back_populates="author")

class Post(Base):
    __tablename__ = "posts"

    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str]
    content: Mapped[str]
    user_id: Mapped[int] = mapped_column(ForeignKey("users.id"))

    # Relationship: Post → User
    author: Mapped["User"] = relationship(back_populates="posts")
```

relationship(back_populates=...) defines a bidirectional relationship. back_populates="posts" in User matches back_populates="author" in Post. The string references ("Post", "User") defer evaluation until all classes are defined, which avoids circular import issues.
Now you can navigate relationships naturally:
```python
with Session(engine) as session:
    # Fetch a user with their posts
    stmt = select(User).where(User.username == "bob")
    user = session.scalar(stmt)

    # Access posts (lazy-loaded by default)
    for post in user.posts:
        print(f"- {post.title}")

    # Add a post to a user
    new_post = Post(title="My New Post", content="...", author=user)
    session.add(new_post)
    session.commit()
```

When you access user.posts, SQLAlchemy fires a database query behind the scenes. This is called lazy loading. It's convenient but can cause performance issues if you're not careful (the "N+1 query problem").
Eager loading prevents this:
```python
from sqlalchemy.orm import selectinload

with Session(engine) as session:
    # Eager-load posts up front (selectinload issues one extra bulk query)
    stmt = select(User).options(selectinload(User.posts))
    user = session.scalar(stmt)

    # This doesn't fire a new query
    for post in user.posts:
        print(f"- {post.title}")
```

Many-to-Many: Tags and Posts
Posts can have multiple tags, and tags can apply to multiple posts. The join table that SQLAlchemy needs to manage this relationship is defined as a plain Table object, not a model class, because you don't need direct access to the rows in that table:
```python
from sqlalchemy import Table, Column, Integer, ForeignKey

# Association table (junction table)
post_tags = Table(
    'post_tags',
    Base.metadata,
    Column('post_id', Integer, ForeignKey('posts.id'), primary_key=True),
    Column('tag_id', Integer, ForeignKey('tags.id'), primary_key=True),
)

class Post(Base):
    __tablename__ = "posts"

    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str]

    # Many-to-many relationship
    tags: Mapped[list["Tag"]] = relationship(
        secondary=post_tags,
        back_populates="posts"
    )

class Tag(Base):
    __tablename__ = "tags"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(unique=True)

    # Reverse relationship
    posts: Mapped[list["Post"]] = relationship(
        secondary=post_tags,
        back_populates="tags"
    )
```

The post_tags table is a join table linking posts and tags. SQLAlchemy handles it automatically. When you add a tag to a post's tags list and commit, SQLAlchemy inserts the appropriate row into post_tags without you having to think about it.
```python
with Session(engine) as session:
    # Get or create tags
    python_tag = Tag(name="python")
    web_tag = Tag(name="web")

    # Create post with tags
    post = Post(
        title="Python Web Development",
        content="...",
        tags=[python_tag, web_tag]
    )
    session.add(post)
    session.commit()

    # Query: find all posts tagged 'python'
    stmt = select(Post).where(Post.tags.any(Tag.name == "python"))
    python_posts = session.scalars(stmt).all()
```

Use .any() to check if any related object matches a condition. The .any() and .has() methods translate to SQL EXISTS subqueries, which are often more efficient than joins for existence checks.
Avoiding the N+1 Problem
The N+1 problem happens when lazy loading causes multiple queries.
```python
# Bad: N+1 queries
with Session(engine) as session:
    users = session.scalars(select(User)).all()
    for user in users:
        print(f"{user.username}: {len(user.posts)} posts")
        # Each user triggers a new query to fetch posts!

# N users + 1 initial query = N+1 total queries
```

If you have 100 users, this code issues 101 database queries. With 1000 users, it's 1001. The database round-trip cost multiplies with your data size. This pattern is insidious because it works correctly in development with small datasets and only becomes a problem when you hit production scale.
Solutions:
```python
# Option 1: Eager load with selectinload
from sqlalchemy.orm import selectinload

with Session(engine) as session:
    stmt = select(User).options(selectinload(User.posts))
    users = session.scalars(stmt).all()
    for user in users:
        print(f"{user.username}: {len(user.posts)} posts")
        # posts already loaded; no new queries

# Option 2: Join load
from sqlalchemy.orm import joinedload

with Session(engine) as session:
    stmt = select(User).options(joinedload(User.posts))
    # .unique() is required when joined-loading a collection:
    # the join multiplies rows, and unique() deduplicates the parent objects
    users = session.scalars(stmt).unique().all()
    for user in users:
        print(f"{user.username}: {len(user.posts)} posts")

# Option 3: Explicit join for custom queries
with Session(engine) as session:
    stmt = select(User, Post).join(Post)
    results = session.execute(stmt).all()
    for user, post in results:
        print(f"{user.username}: {post.title}")
```

selectinload is usually the safest bet. It issues a separate but optimized query: one query for users, one query for all their posts in bulk. joinedload uses SQL joins, which can be more efficient but trickier with multiple relationships, because joins can produce unexpected row multiplication.
Transactions and Rollback
Transactions ensure data consistency. If something goes wrong mid-operation, everything rolls back. This is the database's guarantee to you: either all of your changes commit together, or none of them do. Never half-committed data.
with Session(engine) as session:
try:
user = User(username="eve", email="eve@example.com", age=27)
session.add(user)
# Simulate an error
if user.age < 18:
raise ValueError("User must be 18+")
session.commit()
except Exception as e:
session.rollback() # Undo all changes
print(f"Error: {e}")The with statement handles rollback on exceptions. But explicit session.rollback() is sometimes clearer when you want to recover from an error and continue using the same session rather than abandoning it.
Savepoints let you partially roll back:
with Session(engine) as session:
user1 = User(username="frank", email="frank@example.com", age=35)
session.add(user1)
session.flush() # Execute the INSERT, but don't commit yet
# Create a savepoint
savepoint = session.begin_nested()
try:
user2 = User(username="grace", email="grace@example.com", age=-5) # Invalid
session.add(user2)
session.flush()
except Exception:
savepoint.rollback() # Roll back only user2
session.commit() # Commits user1 (and user2 if no error)
Savepoints are advanced and rarely needed, but they exist for complex scenarios. begin_nested() also works as a context manager: the savepoint is released on success and rolled back automatically if the block raises.
Alembic: Database Migrations
In production, you can't just drop and recreate tables. You need migrations: version-controlled schema changes.
Installing and Setting Up Alembic
pip install alembic
alembic init alembic
This creates an alembic/ directory with a config file. The alembic/ directory contains a versions/ subdirectory where each migration gets its own file, an env.py that configures how Alembic connects to your database, and a script.py.mako template for generating new migration files.
Configuring Alembic
Edit alembic/env.py to point to your models:
from myapp.models import Base # Your Base class
from sqlalchemy import engine_from_config
target_metadata = Base.metadata
Edit alembic.ini to set your database URL:
sqlalchemy.url = postgresql://user:password@localhost/myapp
Creating a Migration
Alembic can auto-generate migrations from your model changes. After you add a new column or table to your models, run:
alembic revision --autogenerate -m "Add bio column to users"
This creates a migration file in alembic/versions/. Check it:
# alembic/versions/xxxx_add_bio_column.py
def upgrade():
op.add_column('users', sa.Column('bio', sa.String(500), nullable=True))
def downgrade():
op.drop_column('users', 'bio')
Running Migrations
# Apply all pending migrations
alembic upgrade head
# Undo the last migration
alembic downgrade -1
# Check current revision
alembic current
In production, run alembic upgrade head during deployment. This safely evolves your schema. The migration history is stored in an alembic_version table in your database, so Alembic always knows exactly which migrations have been applied and which are pending.
Common SQLAlchemy Mistakes
Even experienced developers run into the same set of SQLAlchemy pitfalls repeatedly. Here's what to watch for and how to handle it.
The most common mistake is forgetting to commit. You add objects to the session, everything looks fine in Python, and then you discover nothing actually persisted because you never called commit(). Use the context manager pattern consistently, and note that the Session context manager does not auto-commit; it only auto-closes. You still need to call commit() explicitly inside the block.
The second most common mistake is accessing detached objects. You fetch an object inside a with Session(...) block, the block closes, and then you try to access a lazy-loaded relationship on the now-detached object. You get a DetachedInstanceError. The fix is either to access what you need before the session closes, or to eager-load the relationship with selectinload() or joinedload() before the session closes.
A subtler mistake is using Python's and/or instead of &/|. Writing where(User.age > 25 and User.is_active == True) looks reasonable but is wrong: Python's and evaluates the first expression and, if it's truthy (which a SQLAlchemy column expression always is), returns the second. You end up with a query that only applies the second condition. Use & and |, or the and_() and or_() functions, for SQLAlchemy filter logic.
Ignoring the N+1 problem until it hits production is another pattern we see constantly. Lazy loading is convenient during development, but it will destroy performance at scale. Whenever you're iterating over a collection and accessing relationships inside the loop, ask yourself whether you should be using selectinload. The echo=True setting on your engine is invaluable here: if you see repeated identical queries with different IDs, you've found an N+1.
Finally, overusing create_all() in production is a maintenance trap. It feels convenient (just let SQLAlchemy create the tables), but it means you have no record of schema changes over time, no way to roll back a bad change, and no audit trail of what your schema looked like at any point. Use Alembic from day one on any project you care about.
Practical Example: A Simple Blog
Let's tie it all together with a real example.
from sqlalchemy import create_engine, select, func, ForeignKey
from sqlalchemy.orm import DeclarativeBase, Session, mapped_column, Mapped, relationship
from datetime import datetime
from typing import Optional, List
class Base(DeclarativeBase):
pass
class User(Base):
__tablename__ = "users"
id: Mapped[int] = mapped_column(primary_key=True)
username: Mapped[str] = mapped_column(unique=True)
email: Mapped[str] = mapped_column(unique=True)
password_hash: Mapped[str]
created_at: Mapped[datetime] = mapped_column(default=datetime.utcnow)
posts: Mapped[List["Post"]] = relationship(back_populates="author")
comments: Mapped[List["Comment"]] = relationship(back_populates="author")
class Post(Base):
__tablename__ = "posts"
id: Mapped[int] = mapped_column(primary_key=True)
title: Mapped[str]
content: Mapped[str]
published: Mapped[bool] = mapped_column(default=False)
created_at: Mapped[datetime] = mapped_column(default=datetime.utcnow)
updated_at: Mapped[datetime] = mapped_column(default=datetime.utcnow, onupdate=datetime.utcnow)
user_id: Mapped[int] = mapped_column(ForeignKey("users.id"))
author: Mapped[User] = relationship(back_populates="posts")
comments: Mapped[List["Comment"]] = relationship(back_populates="post")
class Comment(Base):
__tablename__ = "comments"
id: Mapped[int] = mapped_column(primary_key=True)
content: Mapped[str]
created_at: Mapped[datetime] = mapped_column(default=datetime.utcnow)
user_id: Mapped[int] = mapped_column(ForeignKey("users.id"))
post_id: Mapped[int] = mapped_column(ForeignKey("posts.id"))
author: Mapped[User] = relationship(back_populates="comments")
post: Mapped[Post] = relationship(back_populates="comments")
# Setup
engine = create_engine("sqlite:///blog.db")
Base.metadata.create_all(engine)
# Example operations
def create_sample_data():
with Session(engine) as session:
# Create users
user1 = User(username="alice", email="alice@example.com", password_hash="...")
user2 = User(username="bob", email="bob@example.com", password_hash="...")
session.add_all([user1, user2])
session.flush()
# Create posts
post1 = Post(title="SQLAlchemy Tips", content="...", author=user1, published=True)
post2 = Post(title="Python Best Practices", content="...", author=user1)
session.add_all([post1, post2])
session.flush()
# Create comments
comment1 = Comment(content="Great post!", author=user2, post=post1)
session.add(comment1)
session.commit()
def get_published_posts():
with Session(engine) as session:
stmt = select(Post).where(Post.published == True).order_by(Post.created_at.desc())
return session.scalars(stmt).all()
def get_post_with_comments(post_id: int):
with Session(engine) as session:
from sqlalchemy.orm import selectinload
stmt = select(Post).where(Post.id == post_id).options(selectinload(Post.comments))
return session.scalar(stmt)
def count_user_posts(user_id: int):
with Session(engine) as session:
stmt = select(func.count(Post.id)).where(Post.user_id == user_id)
return session.scalar(stmt)
# Usage
create_sample_data()
posts = get_published_posts()
print(f"Published posts: {len(posts)}")
post = get_post_with_comments(1)
if post:
print(f"{post.title} by {post.author.username}")
for comment in post.comments:
print(f" - {comment.author.username}: {comment.content}")
print(f"Alice has written {count_user_posts(1)} posts")This blog example shows relationships, eager loading, aggregation, and real-world patterns. Notice how get_post_with_comments uses selectinload to prevent the N+1 problem, and count_user_posts uses func.count() to aggregate at the database level rather than fetching all posts into Python.
PostgreSQL and MySQL Differences
SQLAlchemy abstracts database differences, but a few things differ:
PostgreSQL
# PostgreSQL is powerful; use it in production
engine = create_engine(
"postgresql+psycopg2://user:password@localhost/myapp",
echo=False,
)
# Supports RETURNING for efficient updates
from sqlalchemy import update
with Session(engine) as session:
stmt = update(User).where(User.id == 1).values(age=31).returning(User)
user = session.scalar(stmt) # Efficient!
MySQL
# Use pymysql driver (or mysqlconnector)
engine = create_engine("mysql+pymysql://user:password@localhost/myapp")
# MySQL doesn't support RETURNING; stick to standard patterns
Both support the same ORM patterns. PostgreSQL is preferred for serious applications: it's more standards-compliant, has better support for advanced features like full-text search and JSON columns, and generally has better debugging tools. If you're starting a new project today and you have a choice, choose PostgreSQL.
Performance Tips
1. Use Connection Pooling
engine = create_engine(
"postgresql://...",
pool_size=10,
max_overflow=20,
pool_pre_ping=True, # Test connections before use
)
2. Eager Load Related Objects
from sqlalchemy.orm import selectinload
stmt = select(User).options(selectinload(User.posts), selectinload(User.comments))
3. Use Bulk Operations for Mass Changes
from sqlalchemy import update, delete
# Bulk update (single query)
session.execute(update(User).where(...).values(...))
# Bulk delete (single query)
session.execute(delete(Post).where(...))
4. Index Strategic Columns
class User(Base):
__tablename__ = "users"
id: Mapped[int] = mapped_column(primary_key=True)
username: Mapped[str] = mapped_column(unique=True, index=True)
email: Mapped[str] = mapped_column(unique=True, index=True)
5. Use echo=False in Production
SQL query logging is useful for debugging but expensive in production. The logging overhead adds up under load, and the volume of output can overwhelm your logging infrastructure. Keep echo=False (the default) in production and enable it surgically when investigating specific issues.
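Under the hood, echo=True is just shorthand for INFO-level logging on the "sqlalchemy.engine" logger, which means you can control SQL logging through the standard logging module at runtime without recreating the engine. A small sketch:

```python
import logging

# Configure logging once at startup
logging.basicConfig()

# Production default: keep engine logging quiet
logging.getLogger("sqlalchemy.engine").setLevel(logging.WARNING)

# When investigating an issue, dial it up temporarily:
# INFO logs each SQL statement (equivalent to echo=True)
logging.getLogger("sqlalchemy.engine").setLevel(logging.INFO)
```

This approach plays well with log aggregation: you can flip verbosity through your application's normal logging configuration instead of a code change.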
Advanced Techniques: Hybrid Properties and Validators
Beyond basic CRUD, SQLAlchemy offers tools for derived data and validation.
Hybrid Properties
A hybrid property exists in both Python and SQL. It lets you define computed columns that work both in memory and in queries. This is particularly useful for properties that you want to both access on Python objects and filter by in queries:
from sqlalchemy.ext.hybrid import hybrid_property
class User(Base):
__tablename__ = "users"
id: Mapped[int] = mapped_column(primary_key=True)
username: Mapped[str]
first_name: Mapped[str]
last_name: Mapped[str]
@hybrid_property
def full_name(self) -> str:
"""Computed property: works in Python."""
return f"{self.first_name} {self.last_name}"
@full_name.expression
@classmethod
def full_name(cls):
"""SQL version: works in queries."""
from sqlalchemy import func
return func.concat(cls.first_name, ' ', cls.last_name)
Now you can use full_name both ways:
with Session(engine) as session:
user = session.scalar(select(User).where(User.id == 1))
print(user.full_name) # Python: concatenates attributes
# SQL query
stmt = select(User).where(User.full_name == "Alice Smith")
results = session.scalars(stmt).all()
The @full_name.expression decorator tells SQLAlchemy how to translate the property to SQL. Without it, the hybrid only works reliably in Python: accessing full_name on instances is fine, but using it in a where() clause won't generate the SQL you intend.
Column Validators
Validate data at the model layer before it hits the database:
from sqlalchemy.orm import validates
class User(Base):
__tablename__ = "users"
id: Mapped[int] = mapped_column(primary_key=True)
username: Mapped[str]
email: Mapped[str]
age: Mapped[int]
@validates('email')
def validate_email(self, key, value):
if '@' not in value:
raise ValueError("Invalid email address")
return value.lower()
@validates('age')
def validate_age(self, key, value):
if value < 0 or value > 150:
raise ValueError("Age must be between 0 and 150")
return value
Validators run whenever you set an attribute, before commit, catching errors early:
with Session(engine) as session:
try:
user = User(username="alice", email="invalid", age=25)
session.add(user)
session.commit() # Fails here: email validation
except ValueError as e:
print(f"Validation error: {e}")Event Listeners: Reacting to Changes
SQLAlchemy fires events throughout the object lifecycle. You can hook into them. This is the mechanism behind audit logging, automatic timestamp updates, and cache invalidation: any time you want to run code in response to a database operation, event listeners are the tool:
from sqlalchemy import event
from datetime import datetime
@event.listens_for(Post, 'before_insert')
def receive_before_insert(mapper, connection, target):
"""Fires before a new Post is inserted."""
target.created_at = datetime.utcnow()
@event.listens_for(Post, 'before_update')
def receive_before_update(mapper, connection, target):
"""Fires before a Post is updated."""
target.updated_at = datetime.utcnow()
@event.listens_for(User, 'after_insert')
def receive_after_insert(mapper, connection, target):
"""Fires after a new User is inserted."""
print(f"New user created: {target.username}")Event listeners are powerful for:
- Logging: Track who changed what.
- Audit trails: Store change history.
- Cascading updates: Update related records.
- Side effects: Send emails, invalidate caches, etc.
# Audit trail example
class AuditLog(Base):
__tablename__ = "audit_logs"
id: Mapped[int] = mapped_column(primary_key=True)
entity_type: Mapped[str]
entity_id: Mapped[int]
action: Mapped[str] # 'create', 'update', 'delete'
timestamp: Mapped[datetime] = mapped_column(default=datetime.utcnow)
# In after_insert the id is already populated, so deriving the action from
# target.id is unreliable; register one listener per event instead
@event.listens_for(User, 'after_insert')
def audit_user_insert(mapper, connection, target):
connection.execute(...) # Insert audit log row with action='create'
@event.listens_for(User, 'after_update')
def audit_user_update(mapper, connection, target):
connection.execute(...) # Insert audit log row with action='update'
@event.listens_for(User, 'after_delete')
def audit_user_delete(mapper, connection, target):
connection.execute(...) # Insert audit log row with action='delete'
Testing with SQLAlchemy
Testing database code is tricky. SQLAlchemy makes it simpler with in-memory databases. The key insight is that an in-memory SQLite database is functionally equivalent to a file-based database for testing purposes: you get the same SQL semantics, the same ORM behavior, and the same query results. The only difference is that it's created fresh for each test run and never writes to disk:
import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
@pytest.fixture
def test_engine():
"""Create an in-memory SQLite database for testing."""
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
return engine
@pytest.fixture
def test_session(test_engine):
"""Provide a session for a single test."""
with Session(test_engine) as session:
yield session
def test_create_user(test_session):
"""Test user creation."""
user = User(username="alice", email="alice@example.com", age=25)
test_session.add(user)
test_session.commit()
# Verify
assert user.id is not None
fetched = test_session.scalar(select(User).where(User.username == "alice"))
assert fetched.email == "alice@example.com"
def test_user_validation(test_session):
"""Test validators."""
with pytest.raises(ValueError):
User(username="bob", email="invalid-email", age=25)
def test_relationship_cascade(test_session):
"""Test one-to-many relationship."""
user = User(username="charlie", email="charlie@example.com", age=30)
post1 = Post(title="Post 1", content="...", author=user)
post2 = Post(title="Post 2", content="...", author=user)
test_session.add(user)
test_session.commit()
fetched_user = test_session.scalar(select(User).where(User.username == "charlie"))
assert len(fetched_user.posts) == 2Run tests with pytest:
pytest tests/test_models.py -v
In-memory databases are fast, which makes them perfect for CI pipelines. For integration tests with real databases, use Docker to spin up temporary PostgreSQL or MySQL instances.
Common Pitfalls and Solutions
Pitfall 1: Forgetting to Commit
# Wrong: changes are lost
session.add(user)
session.close() # No commit!
# Correct
session.add(user)
session.commit()
session.close()
# Best: use context manager
with Session(engine) as session:
session.add(user)
session.commit() # Still required: the context manager closes the session but does not commit
Pitfall 2: Accessing Detached Objects
# Wrong: accessing object after session closes
with Session(engine) as session:
user = session.scalar(select(User).where(User.id == 1))
# Session closed; user is detached
print(user.posts) # Error: DetachedInstanceError
# Correct: eager load before session closes
with Session(engine) as session:
from sqlalchemy.orm import selectinload
stmt = select(User).where(User.id == 1).options(selectinload(User.posts))
user = session.scalar(stmt)
print(user.posts) # Works; posts already loaded
Pitfall 3: Circular Imports
When models reference each other, use string annotations:
# models.py
class User(Base):
__tablename__ = "users"
posts: Mapped[list["Post"]] = relationship(back_populates="author")
class Post(Base):
__tablename__ = "posts"
author: Mapped["User"] = relationship(back_populates="posts")The quotes defer evaluation until all classes are defined.
Pitfall 4: N+1 Queries
We covered this earlier: always use selectinload() or joinedload() for relationships you access in loops.
Pitfall 5: Transaction Isolation
Different databases default to different isolation levels: PostgreSQL defaults to READ COMMITTED, while MySQL's InnoDB defaults to the stricter REPEATABLE READ, and each has its own quirks under concurrency. Understand your database:
# PostgreSQL: SERIALIZABLE isolation catches conflicts
engine = create_engine(
"postgresql://...",
isolation_level="SERIALIZABLE"
)
# Explicit transaction control
with engine.begin() as conn:
# Within an explicit transaction
pass
Debugging Tools
Enable SQL Echo
engine = create_engine("postgresql://...", echo=True)This prints every SQL statement. Verbose but invaluable for understanding what the ORM is doing. When you think a bug is in your Python code but it's actually in the generated SQL, this is the first thing to enable.
Use the Python REPL
from sqlalchemy import create_engine, select
from myapp.models import Base, User
engine = create_engine("sqlite:///app.db")
from sqlalchemy.orm import Session
with Session(engine) as session:
# Interactively explore
stmt = select(User).limit(1)
user = session.scalar(stmt)
print(user.__dict__) # See all attributes
Check Generated SQL
from sqlalchemy import select
stmt = select(User).where(User.age > 25)
print(stmt) # Prints the SQL string
This is surprisingly powerful for debugging complex queries. Before executing against a real database, print the statement and verify it says what you think it says. The output is plain SQL with bind-parameter placeholders, which you can inspect or adapt to run in a database client.
Real-World Production Patterns
Connection Pooling
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool
engine = create_engine(
"postgresql://user:pass@localhost/myapp",
poolclass=QueuePool,
pool_size=10,
max_overflow=20,
pool_recycle=3600, # Recycle connections after 1 hour
pool_pre_ping=True, # Test connections before use
)
This is production-ready. The pool handles connection lifecycle, preventing stale connections. The pool_recycle setting matters especially for databases that terminate idle connections; without it, you'd hit OperationalError: server closed the connection unexpectedly at the worst possible moments.
Async SQLAlchemy
For async applications, SQLAlchemy 2.0 has native async support. If you're building a FastAPI service or any async Python application that needs high concurrency, this is the path:
from sqlalchemy import select
from sqlalchemy.ext.asyncio import create_async_engine, AsyncSession
engine = create_async_engine("postgresql+asyncpg://user:pass@localhost/myapp")
async def fetch_user(user_id: int):
async with AsyncSession(engine) as session:
stmt = select(User).where(User.id == user_id)
return await session.scalar(stmt)
# Use in async context
import asyncio
user = asyncio.run(fetch_user(1))
Async SQLAlchemy is powerful for high-concurrency applications (web servers, scrapers). But it adds complexity; use it only if you need it. If you're building a traditional synchronous application or a script, synchronous SQLAlchemy is simpler and perfectly fast enough.
Dependency Injection
In larger applications, inject the session. This pattern decouples your business logic from the session creation mechanism, making both easier to test and maintain:
# services.py
class UserService:
def __init__(self, session: Session):
self.session = session
def get_user(self, user_id: int) -> User:
return self.session.scalar(select(User).where(User.id == user_id))
def create_user(self, username: str, email: str) -> User:
user = User(username=username, email=email)
self.session.add(user)
self.session.commit()
return user
# app.py
from fastapi import FastAPI, Depends
app = FastAPI()
def get_session():
with Session(engine) as session:
yield session
@app.post("/users/")
def create_user_endpoint(username: str, session: Session = Depends(get_session)):
return UserService(session).create_user(username, f"{username}@example.com")
This decouples your business logic from database details. In tests, you can pass a Session backed by an in-memory database, and your service code doesn't know or care; it just sees the same interface.
Wrap-Up
SQLAlchemy rewards the investment you put into learning it. The concepts we've covered (declarative models, session lifecycle, the select() query API, relationship loading strategies, bulk operations, validators, event listeners, and production patterns) aren't just SQLAlchemy details. They're a mental model for thinking about how Python applications interact with relational databases, and that model transfers to whatever ORM you work with next.
The patterns that matter most in practice are the ones around sessions and loading strategies. Get comfortable with selectinload as your default choice when accessing relationships, use the context manager for every session, and reach for bulk operations whenever you're working with more than a handful of rows at once. Those three habits alone will keep you out of the most common performance and correctness traps.
Alembic deserves a special mention: start using it from day one on any project you care about. The habit of writing migrations as schema changes happen, rather than letting create_all() manage everything, is the difference between a codebase that evolves gracefully and one that becomes increasingly fragile to change.
SQLAlchemy's documentation is genuinely excellent and worth reading beyond this introduction. The library has depth we haven't touched here: JSON column support, full-text search integration, horizontal sharding, custom types. But none of that matters until the fundamentals are solid, and those you now have.
Next up: HTTP requests and APIs. You'll learn how to consume REST endpoints, the other half of modern application development.