xnow - SQLAlchemy中的联表查询

heavy-cloud-in-bluesky

使用关系型数据库时，联表查询是一项常见的数据库操作。外键可以简化联表查询操作，确保数据的强约束和完整性。但对数据库管理而言，外键使得数据库的设计和维护变得复杂，降低灵活性。因此，许多公司禁止在业务中使用外键，而是在业务代码层保持数据的一致性。

Python中最流行的 SQLAlchemy 这个ORM库就能通过外键和自带的 relationship 函数实现跨表数据模型间的关系绑定。提供了基于外键，和基于join的2种方法。本文将从简到繁的一一介绍。

ORM 框架用于在关系型数据库和面向对象编程语言之间进行数据转换，允许开发者使用面向对象的方式来操作数据库，而不需要直接编写SQL语句，大大简化数据操作。

本来以常见的文章（post）数据库为例，比如数据库中有文章表（post）、文章分类表（category）和标签表（tag）。一般三者分属于3张表，但我们希望通过如下的方法直接访问文章的所属的分类和标签信息，就像文章表带了标签和分类字段一样。

期望的使用方法：

post.category   # 获取文章所属的分类
post.tags      # 获取文章的标签列表

本文将展示三种不同的方法实现以上需求：

使用基于类property的笨办法，底层用ORM的查询语句
使用外键，会增加数据库管理成本
使用join，既方便又直观

除了上面的单向查询，本文还有双向查询，即通过分类或者标签表的字段，反查与之相关联的文章列表。

以下代码的测试环境如下：

Python：3.12.7
SQLAlchemy：2.0.40

笨办法：基于property装饰器

为了避免使用外键，在很长一段时间里都在写下面这类代码实现联表查询。给数据模型定义一些property的方法，用于查询文章所属的分类。

class Category(Base):
    id = Column(Integer, primary_key=True)
    name = Column(String(50), nullable=False)

class Post(Base):
    id = Column(Integer, primary_key=True)
    title = Column(String(100), nullable=False)
    content = Column(String(1000))

    category_id = Column(Integer，nullable=False)

    @property
    def category(self):
        return Category.query.join(Category.id == self.category_id).all()

Post表和Category表通过Post表中的category_id字段关联起来。@property 装饰器使得可以用 Post.category查询文章的所属分类。

为此，只要引用其它表字段，都需要定义对应的property函数。接下来，再看看SQLAlchemy提供了什么有趣的方法。

基于外键的一对多双向查询方案

先从最简单的外键方案开始。设计一个简单场景，有文章表(posts)和分类表(categories)，分类和文章属于一对多关系：一篇文章仅属于一个分类，一个分类下有多篇文章。

业务需求是双向查询：

在文章模型中，可以通过category属性获得文章所属的分类
在分类模型中，可以通过 posts 属性获取分类下的所有文章

定义Category和Post两个模型类，将Post的category_id 列设置为外键。定义如下：

from sqlalchemy import ForeignKey, String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship

class Base(DeclarativeBase):
    pass

class Category(Base):
    __tablename__ = "categories"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(50), nullable=False)

    posts: Mapped[list["Post"]] = relationship(back_populates="category")

    def __repr__(self) -> str:
        return f"<Category(id={self.id}, name='{self.name}')>"

class Post(Base):
    __tablename__ = "posts"

    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(100), nullable=False)
    content: Mapped[str] = mapped_column(String(1000))

    category_id: Mapped[int] = mapped_column(ForeignKey("categories.id"))
    category: Mapped[Category] = relationship(Category, back_populates="posts")

Category 中定义了 posts 这个属性用于获取分类下的所有文章列表。posts 成员变量是用 relationship 创建的到 Post类的引用。由于 Post 对象此时未定义，所以写成字符串格式 "Post" 。
Post 中定义了 category_id 这个外键，关联到 categories.id 字段
Post 中也定义了category属性用于获取文章的分类，并使用 back_populates 做反向引用。

注意：relationship 函数反向引用的字段值 category 和 posts 是本对象（Category和Post）在目标对象（Post和Category）中属性名（category和posts）。

使用上面定义的模型，创建数据库、写入测试数据进行测试。

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine("sqlite:///blog.db")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()

# 添加2个分类
tech = Category(name="Technology")
life = Category(name="Life")
session.add_all([tech, life])
session.commit()

# 创建3篇文章
post1 = Post(title="Python ORM", content="SQLAlchemy tutorial", category_id=tech.id)
post2 = Post(title="Daily Life", content="My weekend", category_id=life.id)
post3 = Post(
    title="Advanced Python", content="Decorators explained", category_id=tech.id
)
session.add_all([post1, post2, post3])
session.commit()

# 通过文章获取分类
print(post1.category)
print("-" * 10)

# 通过分类获取所有文章
for post in tech.posts:
    print(post.title)

执行结果，符合预期

$ pip install sqlalchemy
$ python main1.py
<Category(id=1, name='Technology')>
----------
Python ORM
Advanced Python

查看sqlite中的表结构

$ sqlite3  blog.db
sqlite> .schema
CREATE TABLE categories (
    id INTEGER NOT NULL, 
    name VARCHAR(50) NOT NULL, 
    PRIMARY KEY (id)
);
CREATE TABLE posts (
    id INTEGER NOT NULL, 
    title VARCHAR(100) NOT NULL, 
    content VARCHAR(1000), 
    category_id INTEGER , 
    PRIMARY KEY (id), 
    FOREIGN KEY(category_id) REFERENCES categories (id)
);

看到 sqlite的posts 表中，category_id被定义成了外键，指向 categories 表的id字段。

上面整个模型的定义是比较简洁的，SQLAlchemy会利用外键，自动将两张表、两个数据模型关联起来，没有许多额外的设置。

基于ORM的双向关联查询

如果禁止了外键，开发者依旧希望使用简单方法获取数据实体对象间的关联信息。这就是基于join的方案。既然没有外键，就必须人为指定模型之间的关联方法，声明使用哪两列进行join操作。修改模型的定义如下：

...
class Category(Base):
    __tablename__ = "categories"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(50), nullable=False)

    posts: Mapped[list["Post"]] = relationship(
        primaryjoin="Category.id == foreign(Post.category_id)",
        back_populates="category",
    )

    def __repr__(self) -> str:
        return f"<Category(id={self.id}, name='{self.name}')>"

class Post(Base):
    __tablename__ = "posts"

    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(100), nullable=False)
    content: Mapped[str] = mapped_column(String(1000))
    category_id: Mapped[int] = mapped_column()

    category: Mapped[Category] = relationship(
        primaryjoin="Post.category_id == Category.id",
        foreign_keys=[category_id],
        back_populates="posts",
    )

engine = create_engine("sqlite:///blog2.db")
...

在Category中的posts属性定义使用了 primaryjoin参数，显式指定两个模型的join规则。
Post.category_id 不再是外键
在 Category.posts 和 Post.category 两个字段中，都指定了join关系和反向引用。
在两个relationship函数中，前者使用了 "Category.id == foreign(Post.category_id)" 字符串, 后者使用了独立参数 foreign_keys=[category_id] 。二者的效果是一样的，这里故意用两种方法只是为了展示多种用法。foreign_keys参数的值也可以是字符串。

其它代码不用变，执行效果和使用外键一样，符合预期。

$ python main2.py 
<Category(id=1, name='Technology')>
----------
Python ORM
Advanced Python

检查表结构，注意 posts.category_id 不再是外键了

$ sqlite3 blog2.db 
sqlite> .schema 
CREATE TABLE categories (
    id INTEGER NOT NULL, 
    name VARCHAR(50) NOT NULL, 
    PRIMARY KEY (id)
);
CREATE TABLE posts (
    id INTEGER NOT NULL, 
    title VARCHAR(100) NOT NULL, 
    content VARCHAR(1000), 
    category_id INTEGER, 
    PRIMARY KEY (id)
);

数据库里没有外键，而是在ORM层实现了类似外键的效果，底层实际使用了join的方法将两个表关联起来。由于没有外键，所以更需要注意在业务代码中确保数据的一致性。

基于ORM的单向关系查询

之前是双向的查询，这里顺带也看看单向查询的例子，在这个例子中，单向查询的含义是，只能查询文章的分类，不能通过分类获取分类下的文章列表。

class Category(Base):
    __tablename__ = "categories"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(50), nullable=False)

    def __repr__(self) -> str:
        return f"<Category(id={self.id}, name='{self.name}')>"

class Post(Base):
    __tablename__ = "posts"

    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(100), nullable=False)
    content: Mapped[str] = mapped_column(String(1000))
    category_id: Mapped[int] = mapped_column()

    category: Mapped[Category] = relationship(
        primaryjoin="foreign(Post.category_id) == Category.id",
    )

与之前代码的差别：

Category中，删除了 posts 属性
Post中，relationship 函数里，删除了 back_populates 参数，因为不需要反向映射了。

执行代码可以发现，post.category 依旧能获取到文章下的分类信息，但显然已经无法执行Category.post 了。

基于ORM的多对多关联表查询

除了上面分类和文章的一对多场景，常见的数据关系还有多对多关系，这里以文章和标签为例：一篇文章有多个标签，一个标签关联到多篇文章。这种场景一般会引入一张 relationships 的表，记录文章和标签的对应关系。模型定义如下：

class Relationship(Base):
    __tablename__ = "relationships"  # 新增relationships表

    tag_id: Mapped[int] = mapped_column(primary_key=True)
    post_id: Mapped[int] = mapped_column(primary_key=True)

class Tag(Base):
    __tablename__ = "tags"

    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str] = mapped_column(String(50), nullable=False)

    posts: Mapped[list["Post"]] = relationship(
        secondary="relationships",
        primaryjoin="Tag.id == Relationship.tag_id",
        secondaryjoin="Relationship.post_id == Post.id",
        back_populates="tags",
    )

    def __repr__(self):
        return f"<Tag(id={self.id}, name='{self.name}')>"


class Post(Base):
    __tablename__ = "posts"

    id: Mapped[int] = mapped_column(primary_key=True)
    title: Mapped[str] = mapped_column(String(100), nullable=False)
    content: Mapped[str] = mapped_column(String(1000))

    tags: Mapped[list[Tag]] = relationship(
        secondary="relationships",
        primaryjoin="Post.id == Relationship.post_id",
        secondaryjoin="Tag.id == Relationship.tag_id",
        back_populates="posts",
        uselist=True,
    )

这里 relationship 函数的参数多了一些：

定义了 relationships 表，表中post_id和tag_id两个字段分别是文章id和标签的id。
在 Tag 类的 posts 属性中，relationship 函数新引入了secondary 和 secondaryjoin 参数，用于把 relationships 表加入联表查询，实现3张表的联动。注意 secondary 的值必须是字符串格式的表名，而不能是映射的类名。
在 primaryjoin 和 secondaryjoin中共执行了两次join操作，将3张表关联起来。
Post类中，relationship的用法和Tag 中一样。

创建表和数据，验证效果。

engine = create_engine("sqlite:///blog4.db")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()

# 添加3个分类
python = Tag(name="Python")
linux = Tag(name="Linux")
devops = Tag(name="devops")
session.add_all([python, linux, devops])
session.commit()

# 创建3篇文章
post1 = Post(title="Python ORM", content="SQLAlchemy tutorial")
post2 = Post(title="Build Python", content="Build Python on linux")
post3 = Post(title="Drone CI", content="Deploy app with Drone on linux")
session.add_all([post1, post2, post3])
session.commit()

# 将文章和标签进行关联
session.add_all(
    [
        Relationship(post_id=post1.id, tag_id=python.id),
        Relationship(post_id=post2.id, tag_id=python.id),
        Relationship(post_id=post2.id, tag_id=linux.id),
        Relationship(post_id=post3.id, tag_id=devops.id),
        Relationship(post_id=post3.id, tag_id=linux.id),
    ]
)
session.commit()

# 获取文章的标签
print(post2.tags)
print("-" * 10)

# 通过标签获取所有文章
for post in linux.posts:
    print(post.title)

执行结果，成功获得文章包含的tag，以及打了对应tag的文章：

$ python main4.py
[<Tag(id=1, name='Python')>, <Tag(id=2, name='Linux')>]
----------
Build Python
Drone CI

查看表结构，也没有外键。

$ echo '.schema' | sqlite3 blog4.db
CREATE TABLE relationships (
    tag_id INTEGER NOT NULL, 
    post_id INTEGER NOT NULL, 
    PRIMARY KEY (tag_id, post_id)
);
CREATE TABLE tags (
    id INTEGER NOT NULL, 
    name VARCHAR(50) NOT NULL, 
    PRIMARY KEY (id)
);
CREATE TABLE posts (
    id INTEGER NOT NULL, 
    title VARCHAR(100) NOT NULL, 
    content VARCHAR(1000), 
    PRIMARY KEY (id)
);

数据库里的 relationships 表使用了联合主键。

总结

使用外键可以简化SQLAlchemy模型类的定义，方便代码中进行数据查询。
但外键也可能带来性能损失、增加运维管理复杂度，因此许多公司禁用了数据库外键。
通过SQLAlchemy 的relationship函数提供的方法，可以在ORM层定义模型关联方法。实现一对多和多对多的联表查询。
在使用SQLALchemy 提供的模型关联方法时，要注意在代码层确保数据的一致性。

昔我往矣

SQLAlchemy中的联表查询

笨办法：基于property装饰器

基于外键的一对多双向查询方案

基于ORM的双向关联查询

基于ORM的单向关系查询

基于ORM的多对多关联表查询

总结