sqlmodel
sqlmodel copied to clipboard
How to Initialise & Populate a Postgres Database with Circular ForeignKeys?
First Check
- [X] I added a very descriptive title to this issue.
- [X] I used the GitHub search to find a similar issue and didn't find it.
- [X] I searched the SQLModel documentation, with the integrated search.
- [X] I already searched in Google "How to X in SQLModel" and didn't find any information.
- [X] I already read and followed all the tutorial in the docs and didn't find an answer.
- [X] I already checked if it is not related to SQLModel but to Pydantic.
- [X] I already checked if it is not related to SQLModel but to SQLAlchemy.
Commit to Help
- [X] I commit to help with one of those options 👆
Example Code
# Imports
from typing import Optional, List
from sqlmodel import Session, Field, SQLModel, Relationship, create_engine
import uuid as uuid_pkg
# Defining schemas
class Person(SQLModel, table=True):
person_id: uuid_pkg.UUID = Field(default_factory=uuid_pkg.uuid4, primary_key=True, index=True, nullable=True)
first_names: str
last_name: str
mailing_property_id: uuid_pkg.UUID = Field(foreign_key='property.property_id')
customer: Optional['Customer'] = Relationship(back_populates='lead_person')
mailing_property: Optional['Property'] = Relationship(back_populates='person')
class Customer(SQLModel, table=True):
customer_id: uuid_pkg.UUID = Field(default_factory=uuid_pkg.uuid4, primary_key=True, index=True, nullable=True)
lead_person_id: uuid_pkg.UUID = Field(foreign_key='person.person_id')
contract_type: str
lead_person: Optional['Person'] = Relationship(back_populates='customer')
contracted_properties: Optional[List['Property']] = Relationship(back_populates='occupant_customer')
class Property(SQLModel, table=True):
property_id: uuid_pkg.UUID = Field(default_factory=uuid_pkg.uuid4, primary_key=True, index=True, nullable=True)
occupant_customer_id: uuid_pkg.UUID = Field(foreign_key='customer.customer_id')
address: str
person: Optional['Person'] = Relationship(back_populates='mailing_property')
occupant_customer: Optional['Customer'] = Relationship(back_populates='contracted_properties')
# Initialising the database
engine = create_engine(f'postgresql://{DB_USERNAME}:{DB_PASSWORD}@{DB_URL}:{DB_PORT}/{DB_NAME}')
SQLModel.metadata.create_all(engine)
# Defining the database entries
john = Person(
person_id = 'eb7a0f5d-e09b-4b36-8e15-e9541ea7bd6e',
first_names = 'John',
last_name = 'Smith',
mailing_property_id = '4d6aed8d-d1a2-4152-ae4b-662baddcbef4'
)
johns_lettings = Customer(
customer_id = 'cb58199b-d7cf-4d94-a4ba-e7bb32f1cda4',
lead_person_id = 'eb7a0f5d-e09b-4b36-8e15-e9541ea7bd6e',
contract_type = 'Landlord Premium'
)
johns_property_1 = Property(
property_id = '4d6aed8d-d1a2-4152-ae4b-662baddcbef4',
occupant_customer_id = 'cb58199b-d7cf-4d94-a4ba-e7bb32f1cda4',
address = '123 High Street'
)
johns_property_2 = Property(
property_id = '2ac15ac9-9ab3-4a7c-80ad-961dd565ab0a',
occupant_customer_id = 'cb58199b-d7cf-4d94-a4ba-e7bb32f1cda4',
address = '456 High Street'
)
# Committing the database entries
with Session(engine) as session:
session.add(john)
session.add(johns_lettings)
session.add(johns_property_1)
session.add(johns_property_2)
session.commit()
Description
Goal: To model the back-end database for a cleaning company. Specifically, trying to model a system where customers can have multiple properties that need to be cleaned and each customer has a single lead person who has a single mailing property (to contact them at). Ideally, I want to be able to use a single table for the mailing properties and cleaning properties (as in most instances they will be the same).
Constraints:
- Customers can be either individual people or organisations
- A lead person must be identifiable for each customer
- Each person must be matched to a property (so that their mailing address can be identified)
- A single customer can have multiple properties attached to them (e.g. for a landlord that includes cleaning as part of the rent)
The issue is that the foreign keys have a circular dependency.
- Customer -> Person based on the
lead_person_id
- Person -> Property based on the
mailing_property_id
- Property -> Customer based on the
occupant_customer_id
Running the code written above results in:
ForeignKeyViolation: insert or update on table "customer" violates foreign key constraint "customer_lead_person_id_fkey"
DETAIL: Key (lead_person_id)=(eb7a0f5d-e09b-4b36-8e15-e9541ea7bd6e) is not present in table "person".
This issue is specific to Postgres, which unlike SQLite (used in the docs) imposes constraints on foreign keys when data is being added. I.e. replacing engine = create_engine(f'postgresql://{DB_USERNAME}:{DB_PASSWORD}@{DB_URL}:{DB_PORT}/{DB_NAME}')
with engine = create_engine('sqlite:///test.db')
will let the database be initialised without causing an error - however my use-case is with a Postgres DB.
Attempted Solutions:
- Used link tables between customers/people and properties/customers - no luck
- Used
Session.exec
with this code from SO to temporarily remove foreign key constraints then add them back on - no luck - Used primary joins instead of foreign keys as described in this SQLModel Issue - no luck
Operating System
macOS
Operating System Details
Using an M1 Mac but have replicated the issue on ubuntu as well
SQLModel Version
0.0.6
Python Version
3.10.4
Additional Context
No response
You can disable all triggers by running SET session_replication_role = replica;
as part of your transaction (session), this will allow you to load. Alternatively, you can disable all foreign keys and indexes at start of transaction, load data, and then re-enable. This will also increase performance for load. (Don't do this for anything other than initial loads to circumvent these issues, you lose all integrity)
I use a middleware layer for creating transactions for initial loads, which handles dialect, though I ended up using psycopg2 and postgresql's copy
instead of ORM for insert due to performance, cutting loads of 5 mil rows from +1 hour to few minutes.
The issue lies mostly with how the tables need to be defined in PG. Circular foreign keys in PG they need to be initially deferred, or else you run into the issue of them not being defined like you see.
Using Field(foreign_key="")
is just a baseline sqlalchemy.ForeignKey
under the hood with no extra args. I believe you need to go back to defining the Column
to do that, based on what I see in the code.
The following worked for me with the original second half:
import uuid
from typing import Optional, List
from pydantic import UUID4
from sqlmodel import Session, Field, SQLModel, Relationship, create_engine
from sqlalchemy import Column, ForeignKey
from sqlalchemy.dialects.postgresql import UUID
# Defining schemas
class Person(SQLModel, table=True):
person_id: UUID4 = Field(
default_factory=uuid.uuid4, primary_key=True, index=True, nullable=True
)
first_names: str
last_name: str
mailing_property_id: UUID4 = Field(
sa_column=Column(
UUID(as_uuid=True),
ForeignKey(
"property.property_id",
onupdate="CASCADE",
ondelete="RESTRICT",
initially="DEFERRED",
deferrable=True,
),
)
)
customer: Optional["Customer"] = Relationship(back_populates="lead_person")
mailing_property: Optional["Property"] = Relationship(back_populates="person")
class Customer(SQLModel, table=True):
customer_id: UUID4 = Field(
default_factory=uuid.uuid4, primary_key=True, index=True, nullable=True
)
lead_person_id: UUID4 = Field(
sa_column=Column(
UUID(as_uuid=True),
ForeignKey(
"person.person_id",
onupdate="CASCADE",
ondelete="RESTRICT",
initially="DEFERRED",
deferrable=True,
),
)
)
contract_type: str
lead_person: Optional["Person"] = Relationship(back_populates="customer")
contracted_properties: Optional[List["Property"]] = Relationship(
back_populates="occupant_customer"
)
class Property(SQLModel, table=True):
property_id: UUID4 = Field(
default_factory=uuid.uuid4, primary_key=True, index=True, nullable=True
)
occupant_customer_id: uuid.UUID = Field(
sa_column=Column(
UUID(as_uuid=True),
ForeignKey(
"customer.customer_id",
onupdate="CASCADE",
ondelete="RESTRICT",
initially="DEFERRED",
deferrable=True,
),
)
)
address: str
person: Optional["Person"] = Relationship(back_populates="mailing_property")
occupant_customer: Optional["Customer"] = Relationship(
back_populates="contracted_properties"
)
The values you use for the ForeignKey
may be different as far as the onupdate
and ondelete
go, but the initially="DEFERRED", deferrable=True
are what is important here.
And in case you weren't aware, you can use the relationship instead of directly using IDs.
johns_lettings = Customer(
customer_id="cb58199b-d7cf-4d94-a4ba-e7bb32f1cda4",
lead_person=john, # instead of `lead_person_id`
contract_type="Landlord Premium",
)
They still need to be deferred foreign keys, so this doesn't fix the issue. And there is always one of them that has to use an ID since it is circular.
The way to get around that would be to use an intermediate table. Untested, but something along the lines of:
class MailingProperty(SQLModel, table=True):
person_id: UUID4 = Field(foreign_key="person.person_id")
property_id: UUID4 = Field(foreign_key="property.property_id")
person: Person = Relationship(back_populates="mailing_property")
property: Property = Relationship(back_populates="person")
Then you would just create the Person
and the Property
and MailingProperty(person=person, property=property)
with a table inbetween it doesn't and up needing one of the IDs first. Still needs to be deferred, but can be a little cleaner.