sqlmodel icon indicating copy to clipboard operation
sqlmodel copied to clipboard

How to Initialise & Populate a Postgres Database with Circular ForeignKeys?

Open AyrtonB opened this issue 2 years ago • 2 comments

First Check

  • [X] I added a very descriptive title to this issue.
  • [X] I used the GitHub search to find a similar issue and didn't find it.
  • [X] I searched the SQLModel documentation, with the integrated search.
  • [X] I already searched in Google "How to X in SQLModel" and didn't find any information.
  • [X] I already read and followed all the tutorial in the docs and didn't find an answer.
  • [X] I already checked if it is not related to SQLModel but to Pydantic.
  • [X] I already checked if it is not related to SQLModel but to SQLAlchemy.

Commit to Help

  • [X] I commit to help with one of those options 👆

Example Code

# Imports
from typing import Optional, List
from sqlmodel import Session, Field, SQLModel, Relationship, create_engine
import uuid as uuid_pkg


# Defining schemas
class Person(SQLModel, table=True):
    person_id: uuid_pkg.UUID = Field(default_factory=uuid_pkg.uuid4, primary_key=True, index=True, nullable=True)
    first_names: str
    last_name: str
    mailing_property_id: uuid_pkg.UUID = Field(foreign_key='property.property_id')
    customer: Optional['Customer'] = Relationship(back_populates='lead_person')
    mailing_property: Optional['Property'] = Relationship(back_populates='person')

class Customer(SQLModel, table=True):
    customer_id: uuid_pkg.UUID = Field(default_factory=uuid_pkg.uuid4, primary_key=True, index=True, nullable=True)
    lead_person_id: uuid_pkg.UUID = Field(foreign_key='person.person_id')
    contract_type: str
    lead_person: Optional['Person'] = Relationship(back_populates='customer')
    contracted_properties: Optional[List['Property']] = Relationship(back_populates='occupant_customer')
    
class Property(SQLModel, table=True):
    property_id: uuid_pkg.UUID = Field(default_factory=uuid_pkg.uuid4, primary_key=True, index=True, nullable=True)
    occupant_customer_id: uuid_pkg.UUID = Field(foreign_key='customer.customer_id')
    address: str
    person: Optional['Person'] = Relationship(back_populates='mailing_property')
    occupant_customer: Optional['Customer'] = Relationship(back_populates='contracted_properties')
	

# Initialising the database
engine = create_engine(f'postgresql://{DB_USERNAME}:{DB_PASSWORD}@{DB_URL}:{DB_PORT}/{DB_NAME}')
SQLModel.metadata.create_all(engine)


# Defining the database entries
john = Person(
    person_id = 'eb7a0f5d-e09b-4b36-8e15-e9541ea7bd6e',
    first_names = 'John',
    last_name = 'Smith',
    mailing_property_id = '4d6aed8d-d1a2-4152-ae4b-662baddcbef4'
)

johns_lettings = Customer(
    customer_id = 'cb58199b-d7cf-4d94-a4ba-e7bb32f1cda4',
    lead_person_id = 'eb7a0f5d-e09b-4b36-8e15-e9541ea7bd6e',
    contract_type = 'Landlord Premium'
)

johns_property_1 = Property(
    property_id = '4d6aed8d-d1a2-4152-ae4b-662baddcbef4',
    occupant_customer_id = 'cb58199b-d7cf-4d94-a4ba-e7bb32f1cda4',
    address = '123 High Street'
)

johns_property_2 = Property(
    property_id = '2ac15ac9-9ab3-4a7c-80ad-961dd565ab0a',
    occupant_customer_id = 'cb58199b-d7cf-4d94-a4ba-e7bb32f1cda4',
    address = '456 High Street'
)


# Committing the database entries
with Session(engine) as session:
    session.add(john)
    session.add(johns_lettings)
    session.add(johns_property_1)
    session.add(johns_property_2)
    session.commit()

Description

Goal: To model the back-end database for a cleaning company. Specifically, trying to model a system where customers can have multiple properties that need to be cleaned and each customer has a single lead person who has a single mailing property (to contact them at). Ideally, I want to be able to use a single table for the mailing properties and cleaning properties (as in most instances they will be the same).

Constraints:

  • Customers can be either individual people or organisations
  • A lead person must be identifiable for each customer
  • Each person must be matched to a property (so that their mailing address can be identified)
  • A single customer can have multiple properties attached to them (e.g. for a landlord that includes cleaning as part of the rent)

The issue is that the foreign keys have a circular dependency.

  • Customer -> Person based on the lead_person_id
  • Person -> Property based on the mailing_property_id
  • Property -> Customer based on the occupant_customer_id

image

Running the code written above results in:

ForeignKeyViolation: insert or update on table "customer" violates foreign key constraint "customer_lead_person_id_fkey"
DETAIL:  Key (lead_person_id)=(eb7a0f5d-e09b-4b36-8e15-e9541ea7bd6e) is not present in table "person".

This issue is specific to Postgres, which unlike SQLite (used in the docs) imposes constraints on foreign keys when data is being added. I.e. replacing engine = create_engine(f'postgresql://{DB_USERNAME}:{DB_PASSWORD}@{DB_URL}:{DB_PORT}/{DB_NAME}') with engine = create_engine('sqlite:///test.db') will let the database be initialised without causing an error - however my use-case is with a Postgres DB.


Attempted Solutions:

  • Used link tables between customers/people and properties/customers - no luck
  • Used Session.exec with this code from SO to temporarily remove foreign key constraints then add them back on - no luck
  • Used primary joins instead of foreign keys as described in this SQLModel Issue - no luck

Operating System

macOS

Operating System Details

Using an M1 Mac but have replicated the issue on ubuntu as well

SQLModel Version

0.0.6

Python Version

3.10.4

Additional Context

No response

AyrtonB avatar Apr 25 '22 10:04 AyrtonB

You can disable all triggers by running SET session_replication_role = replica; as part of your transaction (session), this will allow you to load. Alternatively, you can disable all foreign keys and indexes at start of transaction, load data, and then re-enable. This will also increase performance for load. (Don't do this for anything other than initial loads to circumvent these issues, you lose all integrity)

I use a middleware layer for creating transactions for initial loads, which handles dialect, though I ended up using psycopg2 and postgresql's copy instead of ORM for insert due to performance, cutting loads of 5 mil rows from +1 hour to few minutes.

mkarbo avatar May 02 '22 18:05 mkarbo

The issue lies mostly with how the tables need to be defined in PG. Circular foreign keys in PG they need to be initially deferred, or else you run into the issue of them not being defined like you see.

Using Field(foreign_key="") is just a baseline sqlalchemy.ForeignKey under the hood with no extra args. I believe you need to go back to defining the Column to do that, based on what I see in the code.

The following worked for me with the original second half:

import uuid
from typing import Optional, List

from pydantic import UUID4
from sqlmodel import Session, Field, SQLModel, Relationship, create_engine
from sqlalchemy import Column, ForeignKey
from sqlalchemy.dialects.postgresql import UUID


# Defining schemas
class Person(SQLModel, table=True):
    person_id: UUID4 = Field(
        default_factory=uuid.uuid4, primary_key=True, index=True, nullable=True
    )
    first_names: str
    last_name: str
    mailing_property_id: UUID4 = Field(
        sa_column=Column(
            UUID(as_uuid=True),
            ForeignKey(
                "property.property_id",
                onupdate="CASCADE",
                ondelete="RESTRICT",
                initially="DEFERRED",
                deferrable=True,
            ),
        )
    )

    customer: Optional["Customer"] = Relationship(back_populates="lead_person")
    mailing_property: Optional["Property"] = Relationship(back_populates="person")


class Customer(SQLModel, table=True):
    customer_id: UUID4 = Field(
        default_factory=uuid.uuid4, primary_key=True, index=True, nullable=True
    )
    lead_person_id: UUID4 = Field(
        sa_column=Column(
            UUID(as_uuid=True),
            ForeignKey(
                "person.person_id",
                onupdate="CASCADE",
                ondelete="RESTRICT",
                initially="DEFERRED",
                deferrable=True,
            ),
        )
    )
    contract_type: str

    lead_person: Optional["Person"] = Relationship(back_populates="customer")
    contracted_properties: Optional[List["Property"]] = Relationship(
        back_populates="occupant_customer"
    )


class Property(SQLModel, table=True):
    property_id: UUID4 = Field(
        default_factory=uuid.uuid4, primary_key=True, index=True, nullable=True
    )
    occupant_customer_id: uuid.UUID = Field(
        sa_column=Column(
            UUID(as_uuid=True),
            ForeignKey(
                "customer.customer_id",
                onupdate="CASCADE",
                ondelete="RESTRICT",
                initially="DEFERRED",
                deferrable=True,
            ),
        )
    )
    address: str

    person: Optional["Person"] = Relationship(back_populates="mailing_property")
    occupant_customer: Optional["Customer"] = Relationship(
        back_populates="contracted_properties"
    )

The values you use for the ForeignKey may be different as far as the onupdate and ondelete go, but the initially="DEFERRED", deferrable=True are what is important here.

And in case you weren't aware, you can use the relationship instead of directly using IDs.

johns_lettings = Customer(
    customer_id="cb58199b-d7cf-4d94-a4ba-e7bb32f1cda4",
    lead_person=john,  # instead of `lead_person_id`
    contract_type="Landlord Premium",
)

They still need to be deferred foreign keys, so this doesn't fix the issue. And there is always one of them that has to use an ID since it is circular.

The way to get around that would be to use an intermediate table. Untested, but something along the lines of:

class MailingProperty(SQLModel, table=True):
    person_id: UUID4 = Field(foreign_key="person.person_id")
    property_id: UUID4 = Field(foreign_key="property.property_id")

    person: Person = Relationship(back_populates="mailing_property")
    property: Property = Relationship(back_populates="person")

Then you would just create the Person and the Property and MailingProperty(person=person, property=property) with a table inbetween it doesn't and up needing one of the IDs first. Still needs to be deferred, but can be a little cleaner.

eseglem avatar Aug 17 '22 03:08 eseglem