
✨ Add `PydanticJSONB` TypeDecorator for Automatic Pydantic Model Serialization in SQLModel

Open amanmibra opened this issue 8 months ago • 45 comments

Description

This PR introduces a `PydanticJSONB` TypeDecorator to SQLModel, enabling seamless serialization and deserialization of Pydantic models in JSONB columns. This removes the need for manual conversion, allowing SQLModel instances to work directly with Pydantic objects.

Why? Storing Pydantic models in JSONB columns has been a recurring challenge. This PR solves that by automating conversion between Pydantic models and JSON fields.

How?

  • On Save: Converts Pydantic models to dictionaries before storing in JSONB.
  • On Load: Automatically instantiates the Pydantic model from the stored JSON.
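The two hooks above map directly onto SQLAlchemy's `TypeDecorator` methods. A minimal sketch of the idea (not the PR's exact code: plain `JSON` stands in for Postgres `JSONB` so the sketch is portable, and the hooks are called directly rather than through a database):

```python
from typing import Any, Type

from pydantic import BaseModel
from sqlalchemy import JSON, types


class PydanticJSONB(types.TypeDecorator):
    """Minimal sketch of the save/load hooks described above."""

    impl = JSON  # the PR targets Postgres JSONB; plain JSON keeps this sketch portable
    cache_ok = True

    def __init__(self, model_class: Type[BaseModel], *args: Any, **kwargs: Any):
        super().__init__(*args, **kwargs)
        self.model_class = model_class

    def process_bind_param(self, value: Any, dialect: Any) -> Any:
        # On save: convert the Pydantic model to a JSON-safe dict.
        if isinstance(value, BaseModel):
            return value.model_dump(mode="json")
        return value

    def process_result_value(self, value: Any, dialect: Any) -> Any:
        # On load: re-instantiate the Pydantic model from the stored dict.
        if isinstance(value, dict):
            return self.model_class.model_validate(value)
        return value


class Config(BaseModel):
    bot_name: str


decorator = PydanticJSONB(Config)
stored = decorator.process_bind_param(Config(bot_name="TrustBot"), None)
loaded = decorator.process_result_value(stored, None)
print(stored, type(loaded).__name__)
```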

Benefits

✅ Eliminates manual conversion – no need to wrap dict(**org.config) manually.
✅ Ensures structured storage – enforces Pydantic validation automatically.
✅ Improves dev experience – seamless interaction with JSONB fields in SQLModel.

Example Usage of PydanticJSONB in SQLModel

With this PR, you can now store and retrieve Pydantic models in JSONB fields effortlessly.

Define a Pydantic Model


from pydantic import BaseModel
from sqlmodel import SQLModel, Field
from sqlalchemy import Column
from uuid import UUID
from typing import Optional
from deeptrust.database.types import PydanticJSONB  # Assuming this PR introduces it

# Define a Pydantic model for structured storage
class OrgConfig(BaseModel):
    custom_join_message: Optional[str] = None
    custom_bot_name: Optional[str] = None
    retain_period: int = 86400

# Define a SQLModel with a JSONB column that stores the Pydantic model
class Org(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    propel_id: UUID
    name: str
    config: OrgConfig = Field(
        sa_column=Column(PydanticJSONB(OrgConfig)),  # Auto-serializes/deserializes
        default_factory=OrgConfig
    )

Create & Store Data

from sqlmodel import Session, create_engine

engine = create_engine("postgresql://user:password@localhost/db")

# Insert an Org with a Pydantic config
with Session(engine) as session:
    org = Org(
        propel_id=UUID("123e4567-e89b-12d3-a456-426614174000"),
        name="DeepTrust",
        config=OrgConfig(custom_bot_name="TrustBot")
    )
    session.add(org)
    session.commit()

Retrieve & Use Data

with Session(engine) as session:
    org = session.query(Org).first()
    print(org.config.custom_bot_name)  # "TrustBot"
    print(type(org.config))  # <class '__main__.OrgConfig'>

Result:

✅ No need for OrgConfig(**org.config) – it's already an OrgConfig instance!
✅ Automatic conversion between JSONB and Pydantic models.

This simplifies handling structured configurations in SQLModel, making JSONB storage seamless and ergonomic. 🚀

Related Issues & Discussions

SQLModel Issue #63 – Nested Pydantic models in JSON fields.
Stack Overflow: Writing Pydantic objects into SQLAlchemy JSON columns.
GitHub Gist Example – Community implementation of Pydantic models in JSON columns.

This PR makes SQLModel more ergonomic for JSONB storage while maintaining compatibility with SQLAlchemy and Pydantic. 🚀

TODO

Per contribution rules: https://sqlmodel.tiangolo.com/help/#create-a-pull-request

This PR still needs:

  1. Unit Tests
  2. Documentation

Before I do that, I would love to hear your thoughts @tiangolo!

amanmibra avatar Mar 17 '25 00:03 amanmibra

I like this approach. There seem to me to be a few things that would still need to be added:

  1. Support for e.g. a list of models?
  2. You currently use model_dump; if we used model_dump with mode="json", UUIDs, datetimes, etc. would serialize correctly.
  3. parse_obj is deprecated. Would we want to use model_validate? Or construct the model without another round of validation?
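Point 2 can be seen concretely (a small standalone example, not part of the PR):

```python
from datetime import date
from uuid import UUID

from pydantic import BaseModel


class Item(BaseModel):
    item_id: UUID
    created: date


item = Item(
    item_id=UUID("123e4567-e89b-12d3-a456-426614174000"),
    created=date(2022, 1, 1),
)

plain = item.model_dump()                # values keep their Python types (UUID, date)
jsonable = item.model_dump(mode="json")  # values become JSON-safe strings

print(plain)
print(jsonable)
```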

What about the following?

from typing import Any, Type, TypeVar, get_args

from pydantic import BaseModel
from sqlalchemy import types
from sqlalchemy.dialects.postgresql import JSONB  # for Postgres JSONB

BaseModelType = TypeVar("BaseModelType", bound=BaseModel)


class PydanticJSONB(types.TypeDecorator):  # type: ignore
    """Custom type to automatically handle Pydantic model serialization."""

    impl = JSONB  # use JSONB type in Postgres (fallback to JSON for others)
    cache_ok = True  # allow SQLAlchemy to cache results

    def __init__(self, model_class: Type[BaseModel] | Type[list[BaseModelType]], *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.model_class = model_class  # Pydantic model class to use

    def process_bind_param(self, value: Any, dialect) -> list[dict] | dict | None:  # noqa: ANN401, ARG002, ANN001
        if isinstance(value, BaseModel):
            return value.model_dump(mode="json")
        if isinstance(value, list):
            return [m.model_dump(mode="json") for m in value]

        return value

    def process_result_value(self, value: Any, dialect) -> list[BaseModel] | BaseModel | None:  # noqa: ANN401, ARG002, ANN001
        # Called when loading from DB: convert dict to Pydantic model instance
        if isinstance(value, dict):
            return self.model_class.model_validate(value)  # type: ignore
        if isinstance(value, list):
            return [get_args(self.model_class)[0].model_validate(v) for v in value]

        return None

DaanRademaker avatar Mar 17 '25 08:03 DaanRademaker

+1 to @DaanRademaker's comment. This PydanticJSONB implementation needs to support List[MODEL], Dict[str, MODEL] and more, including serializing and deserializing them and correctly handling flag_modified on direct assignments and in-place mutations!
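For context on the flag_modified point: in-place mutation of a JSON-mapped dict is invisible to SQLAlchemy's change tracking unless the attribute is explicitly flagged. A sketch using plain SQLAlchemy on SQLite (table and column names are made up; this is not the PR's code):

```python
from sqlalchemy import JSON, Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.orm.attributes import flag_modified

Base = declarative_base()


class Org(Base):
    __tablename__ = "org"
    id = Column(Integer, primary_key=True)
    config = Column(JSON)


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    org = Org(config={"bot": "TrustBot"})
    session.add(org)
    session.commit()

    org.config["bot"] = "OtherBot"  # in-place mutation: not detected
    session.commit()                # no UPDATE is emitted
    session.refresh(org)
    silently_lost = org.config["bot"]  # still the old value

    org.config["bot"] = "OtherBot"
    flag_modified(org, "config")    # explicitly mark the attribute dirty
    session.commit()                # now the UPDATE is emitted
    session.refresh(org)
    persisted = org.config["bot"]

print(silently_lost, persisted)
```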

Seluj78 avatar Mar 17 '25 17:03 Seluj78

Dict[str, MODEL]

@Seluj78 Should we support this? What is a use case where someone would define the type as a dictionary of Pydantic models instead of making the dictionary itself a Pydantic model?

I hesitate because when process_result_value returns a dict, additional logic is needed to distinguish between a basic dictionary and a structured collection of Pydantic models. This could introduce ambiguity or unexpected behavior in model validation and serialization.
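One way to resolve the ambiguity is to key off the declared model_class with get_origin/get_args, which is the direction the later revisions in this thread take. A standalone sketch (the helper name `revive` is hypothetical):

```python
from typing import Dict, List, get_args, get_origin

from pydantic import BaseModel


class User(BaseModel):
    name: str


def revive(model_class, value):
    # Inspect the declared type to decide how to rebuild the stored JSON.
    if get_origin(model_class) is dict:
        inner = get_args(model_class)[1]  # value type of Dict[str, Model]
        return {k: inner.model_validate(v) for k, v in value.items()}
    if get_origin(model_class) is list:
        inner = get_args(model_class)[0]  # element type of List[Model]
        return [inner.model_validate(v) for v in value]
    return model_class.model_validate(value)  # plain single model


single = revive(User, {"name": "a"})
mapping = revive(Dict[str, User], {"x": {"name": "a"}})
items = revive(List[User], [{"name": "a"}])
print(type(single).__name__, type(mapping["x"]).__name__, type(items[0]).__name__)
```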

amanmibra avatar Mar 17 '25 21:03 amanmibra

Actually, maybe the key signifier can come from the self.model_class. Let me try something and push my changes.

amanmibra avatar Mar 17 '25 21:03 amanmibra

I see what you mean. I was trying to avoid an extra step by just having my_column: Dict[key, PydanticModelValue] = ...

The main problem you'd need to tackle anyway, whether or not you choose to support this, is mutability and assignment detection. It was a nightmare to try to get working, and I failed on my end.

Seluj78 avatar Mar 17 '25 21:03 Seluj78

I see what you mean. I was trying to avoid an extra step by just having my_column: Dict[key, PydanticModelValue] = ...

The main problem you'd need to tackle anyway, whether or not you choose to support this, is mutability and assignment detection. It was a nightmare to try to get working, and I failed on my end.

No, it actually makes sense, sorry, I had to think about it for a minute.

Take a look at my most recent commit. I manage it by, just like with list, having the user explicitly name the key-value pair between str and BaseModel at initialization.

If you want, throw this at the bottom and run python sqltypes.py:

if __name__ == "__main__":
    from typing import Dict, List

    from pydantic import BaseModel
    from sqlalchemy import Column

    from sqlmodel import Field, SQLModel  # Import Field
    from sqlmodel.sql.sqltypes import PydanticJSONB

    # Define some Pydantic models
    class Address(BaseModel):
        street: str
        city: str

    class User(BaseModel):
        name: str
        age: int

    # 1. Single Model Example
    class PersonTable(SQLModel, table=True):
        id: int | None = Field(default=None, primary_key=True)
        # Use Field instead of Column, and wrap Address with PydanticJSONB
        address: Address = Field(sa_column=Column(PydanticJSONB(Address)))

    # 2. List of Models Example
    class TeamTable(SQLModel, table=True):
        id: int | None = Field(default=None, primary_key=True)
        members: List[User] = Field(sa_column=Column(PydanticJSONB(List[User])))

    # 3. Dictionary of Models Example
    class CompanyTable(SQLModel, table=True):
        id: int | None = Field(default=None, primary_key=True)
        employees: Dict[str, User] = Field(
            sa_column=Column(PydanticJSONB(Dict[str, User]))
        )

    # Test instances
    person = PersonTable(address=Address(street="123 Main St", city="Boston"))
    print("Person:")
    print(person)
    team = TeamTable(members=[User(name="Alice", age=30), User(name="Bob", age=25)])
    print("\nTeam:")
    print(team)
    company = CompanyTable(
        employees={"alice": User(name="Alice", age=30), "bob": User(name="Bob", age=25)}
    )
    print("\nCompany:")
    print(company)



amanmibra avatar Mar 17 '25 22:03 amanmibra

Nice improvements!

I think there is 1 more interesting usecase.

Let's say you use the insert statement from the Postgres dialect: process_bind_param will be called with a dictionary that might not be JSON-serializable.

It would probably be useful to convert any dictionary to a JSON-serializable form using Pydantic's to_jsonable_python function. I added my use case in the code below.

from datetime import date
from decimal import Decimal
from typing import Any, Dict, List, Type, TypeVar, get_args
from uuid import UUID

from pydantic import BaseModel
from pydantic_core import to_jsonable_python
from sqlalchemy import types
from sqlalchemy.dialects.postgresql import (
    JSONB,  # for Postgres JSONB
    insert,
)
from sqlmodel import Field, SQLModel

BaseModelType = TypeVar("BaseModelType", bound=BaseModel)


# Add to_jsonable_python
class PydanticJSONB(types.TypeDecorator):  # type: ignore
    """Custom type to automatically handle Pydantic model serialization."""

    impl = JSONB  # use JSONB type in Postgres (fallback to JSON for others)
    cache_ok = True  # allow SQLAlchemy to cache results

    def __init__(
        self,
        model_class: Type[BaseModelType] | Type[list[BaseModelType]] | Type[Dict[str, BaseModelType]],
        *args,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        self.model_class = model_class  # Pydantic model class to use

    def process_bind_param(self, value: Any, dialect) -> dict | list[dict] | None:  # noqa: ANN401, ARG002, ANN001
        if value is None:
            return None
        if isinstance(value, BaseModel):
            return value.model_dump(mode="json")
        if isinstance(value, list):
            return [m.model_dump(mode="json") if isinstance(m, BaseModel) else to_jsonable_python(m) for m in value]
        if isinstance(value, dict):
            return {
                k: v.model_dump(mode="json") if isinstance(v, BaseModel) else to_jsonable_python(v)
                for k, v in value.items()
            }
        return to_jsonable_python(value)

    def process_result_value(
        self, value: Any, dialect
    ) -> BaseModelType | List[BaseModelType] | Dict[str, BaseModelType] | None:
        if value is None:
            return None
        if isinstance(value, dict):
            # If model_class is a Dict type hint, handle key-value pairs
            if hasattr(self.model_class, "__origin__") and self.model_class.__origin__ is dict:
                model_class = get_args(self.model_class)[1]  # Get the value type (the model)
                return {k: model_class.model_validate(v) for k, v in value.items()}
            # Regular case: the whole dict represents a single model
            return self.model_class.model_validate(value)  # type: ignore
        if isinstance(value, list):
            # If model_class is a List type hint
            if hasattr(self.model_class, "__origin__") and self.model_class.__origin__ is list:
                model_class = get_args(self.model_class)[0]
                return [model_class.model_validate(v) for v in value]
            # Fallback case (though this shouldn't happen given our __init__ types)
            return [self.model_class.model_validate(v) for v in value]  # type: ignore
        return value


class SomeNestedModel(BaseModel):
    some_decimal: Decimal


class SomeModel(SQLModel, table=True):
    some_uuid: UUID = Field(primary_key=True)
    some_date: date = Field(primary_key=True)
    some_nested: SomeNestedModel = Field(sa_column=Field(PydanticJSONB(SomeNestedModel)))


item_1 = SomeModel(
    some_uuid=UUID("123e4567-e89b-12d3-a456-426614174000"),
    some_date=date(2022, 1, 1),
    some_nested=SomeNestedModel(some_decimal=Decimal("1.23")),  #
)

# model_dump(mode="json") cannot be used up front, because then fields like some_date and some_uuid
# lose their correct types; therefore to_jsonable_python was added in process_bind_param
not_yet_jsonable_dict = item_1.model_dump()
stmt = insert(SomeModel).values(not_yet_jsonable_dict)

# the statement cannot be executed with the jsonable dict because it loses datetime, UUID types, etc.
jsonable_dict = item_1.model_dump(mode="json")
stmt_2 = insert(SomeModel).values(jsonable_dict)

DaanRademaker avatar Mar 18 '25 08:03 DaanRademaker

# model_dump(mode="json") cannot be used up front, because then fields like some_date and some_uuid
# lose their correct types; therefore to_jsonable_python was added in process_bind_param
not_yet_jsonable_dict = item_1.model_dump()
stmt = insert(SomeModel).values(not_yet_jsonable_dict)

# the statement cannot be executed with the jsonable dict because it loses datetime, UUID types, etc.
jsonable_dict = item_1.model_dump(mode="json")
stmt_2 = insert(SomeModel).values(jsonable_dict)

@DaanRademaker I am a little confused. How is PydanticJSONB going to solve serialization for non-PydanticJSONB columns?

Also, I tried your example (make sure you update sa_column in the some_nested attribute to hold Column, not Field), and it worked just fine.

    from datetime import date
    from decimal import Decimal
    from pydantic import BaseModel
    from uuid import UUID

    from sqlmodel import Column, Field, SQLModel

    class SomeNestedModel(BaseModel):
        some_decimal: Decimal

    class SomeModel(SQLModel, table=True):
        some_uuid: UUID = Field(primary_key=True)
        some_date: date = Field(primary_key=True)
        some_nested: SomeNestedModel = Field(
            sa_column=Column(PydanticJSONB(SomeNestedModel))
        )

    item_1 = SomeModel(
        some_uuid=UUID("123e4567-e89b-12d3-a456-426614174000"),
        some_date=date(2022, 1, 1),
        some_nested=SomeNestedModel(some_decimal=Decimal("1.23")),  #
    )

    print(item_1) # some_uuid=UUID('123e4567-e89b-12d3-a456-426614174000') some_date=datetime.date(2022, 1, 1) some_nested=SomeNestedModel(some_decimal=Decimal('1.23'))


amanmibra avatar Mar 19 '25 15:03 amanmibra

Hi all,

Thanks for the contribution and it's great to see this level of engagement! Just as a maintenance note, I'll put this in draft while the CI is failing. Feel free to mark as "Ready for review" when it's green!

svlandeg avatar Mar 19 '25 16:03 svlandeg

Hi all,

Thanks for the contribution and it's great to see this level of engagement! Just as a maintenance note, I'll put this in draft while the CI is failing. Feel free to mark as "Ready for review" when it's green!

🫡

amanmibra avatar Mar 21 '25 05:03 amanmibra

I took some time to create a fully reproducible example to show the error I am getting. The issue seems to be the difference between using a sync adapter (psycopg2) and an async adapter (asyncpg). asyncpg cannot deal with a dict that is already jsonable before it hits the process_bind_param function.

It seems asyncpg does stricter type handling during parameter substitution and does not convert to the correct type as psycopg2 does.

By adding to_jsonable_python in the process_bind_param function, we are able to pass a not-yet-jsonable dict, which resolves this issue with asyncpg.

async def test_reproducible_example():
    from datetime import date, datetime
    from decimal import Decimal
    from typing import Any, Dict, List, Type, TypeVar, get_args
    from uuid import UUID

    from pydantic import BaseModel
    from sqlalchemy import types
    from sqlalchemy.dialects.postgresql import (
        JSONB,  # for Postgres JSONB
        insert,
    )
    from sqlmodel import Column, Field, Session, SQLModel, create_engine, select
    from sqlmodel.ext.asyncio.session import AsyncSession
    from sqlalchemy.ext.asyncio import create_async_engine

    BaseModelType = TypeVar("BaseModelType", bound=BaseModel)

    # fill in your database connection details here
    postgres_url = "postgresql+psycopg2://user:password@localhost:5432/dbname"
    postgres_url_async = "postgresql+asyncpg://user:password@localhost:5432/dbname"

    async_engine = create_async_engine(postgres_url_async)
    sync_engine = create_engine(postgres_url)

    class PydanticJSONB(types.TypeDecorator):  # type: ignore
        """Custom type to automatically handle Pydantic model serialization."""

        impl = JSONB  # use JSONB type in Postgres (fallback to JSON for others)
        cache_ok = True  # allow SQLAlchemy to cache results

        def __init__(
            self,
            model_class: Type[BaseModelType] | Type[list[BaseModelType]] | Type[Dict[str, BaseModelType]],
            *args,
            **kwargs,
        ):
            super().__init__(*args, **kwargs)
            self.model_class = model_class  # Pydantic model class to use

        def process_bind_param(self, value: Any, dialect) -> dict | list[dict] | None:  # noqa: ANN401, ARG002
            if value is None:
                return None
            if isinstance(value, BaseModel):
                return value.model_dump(mode="json")
            if isinstance(value, list):
                return [m.model_dump(mode="json") if isinstance(m, BaseModel) else m for m in value]
            if isinstance(value, dict):
                return {k: v.model_dump(mode="json") if isinstance(v, BaseModel) else v for k, v in value.items()}
            return value

        def process_result_value(
            self, value: Any, dialect
        ) -> BaseModelType | List[BaseModelType] | Dict[str, BaseModelType] | None:
            if value is None:
                return None
            if isinstance(value, dict):
                # If model_class is a Dict type hint, handle key-value pairs
                if hasattr(self.model_class, "__origin__") and self.model_class.__origin__ is dict:
                    model_class = get_args(self.model_class)[1]  # Get the value type (the model)
                    return {k: model_class.model_validate(v) for k, v in value.items()}
                # Regular case: the whole dict represents a single model
                return self.model_class.model_validate(value)  # type: ignore
            if isinstance(value, list):
                # If model_class is a List type hint
                if hasattr(self.model_class, "__origin__") and self.model_class.__origin__ is list:
                    model_class = get_args(self.model_class)[0]
                    return [model_class.model_validate(v) for v in value]
                # Fallback case (though this shouldn't happen given our __init__ types)
                return [self.model_class.model_validate(v) for v in value]  # type: ignore
            return value

    class SomeNestedModel(BaseModel):
        some_decimal: Decimal

    class SomeModel(SQLModel, table=True):
        some_uuid: UUID = Field(primary_key=True)
        some_date: date = Field(primary_key=True)
        some_nested: SomeNestedModel = Field(sa_column=Column(PydanticJSONB(SomeNestedModel)))

    SQLModel.metadata.create_all(sync_engine)

    item_1 = SomeModel(
        some_uuid=UUID("123e4567-e89b-12d3-a456-426614174000"),
        some_date=date(2022, 1, 1),
        some_nested=SomeNestedModel(some_decimal=Decimal("1.23")),
    )

    items_mode_json = [item_1.model_dump(mode="json")]
    items_non_json_mode = [item_1.model_dump()]

    stmt_json_mode = insert(SomeModel).values(items_mode_json)
    stmt_non_json_mode = insert(SomeModel).values(items_non_json_mode)

    # Psycopg2 engine causing no problems!
    with Session(sync_engine) as session:
        session.execute(stmt_json_mode)
        session.commit()

    # This fails with a StatementError: the PydanticJSONB type decorator's process_bind_param is called
    # with a dict whose values are not JSON serializable (Decimal is not serializable)
    try:
        with Session(sync_engine) as session:
            session.execute(stmt_non_json_mode)
            session.commit()
    except Exception as e:
        print(e)

    try:
        # This fails with a DBAPIError (toordinal) on the date type; the asyncpg engine cannot deal with
        # date values that have already been converted to strings
        async with AsyncSession(async_engine) as session:
            await session.execute(stmt_json_mode)
            await session.commit()
    except Exception as e:
        print(e)

    try:
        # This fails with a StatementError: the PydanticJSONB type decorator's process_bind_param is called
        # with a dict whose values are not JSON serializable (Decimal is not serializable)
        async with AsyncSession(async_engine) as session:
            await session.execute(stmt_non_json_mode)
            await session.commit()
    except Exception as e:
        print(e)

DaanRademaker avatar Mar 21 '25 07:03 DaanRademaker

@DaanRademaker Thanks for sending that example! Helped me understand the blocker you were presenting

Made the updates to include to_jsonable_python.

Only issue now is the linter is yelling at me due to the function returning Any, so I added an ignore.

amanmibra avatar Mar 22 '25 18:03 amanmibra

bump? @svlandeg

amanmibra avatar Mar 31 '25 14:03 amanmibra

Any advice on how this will be used in a WHERE clause? I think the SQLModel column is not indexable for the JSONB query structure Model.JSONB_Column['JSON_PROPERTY'].

AfroMonkey avatar Mar 31 '25 22:03 AfroMonkey

I'm not sure if this is right, but it works for me:

from functools import partial
from typing import Any, Dict, List, Optional, Type, TypeVar, Union, get_args, get_origin

from pydantic import BaseModel
from pydantic_core import to_jsonable_python
from sqlalchemy import types
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.sql.operators import ColumnOperators
from sqlalchemy.sql.type_api import _ComparatorFactory

BaseModelType = TypeVar("BaseModelType", bound=BaseModel)
JSONValue = Union[Dict[str, Any], List[Any], str, int, float, bool, None]


class PydanticJSONBComparator(JSONB.Comparator):
    def __init__(self, *args, **kwargs):
        self.model_class = kwargs.pop("model_class", None)
        return super().__init__(*args, **kwargs)

    def __getattr__(self, name: Any) -> ColumnOperators:
        if name in self.model_class.model_fields:
            return self.__getitem__(name)
        raise AttributeError(
            f"'{self.__class__.__name__}' object has no attribute '{name}'"
            f" or '{name}' is not a valid field in {self.model_class.__name__}"
        )


# Define a type alias for JSON-serializable values
class PydanticJSONB(types.TypeDecorator):  # type: ignore
    """Custom type to automatically handle Pydantic model serialization with comparators."""

    impl = JSONB  # use JSONB type in Postgres (fallback to JSON for others)
    cache_ok = True  # allow SQLAlchemy to cache results

    def __init__(
        self,
        model_class: Union[
            Type[BaseModelType],
            Type[List[BaseModelType]],
            Type[Dict[str, BaseModelType]],
        ],
        *args: Any,
        **kwargs: Any,
    ):
        super().__init__(*args, **kwargs)
        self.model_class = model_class  # Pydantic model class to use

    @property
    def comparator_factory(self) -> _ComparatorFactory[Any]:  # type: ignore  # mypy properties bug
        """Override the comparator factory to use our custom comparator."""
        return partial(PydanticJSONBComparator, model_class=self.model_class)

    def process_bind_param(self, value: Any, dialect: Any) -> JSONValue:  # noqa: ANN401, ARG002, ANN001
        if value is None:
            return None
        if isinstance(value, BaseModel):
            return value.model_dump(mode="json")
        if isinstance(value, list):
            return [
                m.model_dump(mode="json")
                if isinstance(m, BaseModel)
                else to_jsonable_python(m)
                for m in value
            ]
        if isinstance(value, dict):
            return {
                k: v.model_dump(mode="json")
                if isinstance(v, BaseModel)
                else to_jsonable_python(v)
                for k, v in value.items()
            }

        # We know to_jsonable_python returns a JSON-serializable value, but mypy sees it as Any
        return to_jsonable_python(value)  # type: ignore[no-any-return]

    def process_result_value(
        self, value: Any, dialect: Any
    ) -> Optional[Union[BaseModelType, List[BaseModelType], Dict[str, BaseModelType]]]:  # noqa: ANN401, ARG002, ANN001
        if value is None:
            return None
        if isinstance(value, dict):
            # If model_class is a Dict type hint, handle key-value pairs
            origin = get_origin(self.model_class)
            if origin is dict:
                model_class = get_args(self.model_class)[
                    1
                ]  # Get the value type (the model)
                return {k: model_class.model_validate(v) for k, v in value.items()}
            # Regular case: the whole dict represents a single model
            return self.model_class.model_validate(value)  # type: ignore
        if isinstance(value, list):
            # If model_class is a List type hint
            origin = get_origin(self.model_class)
            if origin is list:
                model_class = get_args(self.model_class)[0]
                return [model_class.model_validate(v) for v in value]
            # Fallback case (though this shouldn't happen given our __init__ types)
            return [self.model_class.model_validate(v) for v in value]  # type: ignore

        raise TypeError(
            f"Unsupported type for PydanticJSONB from database: {type(value)}. Expected a dictionary or list."
        )

AfroMonkey avatar Mar 31 '25 23:03 AfroMonkey

I'm not sure if this is right, but it works for me:

from functools import partial
from typing import Any, Dict, List, Optional, Type, TypeVar, Union, get_args, get_origin

from pydantic import BaseModel
from pydantic_core import to_jsonable_python
from sqlalchemy import types
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.sql.operators import ColumnOperators
from sqlalchemy.sql.type_api import _ComparatorFactory

BaseModelType = TypeVar("BaseModelType", bound=BaseModel)
JSONValue = Union[Dict[str, Any], List[Any], str, int, float, bool, None]


class PydanticJSONBComparator(JSONB.Comparator):
    def __init__(self, *args, **kwargs):
        self.model_class = kwargs.pop("model_class", None)
        return super().__init__(*args, **kwargs)

    def __getattr__(self, name: Any) -> ColumnOperators:
        if name in self.model_class.model_fields:
            return self.__getitem__(name)
        raise AttributeError(
            f"'{self.__class__.__name__}' object has no attribute '{name}'"
            f" or '{name}' is not a valid field in {self.model_class.__name__}"
        )


Oh nice this is pretty cool!

amanmibra avatar Apr 01 '25 14:04 amanmibra

@svlandeg why is this marked as a draft again?

amanmibra avatar Apr 01 '25 14:04 amanmibra

@svlandeg why is this marked as a draft again?

Because the tests are red again?

svlandeg avatar Apr 01 '25 20:04 svlandeg

@svlandeg why is this marked as a draft again?

Because the tests are red again?

Oh, strange. I didn't change anything and it was all green.

Were the tests updated?

amanmibra avatar Apr 01 '25 20:04 amanmibra

@svlandeg fyi those failures have nothing to do with my changes

amanmibra avatar Apr 06 '25 19:04 amanmibra

Your main branch is failing, please check your house https://github.com/fastapi/sqlmodel/actions/runs/14167821533/job/39684751783

amanmibra avatar Apr 06 '25 19:04 amanmibra

@amanmibra: we're looking into it here - I will update your PR once the main tests are green again 🙏

svlandeg avatar Apr 09 '25 14:04 svlandeg

@svlandeg checking the status of the CI fix

amanmibra avatar Apr 24 '25 20:04 amanmibra

Tests look good to go! @svlandeg

amanmibra avatar May 02 '25 19:05 amanmibra

I want this feature and the code looks good to me. But the documentation is currently lacking (both on the developer side and on the user's guide side). I would like to see a docstring here that contains information on how the parameters are used.

As for the users guide, I think a new section under 'Advanced User Guide' should be created. Some text like (note this might be wrong as I may have misunderstood the API, please check!):

Pydantic Model Serialization (JSONB)

In some cases, you may wish to have a nested model inside your ORM, rather than using a relationship. This would be appropriate when you have, for instance, custom metadata about a row that is present in some cases and not in others (though note that metadata is a reserved keyword and can't be used as a field name!).

Thankfully, many databases (e.g. PostgreSQL through JSONB) explicitly include support for JSON types, and pydantic was built with JSON serialization in mind!

These concepts can be coupled together using the PydanticJSONB type.

from sqlmodel.sql.sqltypes import PydanticJSONB
from sqlmodel import SQLModel, Field
from pydantic import BaseModel
from typing import Any

class ExtraData(BaseModel):
    x: float
    y: str
    z: dict[str, Any]

class MyTable(SQLModel, table=True):
    id: int = Field(primary_key=True)
    data: ExtraData = Field(sa_type=PydanticJSONB(ExtraData))

To use this new table model, you simply need to pass data as an additional argument, and it must be of the ExtraData type:

row = MyTable(
    data=ExtraData(
        x=9.32,
        y="extra_data",
        z={
            "flag": True
        }
    )
)

Crucially, for database backends that support JSON fields, this will be converted to database-native JSON, and as such can be filtered and searched. At this time, SQLModel does not provide explicit support for searching these nested fields.

JBorrow avatar May 09 '25 14:05 JBorrow

I want this feature and the code looks good to me. But the documentation is currently lacking (both on the developer side and on the user's guide side). I would like to see a docstring here that contains information on how the parameters are used.

I’m glad to hear this feature is valuable to others 🙂 I'm a new contributor, so I’ll follow the contributing guide and aim to add proper docs soon, likely next weekend, to be honest.

There's also a follow-up improvement I’d like to propose in a separate PR once I’ve had time to build and test it.

Right now, like with plain dict in SQLAlchemy, updating attributes inside the nested model doesn’t trigger a database update. For example:

row = select(...)  # some MyTable row
row.data.x = 1
db.add(row)  # no effect, change isn’t detected

The only workaround for now is reassigning a new instance:

updated = ExtraData(**row.data.model_dump())
updated.x = 1
row.data = updated
db.add(row)

Using MutableDict could solve this by enabling change tracking for nested fields.

MutableDict object implements a dictionary that will emit change events to the underlying mapping when the contents of the dictionary are altered, including when values are added or removed.
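A sketch of the MutableDict idea with a plain SQLAlchemy JSON column (names are illustrative; wiring this into PydanticJSONB would need a Pydantic-aware mutable wrapper, which is the follow-up work described above):

```python
from sqlalchemy import JSON, Column, Integer, create_engine
from sqlalchemy.ext.mutable import MutableDict
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Row(Base):
    __tablename__ = "row"
    id = Column(Integer, primary_key=True)
    # MutableDict wraps loaded values so in-place mutations emit change events
    data = Column(MutableDict.as_mutable(JSON))


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    row = Row(data={"x": 0})
    session.add(row)
    session.commit()

    row.data["x"] = 1        # mutation is detected by MutableDict
    session.commit()         # UPDATE is emitted without flag_modified
    session.refresh(row)
    tracked = row.data["x"]

print(tracked)
```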

amanmibra avatar May 10 '25 22:05 amanmibra

Been waiting for this feature! PR looks ready after extensive testing. When can we expect a merge? @svlandeg

faithleysath avatar May 26 '25 15:05 faithleysath

This would be a great addition!

Dude29 avatar May 28 '25 16:05 Dude29

bump? @svlandeg @tiangolo

faithleysath avatar May 31 '25 06:05 faithleysath

Added docs!

amanmibra avatar Jun 08 '25 04:06 amanmibra