✨ Add `PydanticJSONB` TypeDecorator for Automatic Pydantic Model Serialization in SQLModel
Description
This PR introduces a `PydanticJSONB` `TypeDecorator` to SQLModel, enabling seamless serialization and deserialization of Pydantic models in JSONB columns. It removes the need for manual conversion, allowing SQLModel instances to work directly with Pydantic objects.
Why? Storing Pydantic models in JSONB columns has been a recurring challenge. This PR solves that by automating conversion between Pydantic models and JSON fields.
How?
- On Save: Converts Pydantic models to dictionaries before storing in JSONB.
- On Load: Automatically instantiates the Pydantic model from the stored JSON.
Benefits
- ✅ Eliminates manual conversion – no need to wrap with `dict(**org.config)` manually.
- ✅ Ensures structured storage – enforces Pydantic validation automatically.
- ✅ Improves dev experience – seamless interaction with JSONB fields in SQLModel.
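To make the mechanism concrete, here is a minimal sketch of the idea (using the `model_dump(mode="json")`/`model_validate` spelling settled on later in this thread; the full implementation with list/dict support is discussed in the review below):

```python
from typing import Any, Type

from pydantic import BaseModel
from sqlalchemy import types
from sqlalchemy.dialects.postgresql import JSONB


class PydanticJSONB(types.TypeDecorator):
    """Store a Pydantic model in a JSONB column, converting on save/load."""

    impl = JSONB
    cache_ok = True

    def __init__(self, model_class: Type[BaseModel], *args: Any, **kwargs: Any):
        super().__init__(*args, **kwargs)
        self.model_class = model_class

    def process_bind_param(self, value: Any, dialect: Any) -> Any:
        # On save: Pydantic model -> JSON-serializable dict
        if isinstance(value, BaseModel):
            return value.model_dump(mode="json")
        return value

    def process_result_value(self, value: Any, dialect: Any) -> Any:
        # On load: stored dict -> Pydantic model instance
        if isinstance(value, dict):
            return self.model_class.model_validate(value)
        return value
```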
Example Usage of PydanticJSONB in SQLModel
With this PR, you can now store and retrieve Pydantic models in JSONB fields effortlessly.
Define a Pydantic Model
```python
from pydantic import BaseModel
from sqlmodel import SQLModel, Field
from sqlalchemy import Column
from uuid import UUID
from typing import Optional

from deeptrust.database.types import PydanticJSONB  # Assuming this PR introduces it


# Define a Pydantic model for structured storage
class OrgConfig(BaseModel):
    custom_join_message: Optional[str] = None
    custom_bot_name: Optional[str] = None
    retain_period: int = 86400


# Define a SQLModel with a JSONB column that stores the Pydantic model
class Org(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    propel_id: UUID
    name: str
    config: OrgConfig = Field(
        sa_column=Column(PydanticJSONB(OrgConfig)),  # Auto-serializes/deserializes
        default_factory=OrgConfig,
    )
```
Create & Store Data
```python
from sqlmodel import Session, create_engine

engine = create_engine("postgresql://user:password@localhost/db")

# Insert an Org with a Pydantic config
with Session(engine) as session:
    org = Org(
        propel_id=UUID("123e4567-e89b-12d3-a456-426614174000"),
        name="DeepTrust",
        config=OrgConfig(custom_bot_name="TrustBot"),
    )
    session.add(org)
    session.commit()
```
Retrieve & Use Data
```python
from sqlmodel import select

with Session(engine) as session:
    org = session.exec(select(Org)).first()
    print(org.config.custom_bot_name)  # "TrustBot"
    print(type(org.config))  # <class '__main__.OrgConfig'>
```
Result:
- ✅ No need for `OrgConfig(**org.config)` – it's already an `OrgConfig` instance!
- ✅ Automatic conversion between JSONB and Pydantic models.
This simplifies handling structured configurations in SQLModel, making JSONB storage seamless and ergonomic. 🚀
Related Issues & Discussions
- SQLModel Issue #63 – Nested Pydantic models in JSON fields.
- Stack Overflow: Writing Pydantic objects into SQLAlchemy JSON columns.
- GitHub Gist Example – Community implementation of Pydantic models in JSON columns.

This PR makes SQLModel more ergonomic for JSONB storage while maintaining compatibility with SQLAlchemy and Pydantic. 🚀
TODO
Per contribution rules: https://sqlmodel.tiangolo.com/help/#create-a-pull-request
This PR still needs:
- Unit Tests
- Documentation
Before I do that, I would love to hear your thoughts @tiangolo!
I like this approach. There seem to me to be a few things that would still need to be addressed:
- Support for e.g. a list of models?
- Currently you implemented `model_dump`; if we used `model_dump(mode="json")`, UUIDs, datetimes, etc. would serialize correctly.
- `parse_obj` is deprecated. Would we want to use `model_validate`? Or construct the model without another round of validation?
What about the following?
```python
from typing import Any, Type, TypeVar, get_args

from pydantic import BaseModel
from sqlalchemy import types
from sqlalchemy.dialects.postgresql import JSONB  # for Postgres JSONB

BaseModelType = TypeVar("BaseModelType", bound=BaseModel)


class PydanticJSONB(types.TypeDecorator):  # type: ignore
    """Custom type to automatically handle Pydantic model serialization."""

    impl = JSONB  # use JSONB type in Postgres (fallback to JSON for others)
    cache_ok = True  # allow SQLAlchemy to cache results

    def __init__(self, model_class: Type[BaseModel] | Type[list[BaseModelType]], *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.model_class = model_class  # Pydantic model class to use

    def process_bind_param(self, value: Any, dialect) -> list[dict] | dict | None:  # noqa: ANN401, ARG002, ANN001
        if isinstance(value, BaseModel):
            return value.model_dump(mode="json")
        if isinstance(value, list):
            return [m.model_dump(mode="json") for m in value]
        return value

    def process_result_value(self, value: Any, dialect) -> list[BaseModel] | BaseModel | None:  # noqa: ANN401, ARG002, ANN001
        # Called when loading from DB: convert dict to Pydantic model instance
        if isinstance(value, dict):
            return self.model_class.model_validate(value)  # type: ignore
        if isinstance(value, list):
            return [get_args(self.model_class)[0].model_validate(v) for v in value]
        return value  # pass through anything that is neither dict nor list
```
+1 to @DaanRademaker's comment. This `PydanticJSONB` implementation needs to support `List[MODEL]`, `Dict[str, MODEL]` and more, including serializing and deserializing them, and it needs to handle `flag_modified` correctly when doing a direct assignment or an in-place mutation (illustrated below)!
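For context on the `flag_modified` point: without mutation tracking, in-place changes to the nested model are invisible to SQLAlchemy and must be flagged by hand. A hedged sketch, reusing the `Org` model and `engine` from the PR description:

```python
from sqlalchemy.orm.attributes import flag_modified
from sqlmodel import Session, select

with Session(engine) as session:
    org = session.exec(select(Org)).first()
    org.config.custom_bot_name = "NewBot"  # in-place mutation: not detected by SQLAlchemy
    flag_modified(org, "config")           # explicitly mark the column as dirty
    session.add(org)
    session.commit()
```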
> `Dict[str, MODEL]`

@Seluj78 Should we support this? What is a use case where someone would define the type as a dictionary of Pydantic models instead of making the dictionary itself a Pydantic model?
I hesitate because when `process_result_value` returns a dict, additional logic is needed to distinguish between a basic dictionary and a structured collection of Pydantic models. This could introduce ambiguity or unexpected behavior in model validation and serialization.
Actually, maybe the key signifier can come from `self.model_class`. Let me try something and push my changes. Roughly, something like the sketch below.
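A simplified excerpt of that idea (not the final commit), inspecting `self.model_class` with `typing.get_origin`/`get_args` to tell a plain model apart from `List[...]` and `Dict[str, ...]` annotations:

```python
from typing import Dict, List, get_args, get_origin


def _detect_kind(model_class: type) -> str:
    """Classify the annotation handed to PydanticJSONB."""
    origin = get_origin(model_class)
    if origin is dict:
        return "dict"   # Dict[str, Model] -> validate each value via get_args(...)[1]
    if origin is list:
        return "list"   # List[Model] -> validate each item via get_args(...)[0]
    return "model"      # a plain BaseModel subclass


assert _detect_kind(Dict[str, int]) == "dict"
assert _detect_kind(List[int]) == "list"
assert _detect_kind(int) == "model"
```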
I see what you mean. I was trying to avoid an extra step by just having `my_column: Dict[key, PydanticModelValue] = ...`.
The main problem you'd need to tackle anyway, no matter whether you choose to support this or not, is the mutability and assignment detection. It was a nightmare to try to get working, and I failed on my end.
No, it actually makes sense, sorry, I had to think about it for a minute.
Take a look at my most recent commit. I manage it by, just like with lists, having the user explicitly name the key-value pair between `str` and `BaseModel` at initialization.
If you want, throw this at the bottom and run `python sqltypes.py`:
```python
if __name__ == "__main__":
    from typing import Dict, List

    from pydantic import BaseModel
    from sqlalchemy import Column
    from sqlmodel import Field, SQLModel  # Import Field
    from sqlmodel.sql.sqltypes import PydanticJSONB

    # Define some Pydantic models
    class Address(BaseModel):
        street: str
        city: str

    class User(BaseModel):
        name: str
        age: int

    # 1. Single Model Example
    class PersonTable(SQLModel, table=True):
        id: int | None = Field(default=None, primary_key=True)
        # Use Field instead of Column, and wrap Address with PydanticJSONB
        address: Address = Field(sa_column=Column(PydanticJSONB(Address)))

    # 2. List of Models Example
    class TeamTable(SQLModel, table=True):
        id: int | None = Field(default=None, primary_key=True)
        members: List[User] = Field(sa_column=Column(PydanticJSONB(List[User])))

    # 3. Dictionary of Models Example
    class CompanyTable(SQLModel, table=True):
        id: int | None = Field(default=None, primary_key=True)
        employees: Dict[str, User] = Field(
            sa_column=Column(PydanticJSONB(Dict[str, User]))
        )

    # Test instances
    person = PersonTable(address=Address(street="123 Main St", city="Boston"))
    print("Person:")
    print(person)

    team = TeamTable(members=[User(name="Alice", age=30), User(name="Bob", age=25)])
    print("\nTeam:")
    print(team)

    company = CompanyTable(
        employees={"alice": User(name="Alice", age=30), "bob": User(name="Bob", age=25)}
    )
    print("\nCompany:")
    print(company)
```
Nice improvements!
I think there is one more interesting use case.
Say you use the `insert` statement from the Postgres dialect: `process_bind_param` will then be called with a dictionary that might not be JSON-serializable.
It would probably be useful to convert any dictionary to a jsonable form using Pydantic's `to_jsonable_python` function. I added my use case to the code below.
```python
from datetime import date
from decimal import Decimal
from typing import Any, Dict, List, Type, TypeVar, get_args
from uuid import UUID

from pydantic import BaseModel
from pydantic_core import to_jsonable_python
from sqlalchemy import types
from sqlalchemy.dialects.postgresql import (
    JSONB,  # for Postgres JSONB
    insert,
)
from sqlmodel import Field, SQLModel

BaseModelType = TypeVar("BaseModelType", bound=BaseModel)


# Add to_jsonable_python
class PydanticJSONB(types.TypeDecorator):  # type: ignore
    """Custom type to automatically handle Pydantic model serialization."""

    impl = JSONB  # use JSONB type in Postgres (fallback to JSON for others)
    cache_ok = True  # allow SQLAlchemy to cache results

    def __init__(
        self,
        model_class: Type[BaseModelType] | Type[list[BaseModelType]] | Type[Dict[str, BaseModelType]],
        *args,
        **kwargs,
    ):
        super().__init__(*args, **kwargs)
        self.model_class = model_class  # Pydantic model class to use

    def process_bind_param(self, value: Any, dialect) -> dict | list[dict] | None:  # noqa: ANN401, ARG002, ANN001
        if value is None:
            return None
        if isinstance(value, BaseModel):
            return value.model_dump(mode="json")
        if isinstance(value, list):
            return [m.model_dump(mode="json") if isinstance(m, BaseModel) else to_jsonable_python(m) for m in value]
        if isinstance(value, dict):
            return {
                k: v.model_dump(mode="json") if isinstance(v, BaseModel) else to_jsonable_python(v)
                for k, v in value.items()
            }
        return to_jsonable_python(value)

    def process_result_value(
        self, value: Any, dialect
    ) -> BaseModelType | List[BaseModelType] | Dict[str, BaseModelType] | None:
        if value is None:
            return None
        if isinstance(value, dict):
            # If model_class is a Dict type hint, handle key-value pairs
            if hasattr(self.model_class, "__origin__") and self.model_class.__origin__ is dict:
                model_class = get_args(self.model_class)[1]  # Get the value type (the model)
                return {k: model_class.model_validate(v) for k, v in value.items()}
            # Regular case: the whole dict represents a single model
            return self.model_class.model_validate(value)  # type: ignore
        if isinstance(value, list):
            # If model_class is a List type hint
            if hasattr(self.model_class, "__origin__") and self.model_class.__origin__ is list:
                model_class = get_args(self.model_class)[0]
                return [model_class.model_validate(v) for v in value]
            # Fallback case (though this shouldn't happen given our __init__ types)
            return [self.model_class.model_validate(v) for v in value]  # type: ignore
        return value


class SomeNestedModel(BaseModel):
    some_decimal: Decimal


class SomeModel(SQLModel, table=True):
    some_uuid: UUID = Field(primary_key=True)
    some_date: date = Field(primary_key=True)
    some_nested: SomeNestedModel = Field(sa_column=Field(PydanticJSONB(SomeNestedModel)))


item_1 = SomeModel(
    some_uuid=UUID("123e4567-e89b-12d3-a456-426614174000"),
    some_date=date(2022, 1, 1),
    some_nested=SomeNestedModel(some_decimal=Decimal("1.23")),
)

# model_dump(mode="json") cannot be called here, because then fields like some_date and
# some_uuid no longer have the correct type; therefore to_jsonable_python was added to
# process_bind_param.
not_yet_jsonable_dict = item_1.model_dump()
stmt = insert(SomeModel).values(not_yet_jsonable_dict)

# The statement cannot be executed with the jsonable dict because it loses the datetime, uuid types etc.
jsonable_dict = item_1.model_dump(mode="json")
stmt_2 = insert(SomeModel).values(jsonable_dict)
```
@DaanRademaker
I am a little confused. How is `PydanticJSONB` going to solve serialization for non-`PydanticJSONB` columns?
Also, I tried your example (make sure you update the `sa_column` in the `some_nested` attribute to hold `Column` and not `Field`), and it worked just fine.
```python
from datetime import date
from decimal import Decimal
from uuid import UUID

from pydantic import BaseModel
from sqlmodel import Column, Field, SQLModel


class SomeNestedModel(BaseModel):
    some_decimal: Decimal


class SomeModel(SQLModel, table=True):
    some_uuid: UUID = Field(primary_key=True)
    some_date: date = Field(primary_key=True)
    some_nested: SomeNestedModel = Field(
        sa_column=Column(PydanticJSONB(SomeNestedModel))  # PydanticJSONB as defined above
    )


item_1 = SomeModel(
    some_uuid=UUID("123e4567-e89b-12d3-a456-426614174000"),
    some_date=date(2022, 1, 1),
    some_nested=SomeNestedModel(some_decimal=Decimal("1.23")),
)

print(item_1)  # some_uuid=UUID('123e4567-e89b-12d3-a456-426614174000') some_date=datetime.date(2022, 1, 1) some_nested=SomeNestedModel(some_decimal=Decimal('1.23'))
```
Hi all,
Thanks for the contribution and it's great to see this level of engagement! Just as a maintenance note, I'll put this in draft while the CI is failing. Feel free to mark as "Ready for review" when it's green!
🫡
I took some time to create a fully reproducible example to show the error I am getting. The issue seems to be the difference between using a sync adapter (psycopg2) and an async adapter (asyncpg): asyncpg cannot deal with an already-jsonable Python dict before it hits the `process_bind_param` function.
It seems asyncpg does stricter type handling during parameter substitution and does not convert to the correct type the way psycopg2 does.
By adding `to_jsonable_python` to the `process_bind_param` function we are able to pass a not-yet-jsonable dict, which resolves this issue with asyncpg.
```python
async def test_reproducable_example():
    from datetime import date, datetime
    from decimal import Decimal
    from typing import Any, Dict, List, Type, TypeVar, get_args
    from uuid import UUID

    from pydantic import BaseModel
    from sqlalchemy import types
    from sqlalchemy.dialects.postgresql import (
        JSONB,  # for Postgres JSONB
        insert,
    )
    from sqlalchemy.ext.asyncio import create_async_engine
    from sqlmodel import Column, Field, Session, SQLModel, create_engine, select
    from sqlmodel.ext.asyncio.session import AsyncSession

    BaseModelType = TypeVar("BaseModelType", bound=BaseModel)

    # fill in your database connection details here
    postgres_url = "postgresql+psycopg2://user:password@localhost:5432/dbname"
    postgres_url_async = "postgresql+asyncpg://user:password@localhost:5432/dbname"
    async_engine = create_async_engine(postgres_url_async)
    sync_engine = create_engine(postgres_url)

    class PydanticJSONB(types.TypeDecorator):  # type: ignore
        """Custom type to automatically handle Pydantic model serialization."""

        impl = JSONB  # use JSONB type in Postgres (fallback to JSON for others)
        cache_ok = True  # allow SQLAlchemy to cache results

        def __init__(
            self,
            model_class: Type[BaseModelType] | Type[list[BaseModelType]] | Type[Dict[str, BaseModelType]],
            *args,
            **kwargs,
        ):
            super().__init__(*args, **kwargs)
            self.model_class = model_class  # Pydantic model class to use

        def process_bind_param(self, value: Any, dialect) -> dict | list[dict] | None:  # noqa: ANN401, ARG002
            if value is None:
                return None
            if isinstance(value, BaseModel):
                return value.model_dump(mode="json")
            if isinstance(value, list):
                return [m.model_dump(mode="json") if isinstance(m, BaseModel) else m for m in value]
            if isinstance(value, dict):
                return {k: v.model_dump(mode="json") if isinstance(v, BaseModel) else v for k, v in value.items()}
            return value

        def process_result_value(
            self, value: Any, dialect
        ) -> BaseModelType | List[BaseModelType] | Dict[str, BaseModelType] | None:
            if value is None:
                return None
            if isinstance(value, dict):
                # If model_class is a Dict type hint, handle key-value pairs
                if hasattr(self.model_class, "__origin__") and self.model_class.__origin__ is dict:
                    model_class = get_args(self.model_class)[1]  # Get the value type (the model)
                    return {k: model_class.model_validate(v) for k, v in value.items()}
                # Regular case: the whole dict represents a single model
                return self.model_class.model_validate(value)  # type: ignore
            if isinstance(value, list):
                # If model_class is a List type hint
                if hasattr(self.model_class, "__origin__") and self.model_class.__origin__ is list:
                    model_class = get_args(self.model_class)[0]
                    return [model_class.model_validate(v) for v in value]
                # Fallback case (though this shouldn't happen given our __init__ types)
                return [self.model_class.model_validate(v) for v in value]  # type: ignore
            return value

    class SomeNestedModel(BaseModel):
        some_decimal: Decimal

    class SomeModel(SQLModel, table=True):
        some_uuid: UUID = Field(primary_key=True)
        some_date: date = Field(primary_key=True)
        some_nested: SomeNestedModel = Field(sa_column=Column(PydanticJSONB(SomeNestedModel)))

    SQLModel.metadata.create_all(sync_engine)

    item_1 = SomeModel(
        some_uuid=UUID("123e4567-e89b-12d3-a456-426614174000"),
        some_date=date(2022, 1, 1),
        some_nested=SomeNestedModel(some_decimal=Decimal("1.23")),
    )
    items_mode_json = [item_1.model_dump(mode="json")]
    items_non_json_mode = [item_1.model_dump()]
    stmt_json_mode = insert(SomeModel).values(items_mode_json)
    stmt_non_json_mode = insert(SomeModel).values(items_non_json_mode)

    # Psycopg2 engine causing no problems!
    with Session(sync_engine) as session:
        session.execute(stmt_json_mode)
        session.commit()

    # This fails with a StatementError: the PydanticJSONB process_bind_param is called
    # with a dict that does not have JSON-serializable values (Decimal is not serializable)
    try:
        with Session(sync_engine) as session:
            session.execute(stmt_non_json_mode)
            session.commit()
    except Exception as e:
        print(e)

    try:
        # This fails with a DBAPIError (toordinal of the date type): the asyncpg engine
        # cannot deal with date values that were already converted to strings
        async with AsyncSession(async_engine) as session:
            await session.execute(stmt_json_mode)
            await session.commit()
    except Exception as e:
        print(e)

    try:
        # This fails with a StatementError: the PydanticJSONB process_bind_param is called
        # with a dict that does not have JSON-serializable values (Decimal is not serializable)
        async with AsyncSession(async_engine) as session:
            await session.execute(stmt_non_json_mode)
            await session.commit()
    except Exception as e:
        print(e)
```
@DaanRademaker Thanks for sending that example! It helped me understand the blocker you were describing.
I made the updates to include `to_jsonable_python`.
The only issue now is that the linter is yelling at me because the function returns `Any`, so I added an ignore.
bump? @svlandeg
Any advice on how this will be used in a WHERE clause? I think the SQLModel column is not indexable, so you can't use the JSONB query structure `Model.JSONB_Column['JSON_PROPERTY']`.
I'm not sure if this is right, but it works for me:
```python
from functools import partial
from typing import Any, Dict, List, Optional, Type, TypeVar, Union, get_args, get_origin

from pydantic import BaseModel
from pydantic_core import to_jsonable_python
from sqlalchemy import types
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.sql.operators import ColumnOperators
from sqlalchemy.sql.type_api import _ComparatorFactory

BaseModelType = TypeVar("BaseModelType", bound=BaseModel)

# Type alias for JSON-serializable values
JSONValue = Union[Dict[str, Any], List[Any], str, int, float, bool, None]


class PydanticJSONBComparator(JSONB.Comparator):
    def __init__(self, *args, **kwargs):
        self.model_class = kwargs.pop("model_class", None)
        super().__init__(*args, **kwargs)

    def __getattr__(self, name: Any) -> ColumnOperators:
        if name in self.model_class.model_fields:
            return self.__getitem__(name)
        raise AttributeError(
            f"'{self.__class__.__name__}' object has no attribute '{name}'"
            f" or '{name}' is not a valid field in {self.model_class.__name__}"
        )


class PydanticJSONB(types.TypeDecorator):  # type: ignore
    """Custom type to automatically handle Pydantic model serialization with comparators."""

    impl = JSONB  # use JSONB type in Postgres (fallback to JSON for others)
    cache_ok = True  # allow SQLAlchemy to cache results

    def __init__(
        self,
        model_class: Union[
            Type[BaseModelType],
            Type[List[BaseModelType]],
            Type[Dict[str, BaseModelType]],
        ],
        *args: Any,
        **kwargs: Any,
    ):
        super().__init__(*args, **kwargs)
        self.model_class = model_class  # Pydantic model class to use

    @property
    def comparator_factory(self) -> _ComparatorFactory[Any]:  # type: ignore # mypy properties bug
        """Override the comparator factory to use our custom comparator."""
        return partial(PydanticJSONBComparator, model_class=self.model_class)

    def process_bind_param(self, value: Any, dialect: Any) -> JSONValue:  # noqa: ANN401, ARG002, ANN001
        if value is None:
            return None
        if isinstance(value, BaseModel):
            return value.model_dump(mode="json")
        if isinstance(value, list):
            return [
                m.model_dump(mode="json") if isinstance(m, BaseModel) else to_jsonable_python(m)
                for m in value
            ]
        if isinstance(value, dict):
            return {
                k: v.model_dump(mode="json") if isinstance(v, BaseModel) else to_jsonable_python(v)
                for k, v in value.items()
            }
        # We know to_jsonable_python returns a JSON-serializable value, but mypy sees it as Any
        return to_jsonable_python(value)  # type: ignore[no-any-return]

    def process_result_value(
        self, value: Any, dialect: Any
    ) -> Optional[Union[BaseModelType, List[BaseModelType], Dict[str, BaseModelType]]]:  # noqa: ANN401, ARG002, ANN001
        if value is None:
            return None
        if isinstance(value, dict):
            # If model_class is a Dict type hint, handle key-value pairs
            if get_origin(self.model_class) is dict:
                model_class = get_args(self.model_class)[1]  # Get the value type (the model)
                return {k: model_class.model_validate(v) for k, v in value.items()}
            # Regular case: the whole dict represents a single model
            return self.model_class.model_validate(value)  # type: ignore
        if isinstance(value, list):
            # If model_class is a List type hint
            if get_origin(self.model_class) is list:
                model_class = get_args(self.model_class)[0]
                return [model_class.model_validate(v) for v in value]
            # Fallback case (though this shouldn't happen given our __init__ types)
            return [self.model_class.model_validate(v) for v in value]  # type: ignore
        raise TypeError(
            f"Unsupported type for PydanticJSONB from database: {type(value)}. Expected a dictionary or list."
        )
```
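If I read the comparator right, this makes model fields usable directly in WHERE clauses via JSONB indexing. A hedged usage sketch, reusing the `Org`/`OrgConfig` models and `engine` from the PR description:

```python
from sqlmodel import Session, select

# __getattr__ on the comparator turns .custom_bot_name into ["custom_bot_name"],
# and .astext compiles it to roughly: config ->> 'custom_bot_name' = 'TrustBot'
stmt = select(Org).where(Org.config.custom_bot_name.astext == "TrustBot")

with Session(engine) as session:
    org = session.exec(stmt).first()
```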
Oh nice this is pretty cool!
@svlandeg why is this marked as a draft again?
Because the tests are red again?
Oh, strange. I didn't change anything and it was all green. Were the tests updated?
@svlandeg fyi those failures have nothing to do with my changes
Your main branch is failing, please check your house https://github.com/fastapi/sqlmodel/actions/runs/14167821533/job/39684751783
@amanmibra: we're looking into it here - I will update your PR once the main tests are green again 🙏
@svlandeg checking the status of the CI fix
Tests look good to go! @svlandeg
I want this feature and the code looks good to me. But the documentation is currently lacking (both on the developer side and on the user's guide side). I would like to see a docstring here that contains information on how the parameters are used.
As for the user guide, I think a new section under 'Advanced User Guide' should be created. Some text like the following (note this might be wrong as I may have misunderstood the API, please check!):
Pydantic Model Serialization (JSONB)
In some cases, you may wish to have a nested model inside your ORM rather than using a relationship. This would be appropriate when you have, for instance, custom metadata about a row that is present in some cases and not in others (though note that `metadata` is a reserved keyword and can't be used as a field name!).
Thankfully, many databases (e.g. PostgreSQL through JSONB) explicitly include support for JSON types, and pydantic was built with JSON serialization in mind!
These concepts can be coupled together using the PydanticJSONB type.
```python
from typing import Any

from pydantic import BaseModel
from sqlmodel import SQLModel, Field
from sqlmodel.sql.sqltypes import PydanticJSONB


class ExtraData(BaseModel):
    x: float
    y: str
    z: dict[str, Any]


class MyTable(SQLModel, table=True):
    id: int = Field(primary_key=True)
    data: ExtraData = Field(sa_type=PydanticJSONB(ExtraData))
```
To use this new table model, you simply need to pass data as an additional argument, and it must be of the ExtraData type:
```python
row = MyTable(
    data=ExtraData(
        x=9.32,
        y="extra_data",
        z={"flag": True},
    )
)
```
Crucially, for database backends that support JSON fields, this will be converted to database-native JSON, and as such can be filtered and searched. At this time, SQLModel does not provide explicit support for searching these nested fields.
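For what it's worth, the underlying JSONB operators should still be usable for such filtering even without dedicated SQLModel support — a hedged sketch against the `MyTable` model above:

```python
from sqlmodel import select

# ->> extracts the JSON field as text, so the comparison happens on the text value
stmt = select(MyTable).where(MyTable.data["y"].astext == "extra_data")
```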
I’m glad to hear this feature is valuable to others 🙂 I'm a new contributor, so I’ll follow the contributing guide and aim to add proper docs soon, likely next weekend, to be honest.
There's also a follow-up improvement I’d like to propose in a separate PR once I’ve had time to build and test it.
Right now, like with plain dict in SQLAlchemy, updating attributes inside the nested model doesn’t trigger a database update. For example:
```python
row = select(...)  # some MyTable row
row.data.x = 1
db.add(row)  # no effect, change isn't detected
```
The only workaround for now is reassigning a new instance:
```python
updated = ExtraData(**row.data.model_dump())
updated.x = 1
row.data = updated
db.add(row)
```
Using `MutableDict` could solve this by enabling change tracking for nested fields; see the sketch below.

> The `MutableDict` object implements a dictionary that will emit change events to the underlying mapping when the contents of the dictionary are altered, including when values are added or removed.
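A minimal sketch of that direction, assuming plain-`dict` storage; for Pydantic model instances a custom `Mutable` subclass would likely be needed, since `MutableDict` only tracks real dictionaries:

```python
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.ext.mutable import MutableDict
from sqlmodel import Column, Field, SQLModel


class MyTableTracked(SQLModel, table=True):
    id: int = Field(primary_key=True)
    # MutableDict wraps the stored dict and emits change events on mutation,
    # so in-place updates mark the column dirty without calling flag_modified()
    data: dict = Field(sa_column=Column(MutableDict.as_mutable(JSONB)))
```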
Been waiting for this feature! PR looks ready after extensive testing. When can we expect a merge? @svlandeg
This would be a great addition!
bump? @svlandeg @tiangolo
Added docs!