
SPIKE: Generate SQL models from fields in pydantic models

johnson2427 opened this issue Jul 14 '22 · 14 comments

Overview

Currently, we manually define the SQL models for the caching system. Look into automatically generating the table columns from the fields of pydantic models that are associated with these caching tables.

Specification

This is the general idea, but obviously not the implementation:

from typing import Optional

from sqlmodel import Field, SQLModel

class ContractEvents(SQLModel):
    id: Optional[int]
    event: str
    ...

class ContractEventsCache(ContractEvents, table=True):
    some_fk_field: some_data_type = Field(some_fk, ...)

Dependencies

sqlmodel -> https://sqlmodel.tiangolo.com/

johnson2427 avatar Jul 14 '22 14:07 johnson2427

there is this library but i found it insufficient for my use cases https://sqlmodel.tiangolo.com

for example, it's probably a good idea for events to have a composite primary key of (block_number, log_index)

banteg avatar Jul 14 '22 20:07 banteg
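
For reference, the composite primary key banteg describes can be declared in SQLModel by marking both fields with primary_key=True; SQLAlchemy then builds a composite key from them. A minimal sketch, with illustrative field names:

from sqlmodel import Field, SQLModel

class CachedEvent(SQLModel, table=True):
    # Together, (block_number, log_index) uniquely identify an event log,
    # so both columns are marked as part of the primary key.
    block_number: int = Field(primary_key=True)
    log_index: int = Field(primary_key=True)
    event: str
    contract_address: str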

@banteg yeah, I absolutely love Sebastian Ramirez's work. I'm not sure we want to do this yet. SQL is rigid by design, so even though it's a little cumbersome to manage two sides of these models (both our pydantic models and our SQL models), I would be doing that when developing a REST API anyway. So I'm not sure this Spike is worth our time yet. I'll do some diving on this issue tonight.

If we do go down this route, maybe we should integrate it into our current pydantic models: have them inherit from SQLModel so that each model acts as both a pydantic object and an SQLModel, and we avoid the overhead of managing both. @fubuloubu let me know if what I'm saying is written in English. It would require a giant refactor of everything that @NotPeopling2day did.

johnson2427 avatar Jul 14 '22 21:07 johnson2427

Essentially, what I am suggesting would look something like this:

class ContractEvents(SQLModel):
    id: Optional[int]
    event: str
    ...

class ContractEventsCache(ContractEvents, table=True):
    some_fk_field: some_data_type = Field(some_fk, ...)

I'll dig in and use this tonight. Maybe I can find some nice features and report back in the morning. Since SQLModel itself inherits from pydantic's BaseModel, our models would still be pydantic models underneath; SQLModel would just give us the best of both worlds.

johnson2427 avatar Jul 14 '22 21:07 johnson2427

Straight from the docs: "That means you get all of Pydantic's features, including automatic data validation, serialization, and documentation. You can use SQLModel in the same way you can use Pydantic.

You can even create SQLModel models that do not represent SQL tables. In that case, they would be the same as Pydantic models.

This is useful, in particular, because now you can create a SQL database model that inherits from another non-SQL model. You can use that to reduce code duplication a lot. It will also make your code more consistent, improve editor support, etc."

johnson2427 avatar Jul 14 '22 21:07 johnson2427
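
Concretely, the pattern that quote describes looks something like this (a minimal sketch; the names are illustrative, not Ape's actual models):

from typing import Optional

from sqlmodel import Field, SQLModel

class EventBase(SQLModel):
    # No table=True: this behaves exactly like a pydantic BaseModel,
    # with validation, .dict(), .json(), etc.
    event: str
    block_number: int

class EventCache(EventBase, table=True):
    # table=True adds SQL persistence on top of the shared fields.
    id: Optional[int] = Field(default=None, primary_key=True)

EventBase(event="Transfer", block_number=1) validates like any pydantic model, while EventCache maps to an actual table, so the field definitions live in one place.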

https://github.com/ApeWorX/ape/issues/878#issuecomment-1184866345

This is your solution if, and only if, we refactor from Pydantic to SQLModel throughout all of Ape

johnson2427 avatar Jul 14 '22 21:07 johnson2427

#878 (comment)

This is your solution if, and only if, we refactor from Pydantic to SQLModel throughout all of Ape

I think part of this Spike would be to investigate refactoring the core "row types" of the cache DB (e.g. ReceiptAPI for transactions, BlockAPI for blocks, and EventLog for events) to use simpler types at the base layer, so it becomes possible to convert them into SQL fields without translating them.

That would indeed be a lot of work, but I think it would be worthwhile for moving forward with a slimmer design that scales a bit better, since we could serialize to and from the ecosystem-specific formats and the generalized table formats of the core API objects. Perhaps this will come next sprint though.

We will also have to start thinking about how migrations are performed for the cache db. Perhaps have it store the version of ape that the cache db is currently using, and have it raise a warning and skip working if the version is incompatible and a migration is needed before the cache db can function again.

fubuloubu avatar Jul 14 '22 22:07 fubuloubu
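
As a sketch of what "simpler types at the base layer" could mean (the class and field names here are assumptions, not Ape's actual API), the base model would carry only DB-friendly primitives, with ecosystem-specific types layered on top:

from pydantic import BaseModel

class EventLogBase(BaseModel):
    # Only str/int primitives, so every field maps 1:1 onto a SQL column
    # without a translation step.
    contract_address: str
    event_name: str
    block_number: int
    log_index: int
    transaction_hash: str

class EthereumEventLog(EventLogBase):
    @property
    def transaction_hash_bytes(self) -> bytes:
        # Ecosystem-specific view over the generic stored value.
        return bytes.fromhex(self.transaction_hash.removeprefix("0x"))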

alembic would probably be the move here @fubuloubu

johnson2427 avatar Jul 14 '22 23:07 johnson2427

Yeah, for migrations of a big cache db it sure would

But honestly, we have the ability to purge the db. I think for now, if you could add a method of storing versioning info in the cache db and checking that it's up to date, that would be enough to stop querying the database after an incompatible update; the solution would be to ask the user to purge their cache db (and later on we can add migration as a solution instead).

fubuloubu avatar Jul 14 '22 23:07 fubuloubu
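
A rough sketch of that version check (the cache_info table and the eth-ape package-name lookup are assumptions):

import warnings
from importlib.metadata import version

from sqlalchemy import Column, MetaData, String, Table, select

APE_VERSION = version("eth-ape")  # assumes ape is installed as the eth-ape package

metadata = MetaData()
cache_info = Table(
    "cache_info",
    metadata,
    Column("ape_version", String, primary_key=True),
)

def cache_is_compatible(engine) -> bool:
    """Warn and return False if the cache db was written by another ape version."""
    metadata.create_all(engine)
    with engine.begin() as conn:
        row = conn.execute(select(cache_info.c.ape_version)).fetchone()
        if row is None:
            # Fresh cache db: stamp it with the current version.
            conn.execute(cache_info.insert().values(ape_version=APE_VERSION))
            return True
        if row[0] != APE_VERSION:
            warnings.warn(
                f"Cache db was written by ape {row[0]}, but ape {APE_VERSION} is "
                "installed; purge the cache db to keep using it."
            )
            return False
        return True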

Sorry, just for clarification, what do you mean by "simpler types?"

johnson2427 avatar Jul 15 '22 01:07 johnson2427

Sorry, just for clarification, what do you mean by "simpler types?"

Ones that work in the database schema

fubuloubu avatar Jul 15 '22 08:07 fubuloubu

@fubuloubu SQLAlchemy requires SQLAlchemy types for making a table; they are not Python types. So we need to be able to map the Python datatypes to SQLAlchemy datatypes, like an alias.

johnson2427 avatar Jul 15 '22 14:07 johnson2427
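
The "alias" idea could look roughly like this under pydantic v1 (TYPE_MAP and to_table are hypothetical helpers, not existing Ape code):

from pydantic import BaseModel
from sqlalchemy import Boolean, Column, Integer, LargeBinary, MetaData, String, Table

# Map Python annotations onto SQLAlchemy column types.
TYPE_MAP = {int: Integer, str: String, bool: Boolean, bytes: LargeBinary}

def to_table(model: type, name: str, metadata: MetaData) -> Table:
    """Build a SQLAlchemy Table from a pydantic v1 model's fields."""
    columns = []
    for field_name, field in model.__fields__.items():
        col_type = TYPE_MAP[field.type_]  # KeyError => type needs a manual Column
        columns.append(Column(field_name, col_type, nullable=field.allow_none))
    return Table(name, metadata, *columns)

Types with no direct SQL equivalent (hashes, addresses, etc.) would still need explicit entries in the map, which is exactly where the per-field translation work lives.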

@fubuloubu, @banteg I am using sqlmodel in a REST API service that I am building out. While it has some pretty interesting features, it is not an end-all-be-all solution, and I think it would be a mistake to integrate it into Ape. This spike will remain open; I am still attempting to generate SQL fields from the pydantic models, just not with sqlmodel.

johnson2427 avatar Jul 27 '22 04:07 johnson2427

i've built a toy plugin here, feel free to borrow anything from it https://github.com/banteg/ape-events/blob/main/ape_events/__init__.py

i agree that sqlmodel is probably not mature enough

banteg avatar Jul 27 '22 23:07 banteg

@banteg don't you worry! I already took your plugin and started applying it to the cache, but I have to create the query on the back end first. I'm about a day out from having this ready.

johnson2427 avatar Jul 27 '22 23:07 johnson2427