[Huge Feature] Deprecate `metadata.json` in favour of SQL
I'd like to move the current metadata.json to a proper SQL db via SQLModel.
Advantages:
- flexibility: custom endpoints and plugins can rely on a proper db, with an easy-to-use ORM
- scalability: SQLite by default, with the option to easily plug in a MySQL/Postgres container
A few months ago I experimented directly with a graph DB, but the tech is not mature enough to be both embeddable and ORM-ready.
Do you think it is possible to support both implementations, continuing to support `metadata.json` as well by abstracting the model? Something like you are already doing with `CacheManager`.
`metadata.json` is very handy when you use the Cat as a simple service inside a complex architecture, where user management, permissions, and other complex aspects are managed externally.
> Do you think it is possible to support both implementations, continuing to support `metadata.json` as well by abstracting the model? Something like you are already doing with `CacheManager`.
>
> `metadata.json` is very handy when you use the Cat as a simple service inside a complex architecture, where user management, permissions, and other complex aspects are managed externally.
Hey @lucapiccinelli, thanks for the suggestion.
Actually I am experimenting with keeping everything inside Qdrant, since with payload indexing we can now use it as a pseudo-MongoDB. One DB, and that's it.
I want to get rid of metadata.json, but I see your point that it may be useful for settings; maybe we can have a setup script à la Django (the excellent settings.py) as an easy configurator.
That could fix the issue, what do you think? Having an adapter pattern on the db looks like way too much work 😔
I understand your point. I would like to contribute to abstracting metadata.json, but unfortunately I can't commit to that.
I don't have experience with Django, so I can't give you my insights. Anyway, thank you for your response. If we aren't able to contribute, then we will adapt 😉
Hi @pieroit
I've been working on a solution based on database adapters, and I'd love to get your thoughts on it.
Core Idea
The goal is to define an `AbstractDatabase` interface and build different implementations for various databases.
How It Works
- Database Adapters
  - Each database (TinyDB, SQLModel, etc.) implements `AbstractDatabase` and handles the actual operations.
  - Currently, TinyDB is fully working, and I'm experimenting with SQLModel (though it's trickier due to migrations, strict typing, etc.).
- Tables for Better Usability
  - Tables can be identified by either:
    - a `model` class (like in SQLModel), or
    - a `table_name` string (like in TinyDB).
  - `BaseTable` provides basic CRUD operations for a table; `DefaultTable` (extends `BaseTable`) makes it even easier (e.g., `create()` returns the table instance).
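To make the shape of the proposal concrete, here is a minimal sketch of the adapter idea, with an in-memory stand-in instead of a real TinyDB or SQL backend. All names and signatures here are illustrative assumptions, not the actual proposed API:

```python
from abc import ABC, abstractmethod
from typing import Any


class AbstractDatabase(ABC):
    """Common interface every backend adapter implements."""

    @abstractmethod
    def insert(self, table_name: str, record: dict[str, Any]) -> dict[str, Any]: ...

    @abstractmethod
    def search(self, table_name: str, **filters: Any) -> list[dict[str, Any]]: ...


class InMemoryDatabase(AbstractDatabase):
    """Stand-in for a real backend (TinyDB, SQL, ...), backed by plain dicts."""

    def __init__(self) -> None:
        self._tables: dict[str, list[dict[str, Any]]] = {}

    def insert(self, table_name: str, record: dict[str, Any]) -> dict[str, Any]:
        self._tables.setdefault(table_name, []).append(record)
        return record

    def search(self, table_name: str, **filters: Any) -> list[dict[str, Any]]:
        rows = self._tables.get(table_name, [])
        return [r for r in rows if all(r.get(k) == v for k, v in filters.items())]


class BaseTable:
    """CRUD facade bound to one table of whatever AbstractDatabase you pass in."""

    def __init__(self, db: AbstractDatabase, table_name: str):
        self.db = db
        self.table_name = table_name

    def create(self, record: dict[str, Any]) -> dict[str, Any]:
        return self.db.insert(self.table_name, record)

    def find(self, **filters: Any) -> list[dict[str, Any]]:
        return self.db.search(self.table_name, **filters)


# Usage: the table object never knows which backend it is talking to.
settings = BaseTable(InMemoryDatabase(), "settings")
settings.create({"name": "llm_provider", "value": "openai"})
```

Swapping the backend then means passing a different `AbstractDatabase` subclass, which is the whole point of the adapter pattern here.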
Refactored CRUD Operations
I've updated all functions in `crud.py` to work with the new `AbstractDatabase` and `DefaultTable` system.
The magic happens with TinyDB's automatic `settings` table creation.
If you want, take a look directly: https://github.com/davidebizzocchi/cheschire-cat-core/blob/7cb7cd8a0908402cfbe675e7ca3fa6ad29a95e60/core/cat/db/crud.py
Key Benefits
✅ Flexibility – Use any database by extending AbstractDatabase.
✅ Multiple Databases at Once – E.g., TinyDB for settings + SQLModel for users.
✅ Backward Compatible – Keeps metadata.json support but allows switching.
Next Steps
- SQLModel Adapter – Needs more work (migrations, type strictness, etc.).
- Configuration – Let users choose the default DB via `.env` or plugins.
Code
https://github.com/davidebizzocchi/cheschire-cat-core/tree/develop/core/cat/db
Your Thoughts?
- Does this approach make sense for our needs?
- Doubts or suggestions?
Let me know! 🚀
Updates
I (finally) implemented the SQLDatabase "adapter" using SQLAlchemy.
SQLModel
After trying it, I think it's unnecessary and I don't like it. The only thing this library does is use a Pydantic class to define tables, which turned out to be more of a problem than a useful feature.
It does two other interesting things, so I added only those in `cat.db.utils.sqlmodel`.
NOTE: it uses the MIT license, which shouldn't be a problem (correct me if I'm wrong): https://github.com/fastapi/sqlmodel?tab=MIT-1-ov-file (I include credits in `cat.db.utils.sqlmodel.__init__.py`)
Features
I'm quite proud of this part of the implementation 😁.
I created a utils folder for better organization:
- `cast_result` (in `cast.py`)
  A decorator that lets you cast a method's return value using a class attribute; natively used in `DefaultTable`.
- `build_table` (in `tables.py`)
  A very powerful function that lets you create a valid SQLAlchemy `Table` from a Pydantic class.
  Here I had problems with the SQLModel library: in fact it uses a Pydantic class (for db tables) with a custom `FieldInfo` class, which requires more code than the benefits it provides. It's natively integrated in `DefaultTable`: if you pass the `pydantic_model` param in `__init__` (and not the `model`), you can use just a Pydantic class both to create the table and to cast the return objects of methods (using `@cast_result`).
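The two ideas above can be illustrated with a dependency-free sketch: this is not the actual `cat.db.utils` code, and it uses a dataclass plus raw SQLite DDL as stand-ins for Pydantic and SQLAlchemy, but the mechanics (casting via a class attribute, deriving a table from field annotations) are the same:

```python
import sqlite3
from dataclasses import dataclass
from functools import wraps
from typing import get_type_hints


def cast_result(method):
    """Cast the method's raw dict return value using the class's `result_class` attribute."""
    @wraps(method)
    def wrapper(self, *args, **kwargs):
        raw = method(self, *args, **kwargs)
        return self.result_class(**raw)
    return wrapper


@dataclass
class Setting:
    name: str
    value: str


class SettingsTable:
    result_class = Setting  # class attribute read by @cast_result

    @cast_result
    def get(self, name):
        # Pretend this plain dict came back from the db layer.
        return {"name": name, "value": "openai"}


def build_ddl(model, table_name):
    """Map an annotated class's fields to SQLite column types and emit CREATE TABLE DDL."""
    sql_types = {int: "INTEGER", float: "REAL", str: "TEXT", bool: "INTEGER"}
    cols = ", ".join(
        f"{field} {sql_types.get(hint, 'TEXT')}"
        for field, hint in get_type_hints(model).items()
    )
    return f"CREATE TABLE {table_name} ({cols})"


ddl = build_ddl(Setting, "settings")
sqlite3.connect(":memory:").execute(ddl)  # the generated DDL is valid SQLite
setting = SettingsTable().get("llm_provider")  # a Setting instance, not a dict
```

The real `build_table` would emit SQLAlchemy `Column` objects instead of a DDL string, but the field-annotation-to-column mapping is the same idea.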
ENVs
Added a new variable: CCAT_DB_TYPE.
Possible values are: "tiny" (default) or "sql".
The SQLDatabase uses CCAT_METADATA_FILE as the path for creating the database.
Maybe an env variable named "CCAT_DB_FILE_PATH" or something similar would be more appropriate.
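A selection mechanism along these lines could read `CCAT_DB_TYPE` and fall back to TinyDB. This is a hypothetical sketch with placeholder classes, not the actual implementation; only the two env variable names come from the proposal above:

```python
import os


class TinyDatabase:
    """Placeholder for the TinyDB-backed adapter."""


class SQLDatabase:
    """Placeholder for the SQLAlchemy-backed adapter."""
    def __init__(self, path):
        self.path = path


def get_database():
    """Pick the adapter from CCAT_DB_TYPE: "tiny" (default) or "sql"."""
    db_type = os.environ.get("CCAT_DB_TYPE", "tiny")
    if db_type == "sql":
        # Reuses CCAT_METADATA_FILE as the db path, as described above.
        return SQLDatabase(os.environ.get("CCAT_METADATA_FILE", "metadata.db"))
    if db_type == "tiny":
        return TinyDatabase()
    raise ValueError(f"Unknown CCAT_DB_TYPE: {db_type!r}")


os.environ["CCAT_DB_TYPE"] = "sql"
os.environ["CCAT_METADATA_FILE"] = "cat_data.db"
db = get_database()
```

Failing loudly on an unknown value keeps misconfigured deployments from silently falling back to the wrong store.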
> Core Idea
> The goal is to define an `AbstractDatabase` interface and build different implementations for various databases.
>
> How It Works
> Database Adapters
> - Each database (TinyDB, SQLModel, etc.) implements `AbstractDatabase` and handles the actual operations.
> - Currently, TinyDB is fully working, and I'm experimenting with SQLModel (though it's trickier due to migrations, strict typing, etc.).
I made a similar attempt, but ended up using Qdrant directly, because now you can have payload indexes. Still, in plugins, a simple SQLite db and a few endpoints are awesome for small CRUDs and service data.
> Tables for Better Usability
> Tables can be identified by either:
> - a `model` class (like in SQLModel), or
> - a `table_name` string (like in TinyDB).
>
> `BaseTable` provides basic CRUD operations for a table. `DefaultTable` (extends `BaseTable`) makes it even easier (e.g., `create()` returns the table instance).
I'm sorry (also because this started from my own issue here), but this assumes tabular data and forces us to maintain an adapter system for settings as well. A similar proposal was made for the vector DB under the adapter pattern (see #1040).
After weeks of thinking about this, I think the best way is to use just one db, Qdrant; if people want to use other DBs for memory and in their endpoints, they can do so from a plugin.
The risk here is reinventing Django, you know what I'm talking about.
> Refactored CRUD Operations
> I've updated all functions in `crud.py` to work with the new `AbstractDatabase` and `DefaultTable` system. The magic happens with TinyDB's automatic `settings` table creation. If you want, take a look directly: https://github.com/davidebizzocchi/cheschire-cat-core/blob/7cb7cd8a0908402cfbe675e7ca3fa6ad29a95e60/core/cat/db/crud.py
A great piece of code, and a great read, kudos. I'm not convinced this is the way, sorry for my step back.
> Updates
> I (finally) implemented the `SQLDatabase` "adapter" using SQLAlchemy.
>
> SQLModel
> After trying it, I think it's unnecessary and I don't like it. The only thing this library does is use a Pydantic class to define tables, which turned out to be more of a problem than a useful feature.
Agreed, all endpoints would anyway need classic variations (create, update) of Pydantic models on top of it. SQLModel does not give the same advantage over SQLAlchemy that FastAPI gives over Starlette.
> Features
> I'm quite proud of this part of the implementation 😁.
You have my appreciation as a coder and contributor; I hope you take it the right way that this feat may not end up in core.
Let me tag a few key contribs here that touched those parts of code so they can say a word: @Pingdred @lucagobbi @sambarza @lucapirrone @kodaline @AlboCode @valentimarco
I would go for full Qdrant using payload indexes and a noSQL approach:
- one collection for settings
- one collection for users (which should not require too much effort, as auth and user registry can be easily delegated outside with an AuthHandler - and it works, already tried with Keycloak, Supabase Auth and Django)
One DB; and since there are custom endpoints, a plugin can still decide where to store and retrieve its data.
Also please check out this issue on the cache, in which I propose to have a `settings.py` file to configure system-wide settings instead of factories and endpoints.
> I'm sorry (also because this started from my own issue here), but this assumes tabular data and forces us to maintain an adapter system for settings as well. A similar proposal was made for the vector DB under the adapter pattern (see #1040).
>
> After weeks of thinking about this, I think the best way is to use just one db, Qdrant; if people want to use other DBs for memory and in their endpoints, they can do so from a plugin.
>
> The risk here is reinventing Django, you know what I'm talking about.
Ok, no problem for me. Maybe having ONE db for everything is not the most elegant solution, but it is also true that this database (for users and settings) is only "operative".
I will build a simple plugin with this implementation, so you can spin up these simple and fast dbs (AI is also for fun, not only production).
(So, for example, I can store agent information in my plugin multicat 😁)
> Also please check out this issue on the cache, in which I propose to have a `settings.py` file to configure system-wide settings instead of factories and endpoints.
settings.py I believe can be very, very interesting!
For example, if you redefine the StrayCat class (and some others), with minimal work you could transform the Cat into a multi-agent system (exactly what I'm doing in my plugin).
If this isn't already being worked on, I can work on it. Thanks
v2 features SQL with an ORM (Piccolo) that is easy to use, compatible with both SQLite and Postgres, and works both sync and async.
The Cat will be stateless (no files, nothing kept in memory unless strictly necessary). Still to decide how to treat working memory; I'm tending towards having it passed in by, and returned to, the client at each request.
The old cache system is no longer necessary: if plugins want to use caches to store and retrieve data, they can do it on their own. The same goes for vector memories; Qdrant memory will be just a plugin, shipped by default to keep v1 functionality. You will be able to create custom memory systems with other vector DBs, graph DBs, or whatever.
The internal SQL db is for internal purposes, but additional tables can be defined via plugin in case you want to use SQL.
Closing this thx