WIP Make (mostly) pgai 0.4.0 compatible
I've loaded a dump of a timescaledb after running the quickstart and wrote some tests with the sync client. I then tried to make those tests pass.
Changes I made:
- Rename
contents->chunk - Rename
id->embedding_uuid(but configurable) - Add the option to configure an
embedding_table_name(otherwise it is assumed to betable_name + "_store") - Made metadata column name configurable and optional (default off)
- Added/Fixed the return type of search and the filter params type
Some things are a bit confusing:
-
Are inserts/deletes/updates supposed to work? The whole idea of pgai 0.4.0 is that you don't have to manage these. The fact that the pgai table schema has a foreign key and the previous examples don't... makes this pretty hard to enable through the client right now.
-
Same for index operations, these are created natively by the SQL functions right?
-
The Vectorize test breaks because the langchain_community integration isn't updated yet to provide the new
embedding_table_name. This is a bit odd as a dev workflow, I'm wondering if it makes sense to define the integrations in this package too and then just import them in the integrations? That would allow for easier testing I think.
About the embedding_table_name: Since pgai creates a view on top of the vectordb which e.g. holds metadata. I've for now made it the default that you pass in the view name and the library "assumes" the embeddings_store table. For search operations it simply uses the view. For anything else the underlying embeddings_store table (but due to problems from 1.) only deletes really work rn).
How to Review? I think the best way to understand what works and what doesn't is too have a look at the new compatibility_test.py file. The first two tests are just validating that the import worked then a client is created and the individual methods are tested.