supabase vectorstore - first cut

Open danielchalef opened this issue 1 year ago • 5 comments

First cut of a supabase vectorstore loosely patterned on the langchainjs equivalent. Doesn't support async operations which is a limitation of the supabase python client.

danielchalef avatar Apr 18 '23 17:04 danielchalef

I'm overriding from_texts in an incompatible manner, having added additional arguments. How do you suggest I modify it to pass the linter?

danielchalef avatar Apr 18 '23 17:04 danielchalef

@dev2049 Modified per your suggestions. mypy is still unhappy with the method signature overrides:

    langchain/vectorstores/supabase.py:95: error: Signature of "from_documents" incompatible with supertype "VectorStore"  [override]
    langchain/vectorstores/supabase.py:95: note:      Superclass:
    langchain/vectorstores/supabase.py:95: note:          @classmethod
    langchain/vectorstores/supabase.py:95: note:          def from_documents(cls, documents: List[Document], embedding: Embeddings, **kwargs: Any) -> SupabaseVectorStore

Note that the signature for from_documents hasn't changed other than 3 additional explicit args.

danielchalef avatar Apr 18 '23 20:04 danielchalef

Note that the signature for from_documents hasn't changed other than 3 additional explicit args.

Guessing you have to include **kwargs at the end. But also note that the base class VectorStore has a default from_documents implementation that probably works for your use case:

    @classmethod
    def from_documents(
        cls: Type[VST],
        documents: List[Document],
        embedding: Embeddings,
        **kwargs: Any,
    ) -> VST:
        """Return VectorStore initialized from documents and embeddings."""
        texts = [d.page_content for d in documents]
        metadatas = [d.metadata for d in documents]
        return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)

dev2049 avatar Apr 18 '23 20:04 dev2049

@danielchalef excited to see this! It has been in TypeScript for a while, and I'm excited to get Python to feature parity ;) Thanks for adding it.

hwchase17 avatar Apr 18 '23 21:04 hwchase17

note that the base class VectorStore has a default from_documents implementation that probably works for your use case

Good catch. Thanks!

mypy doesn't like my previous use of a keyword-only argument marker. I've made the additional arguments Optional and now check for their presence at runtime. Not ideal, but I can't think of an alternative.
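A minimal sketch of that workaround, using stand-in classes rather than the actual LangChain code. BaseStore, TableStore, client, and table_name are hypothetical names chosen for illustration. The extra arguments get None defaults and sit before **kwargs, so every call that is valid for the base method should still be valid for the override, and the required values are enforced at runtime instead.

```python
from typing import Any, List, Optional, Type, TypeVar

VST = TypeVar("VST", bound="BaseStore")


class BaseStore:
    """Stand-in for the VectorStore base class (not the real LangChain class)."""

    def __init__(self, texts: List[str], **kwargs: Any) -> None:
        self.texts = texts
        self.options = kwargs

    @classmethod
    def from_texts(
        cls: Type[VST],
        texts: List[str],
        embedding: Any,
        metadatas: Optional[List[dict]] = None,
        **kwargs: Any,
    ) -> VST:
        return cls(texts, **kwargs)


class TableStore(BaseStore):
    """Hypothetical subclass that needs two extra arguments."""

    @classmethod
    def from_texts(
        cls,
        texts: List[str],
        embedding: Any,
        metadatas: Optional[List[dict]] = None,
        client: Optional[Any] = None,      # extra argument, optional in the signature
        table_name: Optional[str] = None,  # extra argument, optional in the signature
        **kwargs: Any,
    ) -> "TableStore":
        # The signature cannot mark these as required without diverging from
        # the base class, so their presence is checked at runtime.
        if client is None or table_name is None:
            raise ValueError("client and table_name are required")
        return cls(texts, client=client, table_name=table_name, **kwargs)
```

The trade-off is the one mentioned above: a missing argument surfaces as a ValueError at call time rather than as a type-checker error.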

danielchalef avatar Apr 18 '23 22:04 danielchalef

@hwchase17 Done!

danielchalef avatar Apr 20 '23 03:04 danielchalef

I've followed this: https://python.langchain.com/en/latest/modules/indexes/vectorstores/examples/supabase.html

but when I call similarity_search, Supabase responds with postgrest.exceptions.APIError: {'code': '42P01', 'details': None, 'hint': None, 'message': 'relation "docstore" does not exist'}. Is there something in the tutorial's SQL script that isn't correct?

liamcharmer avatar Apr 21 '23 00:04 liamcharmer

Create a table called "docstore" or modify the Postgres function to use the table name reflecting your schema.

I'll cut a PR later to correct the example function.

danielchalef avatar Apr 21 '23 00:04 danielchalef

Create a table called "docstore" or modify the Postgres function to use the table name reflecting your schema.

I'll cut a PR later to correct the example function.

Ohhh, my bad! Thank you so much, and apologies for being silly.

liamcharmer avatar Apr 21 '23 00:04 liamcharmer

Any way to query from_existing_index, like in Pinecone and in the JS implementation?

rapcal avatar Apr 30 '23 21:04 rapcal

Sounds like a great feature, though assumptions would need to be made about the embedding model. I don't have the bandwidth to work on this, but I'm sure the langchain team would appreciate the contribution.

danielchalef avatar May 01 '23 01:05 danielchalef

Any way to query from_existing_index, like in Pinecone and in the JS implementation?

Did you find any way?

kasem-sm avatar Jun 15 '23 02:06 kasem-sm

Any way to query from_existing_index, like in Pinecone and in the JS implementation?

Did you find any way?

Nope. Went back to Pinecone.

rapcal avatar Jun 15 '23 02:06 rapcal

@kasem-sm @rapcal I believe it's just this?

    from langchain.embeddings import OpenAIEmbeddings
    from supabase import Client, create_client

    supabase: Client = create_client(supabase_url, supabase_key)
    embeddings = OpenAIEmbeddings()
    vector_store = SupabaseVectorStoreWithMetadataFiltering(table_name='documents', embedding=embeddings, client=supabase, query_name='match_documents')

That said, the supabase integration is lacking functionality such as filtering, but I think this should work for just loading an existing table as an index. Maybe I'll push a PR for the docs.
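A rough sketch of what a from_existing_index helper could look like if it is only a thin wrapper around the constructor. SketchStore is a stand-in class, not the real Supabase vector store, and from_existing_index is a hypothetical method here. The sketch assumes the constructor only records how to reach the table and writes nothing, and that the caller passes the same embedding model that was used to populate the table.

```python
from typing import Any, Callable, List


class SketchStore:
    """Stand-in for a table-backed vector store (not the real Supabase class)."""

    def __init__(
        self,
        client: Any,
        embedding: Callable[[str], List[float]],
        table_name: str,
        query_name: str,
    ) -> None:
        # Only stores connection details; no rows are inserted here.
        self.client = client
        self.embedding = embedding
        self.table_name = table_name
        self.query_name = query_name

    @classmethod
    def from_existing_index(
        cls,
        client: Any,
        embedding: Callable[[str], List[float]],
        table_name: str,
        query_name: str = "match_documents",
    ) -> "SketchStore":
        # Hypothetical helper: point the store at an already populated table.
        # The embedding must match the one used when the rows were written,
        # otherwise query vectors and stored vectors are not comparable.
        return cls(client, embedding, table_name, query_name)
```

For example, SketchStore.from_existing_index(client, embed_fn, "documents") would return a store bound to the existing "documents" table, where client and embed_fn are whatever client object and embedding function the caller already has.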

ShantanuNair avatar Jun 27 '23 16:06 ShantanuNair

@rapcal @kasem-sm If you have a few fixed fields you want to filter on, and not much dynamic filtering, I got it working quite nicely by implementing my own vector store class. I explained a bit more here: https://github.com/hwchase17/langchain/pull/5379#issuecomment-1610775206. I can share a minimal reproducible example if anyone wants it.

ShantanuNair avatar Jun 28 '23 05:06 ShantanuNair