Documentation should point out how to retrieve a vectorstore already uploaded in a database

Open eRuaro opened this issue 2 years ago • 6 comments

The documentation on the LangChain site and in the code repo should point out that you can retrieve an existing vector store from your database of choice. I thought this wasn't possible, so I implemented a wrapper that fetched the values from the database and mapped them to the appropriate LangChain class, only to find out a day later, through experimenting, that you can simply query the database through LangChain and the results are mapped to the appropriate class.

The examples in the site documentation always have a similar format to this:

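# Import path as of the LangChain version used in this issue; it may differ in newer releases.
from langchain.vectorstores.pgvector import PGVector, DistanceStrategy

# Embeds every document in `data` and writes the vectors to the collection.
db = PGVector.from_documents(
    documents=data,
    embedding=embeddings,
    collection_name=collection_name,
    connection_string=connection_string,
    distance_strategy=DistanceStrategy.COSINE,
    openai_api_key=api_key,
    pre_delete_collection=False
)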

That is fine if you're indexing documents for the first time and adding them to the database. But what if I plan to keep asking questions about the same document? It would be time-consuming and expensive to re-index the document and add it to the database every time.

If I already have a vector store in a PGVector database, I can query it with the code below:

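# Connects to the existing collection; no documents are re-embedded here.
store = PGVector(
    connection_string=connection_string,
    # Use the same embedding model that was used when the documents were indexed.
    embedding_function=embeddings,
    collection_name=collection_name,
    distance_strategy=DistanceStrategy.COSINE
)

retriever = store.as_retriever()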

You can then use the store and the retriever with whichever chain fits your use case.
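For anyone who wants to see the pattern in isolation, here is a minimal sketch of "index once, reconnect later". It is a toy model and not LangChain's implementation: `ToyStore`, `toy_embed`, and the SQLite file are all hypothetical stand-ins I made up for illustration, with SQLite playing the role of the PGVector database. The point is only that the classmethod embeds and persists documents, while the plain constructor just opens a connection to what is already stored.

```python
import json
import math
import os
import sqlite3
import tempfile


def toy_embed(text):
    """Hypothetical stand-in for a real embedding model: letter counts."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToyStore:
    """Toy persistent vector store (illustration only)."""

    def __init__(self, path, embedding_function, collection_name):
        # Constructing the store only opens a connection; nothing is embedded.
        self.conn = sqlite3.connect(path)
        self.embed = embedding_function
        self.collection = collection_name
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items (collection TEXT, text TEXT, vec TEXT)"
        )

    @classmethod
    def from_documents(cls, documents, embedding, path, collection_name):
        # Indexing path: embed every document once and persist the vectors.
        store = cls(path, embedding, collection_name)
        rows = [(collection_name, d, json.dumps(embedding(d))) for d in documents]
        store.conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
        store.conn.commit()
        return store

    def similarity_search(self, query, k=1):
        # Only the query is embedded; stored vectors are read back as-is.
        q = self.embed(query)
        rows = self.conn.execute(
            "SELECT text, vec FROM items WHERE collection = ?", (self.collection,)
        ).fetchall()
        ranked = sorted(rows, key=lambda r: cosine(q, json.loads(r[1])), reverse=True)
        return [text for text, _ in ranked[:k]]


# First run: index the documents once.
path = os.path.join(tempfile.mkdtemp(), "toy.db")
first = ToyStore.from_documents(
    ["postgres vector index", "banana bread recipe"], toy_embed, path, "docs"
)
first.conn.close()

# Later run (for example, another process): reconnect without re-indexing.
store = ToyStore(path, toy_embed, "docs")
results = store.similarity_search("vector index in postgres")
print(results)
```

The second half mirrors the two snippets above: `from_documents` corresponds to the first-time indexing call, and the bare constructor corresponds to reconnecting to a collection that already exists.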

eRuaro avatar Apr 20 '23 00:04 eRuaro

Thank you for this, I was running into the same problem.

I wish you could also create a collection by giving the collection_name. I am using Milvus:

pymilvus.exceptions.SchemaNotReadyException: <SchemaNotReadyException: (code=1, message=Collection 'MyCollection' not exist, or you can pass in schema to create one.)>

phughesion avatar Apr 20 '23 17:04 phughesion

Can't you already do that with from_documents? It wouldn't make sense to create a collection from the class constructor (PGVector, for example), since the constructor is meant to fetch data that is already in the vector database.

eRuaro avatar Apr 21 '23 06:04 eRuaro

I was having this issue too. I kept thinking, "why would I need to reload my entire source data when I already have it loaded into the vector store?"

Now I can run these in parallel: one process ingesting into the vector store and another doing QA, using your example above.
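A rough sketch of that split, under loud assumptions: SQLite stands in for the Postgres database, and the `chunks` table and `count_chunks` helper are hypothetical names, not anything from LangChain. It only shows that a read-side connection picks up rows committed by a separate ingestion connection, without any reload on the QA side.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared.db")

# Ingestion side: owns its own connection and only writes.
writer = sqlite3.connect(path)
writer.execute("CREATE TABLE IF NOT EXISTS chunks (collection TEXT, text TEXT)")
writer.commit()

# QA side: a separate connection to the same database that only reads.
reader = sqlite3.connect(path)


def count_chunks(conn, collection):
    """Count the rows currently visible to this connection for a collection."""
    rows = conn.execute(
        "SELECT COUNT(*) FROM chunks WHERE collection = ?", (collection,)
    ).fetchall()
    return rows[0][0]


before = count_chunks(reader, "docs")

# Ingestion commits a new chunk while the QA connection stays open.
writer.execute("INSERT INTO chunks VALUES (?, ?)", ("docs", "new chunk"))
writer.commit()

# The QA connection sees the new row on its next query.
after = count_chunks(reader, "docs")
print(before, after)
```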

jaredbarranco avatar May 05 '23 05:05 jaredbarranco

... And use the store, and retriever as such with the appropriate chain one may use.

This is great, @eRuaro! Want to open a PR to add your example to the demo notebook? docs/modules/indexes/vectorstores/examples/pgvector.ipynb

dev2049 avatar May 11 '23 21:05 dev2049

I'll create a PR for it later!

eRuaro avatar May 11 '23 21:05 eRuaro

Here's the PR @dev2049 https://github.com/hwchase17/langchain/pull/4578

eRuaro avatar May 12 '23 13:05 eRuaro

Hi, @eRuaro! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, you raised an issue about the need to update the documentation to include information on how to retrieve a vector store that has already been uploaded in a database. You provided an example code snippet for querying a vector store from a PGVector database. Other users, such as phughesion and jaredbarranco, expressed their gratitude for the solution and suggested adding the example to the demo notebook. You agreed to create a pull request for the documentation update.

Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you for your contribution, and we appreciate your understanding in helping us manage the repository effectively. If you have any further questions or concerns, please let us know.

dosubot[bot] avatar Sep 17 '23 17:09 dosubot[bot]

How do I add indexes in PGVector and retrieve results based on those indexes?

Swetha-Hariharan1810 avatar Dec 04 '23 06:12 Swetha-Hariharan1810

I have a default LlamaIndex vector store index. How can I use it to create a PGVector collection? I don't want to regenerate the embeddings from my documents because of the LLM calls that would require.

Gangadharbhuvan avatar Feb 13 '24 06:02 Gangadharbhuvan