
[WIP][RFC] Add a method to construct a vector store from an existing Pinecone index

Open • Who828 opened this issue 1 year ago • 1 comment

This is a sample PR that constructs a VectorStore from an existing Pinecone index. I am happy to implement similar methods for the other vector store implementations.

While I used index_name here, a more consistent approach would be to pass the index object itself, especially for stores like FAISS.

Let me know your thoughts.
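To make the index_name vs. index-object trade-off concrete, here is a minimal sketch using a toy in-memory index. `ToyIndex`, `INDEX_REGISTRY`, `ToyVectorStore`, `from_existing_index`, and `from_index` are all hypothetical names standing in for a Pinecone index handle and a by-name lookup; this is not LangChain's or Pinecone's API. The name-based variant needs a store-specific lookup, while the object-based variant only wraps what the caller already holds.

```python
from typing import Callable, Dict, List

EmbeddingFn = Callable[[str], List[float]]


class ToyIndex:
    """Hypothetical stand-in for a remote index handle (e.g. a Pinecone index)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.vectors: Dict[str, List[float]] = {}


# Hypothetical registry standing in for "open an existing index by name".
INDEX_REGISTRY: Dict[str, ToyIndex] = {"docs": ToyIndex("docs")}


class ToyVectorStore:
    def __init__(self, index: ToyIndex, embedding_function: EmbeddingFn) -> None:
        self.index = index
        self.embedding_function = embedding_function

    @classmethod
    def from_existing_index(
        cls, index_name: str, embedding_function: EmbeddingFn
    ) -> "ToyVectorStore":
        # Name-based: the store itself must know how to resolve a name to an index.
        if index_name not in INDEX_REGISTRY:
            raise ValueError(f"Index {index_name!r} does not exist")
        return cls(INDEX_REGISTRY[index_name], embedding_function)

    @classmethod
    def from_index(
        cls, index: ToyIndex, embedding_function: EmbeddingFn
    ) -> "ToyVectorStore":
        # Object-based: the caller already holds the index, so no lookup is needed.
        return cls(index, embedding_function)


def embed(text: str) -> List[float]:
    # Toy one-dimensional "embedding", only for illustration.
    return [float(len(text))]


by_name = ToyVectorStore.from_existing_index("docs", embed)
by_object = ToyVectorStore.from_index(INDEX_REGISTRY["docs"], embed)
print(by_name.index is by_object.index)  # True
```

Both paths end up wrapping the same index. The object-based signature has the advantage that it carries over unchanged to stores like FAISS, where the caller typically holds an in-memory index rather than a name.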

Use cases:

  • For caching or long-term storage
  • For use in serverless environments, etc.

Who828, Jan 08 '23 19:01

One reason is to keep it consistent with the conventions of the VectorStore methods: we already have a method named from_texts, which constructs the store from raw texts.

I thought we should also have a method that constructs the store from an existing index.
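The symmetry with from_texts, and why it matters for the caching and serverless use cases, can be illustrated with a toy sketch. `ToyStore`, `CountingEmbeddings`, and the dict-backed index are hypothetical, and the extra `index` argument to `from_texts` exists only so the toy has somewhere to persist vectors. The point it shows: `from_texts` pays the embedding cost for every text, whereas `from_existing_index` only wraps an index that is already populated, so a later run embeds nothing but its queries.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical persistent index: id -> (vector, text).
Index = Dict[str, Tuple[List[float], str]]


class CountingEmbeddings:
    """Hypothetical embedding model that counts how many texts it embeds."""

    def __init__(self) -> None:
        self.calls = 0

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        self.calls += len(texts)
        return [[float(len(t))] for t in texts]

    def embed_query(self, text: str) -> List[float]:
        self.calls += 1
        return [float(len(text))]


class ToyStore:
    def __init__(self, index: Index, embedding_function: Callable[[str], List[float]]) -> None:
        self.index = index
        self.embedding_function = embedding_function

    @classmethod
    def from_texts(
        cls, texts: List[str], embedding: CountingEmbeddings, index: Index
    ) -> "ToyStore":
        # Embeds every text and writes it into the index: pays the full embedding cost.
        vectors = embedding.embed_documents(texts)
        for i, (text, vector) in enumerate(zip(texts, vectors)):
            index[str(i)] = (vector, text)
        return cls(index, embedding.embed_query)

    @classmethod
    def from_existing_index(cls, index: Index, embedding: CountingEmbeddings) -> "ToyStore":
        # Wraps an already-populated index: nothing is re-embedded.
        return cls(index, embedding.embed_query)

    def similarity_search(self, query: str, k: int = 1) -> List[str]:
        q = self.embedding_function(query)
        # Toy scoring: absolute difference of the one-dimensional vectors.
        ranked = sorted(self.index.values(), key=lambda item: abs(item[0][0] - q[0]))
        return [text for _, text in ranked[:k]]


persistent_index: Index = {}
first_run = CountingEmbeddings()
ToyStore.from_texts(["short", "a much longer text"], first_run, persistent_index)
print(first_run.calls)  # 2

# A later invocation (e.g. a fresh serverless worker) reuses the index.
later_run = CountingEmbeddings()
store = ToyStore.from_existing_index(persistent_index, later_run)
print(store.similarity_search("tiny"))  # ['short']
print(later_run.calls)  # 1 (only the query was embedded)
```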

Another reason is that, for stores like FAISS, there is more logic involved than just calling the constructor:

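    # Assumes module-level imports such as those in langchain's faiss.py:
    # uuid, typing (Any, List, Optional), Document, InMemoryDocstore, Embeddings.
    @classmethod
    def from_index(
        cls,
        index: Any,
        texts: List[str],
        embedding: Embeddings,
        metadatas: Optional[List[dict]] = None,
    ) -> "FAISS":
        # texts[i] (and metadatas[i]) must correspond to the i-th vector in `index`.
        documents = []
        for i, text in enumerate(texts):
            metadata = metadatas[i] if metadatas else {}
            documents.append(Document(page_content=text, metadata=metadata))
        # Map each FAISS row position to a fresh docstore ID.
        index_to_id = {i: str(uuid.uuid4()) for i in range(len(documents))}
        docstore = InMemoryDocstore(
            {index_to_id[i]: doc for i, doc in enumerate(documents)}
        )
        return cls(embedding.embed_query, index, docstore, index_to_id)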

This is because it needs an InMemoryDocstore to keep track of which document corresponds to which vector in the index.

That's why I opted to add a new class method in this case. Thoughts?
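The position-to-document relationship that InMemoryDocstore maintains can be shown without FAISS installed. This is a minimal sketch, assuming a brute-force list of vectors in place of a flat FAISS index and a plain dict in place of InMemoryDocstore; `Doc`, `BruteForceIndex`, `wrap_existing_index`, and `search_docs` are hypothetical names. Like a flat FAISS index, the toy search returns row positions, so `index_to_id` is what leads from a hit back to its document. The length check is an extra safeguard that the proposed from_index does not perform.

```python
import math
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Doc:
    """Minimal stand-in for LangChain's Document."""

    page_content: str
    metadata: dict = field(default_factory=dict)


class BruteForceIndex:
    """Toy stand-in for a flat FAISS index: vectors are addressed only by row position."""

    def __init__(self) -> None:
        self.rows: List[List[float]] = []

    def add(self, vectors: List[List[float]]) -> None:
        self.rows.extend(vectors)

    def search(self, query: List[float], k: int) -> List[int]:
        # Return the row positions of the k nearest vectors by Euclidean distance.
        order = sorted(range(len(self.rows)), key=lambda i: math.dist(query, self.rows[i]))
        return order[:k]


def wrap_existing_index(
    index: BruteForceIndex, texts: List[str], metadatas: Optional[List[dict]] = None
) -> Tuple[Dict[str, Doc], Dict[int, str]]:
    # texts[i] must describe the vector stored at row i of the existing index.
    if len(texts) != len(index.rows):
        raise ValueError("texts must align one-to-one with the index rows")
    docs = [Doc(text, metadatas[i] if metadatas else {}) for i, text in enumerate(texts)]
    index_to_id = {i: str(uuid.uuid4()) for i in range(len(docs))}
    docstore = {index_to_id[i]: doc for i, doc in enumerate(docs)}
    return docstore, index_to_id


def search_docs(
    index: BruteForceIndex,
    docstore: Dict[str, Doc],
    index_to_id: Dict[int, str],
    query: List[float],
    k: int = 1,
) -> List[Doc]:
    # Row position -> docstore ID -> document: the link the docstore preserves.
    return [docstore[index_to_id[pos]] for pos in index.search(query, k)]


index = BruteForceIndex()
index.add([[0.0, 0.0], [10.0, 10.0]])  # an "existing" index built elsewhere
docstore, index_to_id = wrap_existing_index(index, ["origin", "far corner"])
hit = search_docs(index, docstore, index_to_id, [9.0, 9.5])[0]
print(hit.page_content)  # far corner
```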

Who828, Jan 09 '23 19:01

Just my 2 cents: I think this makes it clearer how to pull from an existing index, as it wasn't immediately obvious to me from the constructor.

bernie-g, Jan 11 '23 18:01