langchain
[WIP][RFC] Add a method to construct vector store from existing index for Pinecone
This is a sample PR to construct VectorStore from the existing index for Pinecone; I am happy to implement similar methods for the rest of the implementations.
While I used index_name here, a consistent implementation would be to pass the index instead, especially for stores like FAISS.
Let me know your thoughts.
Use cases:
- Caching or long-term storage
- Use in a serverless environment, etc.
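To make the index_name versus index trade-off concrete, here is a minimal sketch using a toy store. ToyStore, from_index_name, fake_embed, and the _REMOTE_INDEXES registry are all hypothetical names invented for illustration; none of them are part of langchain or Pinecone.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for a hosted service that resolves indexes by name.
_REMOTE_INDEXES: Dict[str, List[List[float]]] = {
    "demo": [[0.0, 1.0], [1.0, 0.0]],
}


@dataclass
class ToyStore:
    index: List[List[float]]
    embed: Callable[[str], List[float]]

    @classmethod
    def from_index_name(cls, index_name: str, embed) -> "ToyStore":
        # Hosted-style construction: the store looks the index up by name.
        return cls(_REMOTE_INDEXES[index_name], embed)

    @classmethod
    def from_index(cls, index: List[List[float]], embed) -> "ToyStore":
        # Object-style construction: the caller already holds the index,
        # which is the only option for an in-process index with no name.
        return cls(index, embed)


def fake_embed(text: str) -> List[float]:
    # Placeholder embedding function, for illustration only.
    return [float(len(text)), 0.0]


by_name = ToyStore.from_index_name("demo", fake_embed)
by_object = ToyStore.from_index(_REMOTE_INDEXES["demo"], fake_embed)
```

Both calls end up wrapping the same index. Passing the index object works for either kind of store, whereas a name only makes sense when something can resolve it.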
One reason is to keep it consistent with the conventions of the VectorStore methods. We have a method named from_texts, which constructs the store from documents, so I thought we should also have a method which constructs the store from an existing index.
Another reason is that, for stores like FAISS, there is more logic to it than just calling the constructor:
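# Assumes uuid, Document, and InMemoryDocstore are imported at module level.
@classmethod
def from_index(cls, index, texts, embedding, metadatas=None) -> "FAISS":
    # Wrap each text in a Document, attaching its metadata when provided.
    documents = []
    for i, text in enumerate(texts):
        metadata = metadatas[i] if metadatas else {}
        documents.append(Document(page_content=text, metadata=metadata))
    # Map each row position in the index to a generated docstore ID.
    index_to_id = {i: str(uuid.uuid4()) for i in range(len(documents))}
    docstore = InMemoryDocstore(
        {index_to_id[i]: doc for i, doc in enumerate(documents)}
    )
    return cls(embedding.embed_query, index, docstore, index_to_id)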
This is because the FAISS store requires an InMemoryDocstore to keep the relationship between the documents and the index.
That's why I went with adding a new class method in this case. Thoughts?
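The role of that mapping can be shown with a small standalone sketch. It assumes the index search returns integer row positions (as a FAISS index does) and uses a plain dict in place of InMemoryDocstore; the resolve helper is hypothetical and exists only for this example.

```python
import uuid

texts = ["alpha", "beta", "gamma"]

# Row position in the index -> generated document ID.
index_to_id = {i: str(uuid.uuid4()) for i in range(len(texts))}

# Document ID -> document content (a dict standing in for the docstore).
docstore = {index_to_id[i]: text for i, text in enumerate(texts)}


def resolve(row_positions):
    # Translate the row positions returned by a search into documents.
    return [docstore[index_to_id[i]] for i in row_positions]


# Pretend a search returned rows 2 and 0 as the nearest neighbours.
hits = resolve([2, 0])
```

Without index_to_id and the docstore, a bare index could return row positions but not the documents they belong to, which is why rebuilding the store takes more than a constructor call.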
Just my 2 cents: I think this makes it clearer how to pull from an existing index, as it wasn't immediately obvious to me from the constructor.