DocArray as a Retriever
DocArray is an open-source tool for managing multi-modal data. It offers the flexibility to store and search your data using various document index backends. This PR introduces DocArrayRetriever, which works with any available backend and serves as a retriever for LangChain apps.
I also added two notebooks:
- DocArray Backends: an intro to all 5 currently supported backends, showing how to initialize, index, and use each of them as a retriever
- DocArray Usage: showcases the additional search parameters you can pass to create versatile retrievers
Example:
```python
from docarray import BaseDoc, DocList
from docarray.index import InMemoryExactNNIndex
from docarray.typing import NdArray
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.retrievers import DocArrayRetriever


# define document schema
class MyDoc(BaseDoc):
    description: str
    description_embedding: NdArray[1536]


embeddings = OpenAIEmbeddings()

# create documents
descriptions = ["description 1", "description 2"]
desc_embeddings = embeddings.embed_documents(texts=descriptions)
docs = DocList[MyDoc](
    [
        MyDoc(description=desc, description_embedding=embedding)
        for desc, embedding in zip(descriptions, desc_embeddings)
    ]
)

# initialize document index with data
db = InMemoryExactNNIndex[MyDoc](docs)

# create a retriever
retriever = DocArrayRetriever(
    index=db,
    embeddings=embeddings,
    search_field="description_embedding",
    content_field="description",
)

# find the relevant document
doc = retriever.get_relevant_documents("action movies")
print(doc)
```
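For readers who want to see the idea without an OpenAI key or any backend installed, here is a rough, standard-library-only sketch of what an exact nearest-neighbor retriever does conceptually: embed the query, score it against every stored embedding, and return the best matches. This is not how DocArray or DocArrayRetriever is implemented. `ToyDoc`, `toy_embed`, `cosine_similarity`, `retrieve`, and the `top_k` argument of `retrieve` are hypothetical names made up for illustration, and the bag-of-words "embedding" is only a stand-in for a real embedding model.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ToyDoc:
    # Hypothetical stand-in for the MyDoc schema: text plus its embedding.
    description: str
    description_embedding: List[float]


def toy_embed(text: str, vocabulary: List[str]) -> List[float]:
    # Assumption: a bag-of-words count vector replaces a real embedding model.
    words = text.lower().split()
    return [float(words.count(term)) for term in vocabulary]


def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve(query_embedding: List[float], docs: List[ToyDoc], top_k: int = 1) -> List[ToyDoc]:
    # Exact search: score every document, then keep the top_k best scores.
    ranked = sorted(
        docs,
        key=lambda d: cosine_similarity(query_embedding, d.description_embedding),
        reverse=True,
    )
    return ranked[:top_k]


vocabulary = ["action", "movies", "romantic", "comedy"]
descriptions = ["action movies with car chases", "romantic comedy"]
toy_docs = [ToyDoc(d, toy_embed(d, vocabulary)) for d in descriptions]

results = retrieve(toy_embed("action movies", vocabulary), toy_docs, top_k=1)
print(results[0].description)  # prints "action movies with car chases"
```

The split mirrors the example above: the field holding the vector is what gets searched, and the text field is what comes back as content.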
Who can review?
@dev2049
It would be nice to also add Jina's annlite as a vector store option.
hey @jpzhangvincent, annlite is not yet compatible with the new docarray version, but we might do it in the future, thanks for the suggestion!
The latest updates on your projects. Learn more about Vercel for Git ↗︎
Name | Status | Preview | Comments | Updated (UTC) |
---|---|---|---|---|
langchain | ✅ Ready (Inspect) | Visit Preview | 💬 Add feedback | Jun 16, 2023 7:45pm |
@hwchase17 @vowelparrot @dev2049
I'm not sure why Vercel is failing. I think it fails for all other recent PRs as well.
@jupyterjazz is attempting to deploy a commit to the LangChain Team on Vercel.
A member of the Team first needs to authorize it.
hey @hwchase17 @vowelparrot @dev2049
I think Vercel needs some approval from your side and CI should be green afterwards. The comment about separate notebooks is addressed!