MMR Search in Chroma not working, typo suspected
System Info
LangChain v0.0.171, macOS
Who can help?
@jeffchuber
Information
- [ ] The official example notebooks/scripts
- [X] My own modified scripts
Related Components
- [ ] LLMs/Chat Models
- [ ] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [ ] Output Parsers
- [ ] Document Loaders
- [X] Vector Stores / Retrievers
- [ ] Memory
- [ ] Agents / Agent Executors
- [ ] Tools / Toolkits
- [ ] Chains
- [ ] Callbacks/Tracing
- [ ] Async
Reproduction
If I initialise a Chroma database and then a retriever:

```python
db = Chroma.from_documents(
    texts,
    embeddings_function(),
    metadatas=[{"source": str(i)} for i in range(len(texts))],
    persist_directory=PERSIST_DIRECTORY,
)
querybase = db.as_retriever(search_type="mmr", search_kwargs={"k": 3, "lambda_mult": 1})
```

the retrieved documents are identical whether I pass 0.1 or 0.9 as the lambda_mult parameter.
Expected behavior
I expect different documents to be retrieved for different values of lambda_mult.
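For reference, varying lambda_mult in the MMR objective should change which documents are selected. Here is a minimal pure-Python sketch of the MMR selection rule (illustrative only, not LangChain's implementation; the toy query and document vectors are made up):

```python
def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

def mmr(query, docs, k, lambda_mult):
    """Greedily pick k doc indices maximizing
    lambda_mult * sim(query, d) - (1 - lambda_mult) * max sim(d, already selected)."""
    selected = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query, docs[i])
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Two near-duplicate relevant docs plus one diverse doc.
query = [1.0, 0.0]
docs = [[1.0, 0.1], [1.0, 0.12], [0.2, 1.0]]
print(mmr(query, docs, k=2, lambda_mult=1.0))  # pure relevance -> [0, 1]
print(mmr(query, docs, k=2, lambda_mult=0.1))  # diversity-weighted -> [0, 2]
```

With lambda_mult=1.0 the two near-duplicates win on relevance alone; with lambda_mult=0.1 the redundancy penalty swaps in the diverse document, which is exactly the behavioural change missing in the report above.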
Digging into the code, I think there is a typo in langchain.vectorstores.chroma: the last line of max_marginal_relevance_search should pass lambda_mult, not lambda_mul.
As this is my first issue, I'm not sure how to properly suggest or test a fix :)
```python
def max_marginal_relevance_search(
    self,
    query: str,
    k: int = 4,
    fetch_k: int = 20,
    lambda_mult: float = 0.5,
    filter: Optional[Dict[str, str]] = None,
    **kwargs: Any,
) -> List[Document]:
    """Return docs selected using the maximal marginal relevance.

    Maximal marginal relevance optimizes for similarity to query AND diversity
    among selected documents.

    Args:
        query: Text to look up documents similar to.
        k: Number of Documents to return. Defaults to 4.
        fetch_k: Number of Documents to fetch to pass to MMR algorithm.
        lambda_mult: Number between 0 and 1 that determines the degree
            of diversity among the results with 0 corresponding
            to maximum diversity and 1 to minimum diversity.
            Defaults to 0.5.
        filter (Optional[Dict[str, str]]): Filter by metadata. Defaults to None.

    Returns:
        List of Documents selected by maximal marginal relevance.
    """
    if self._embedding_function is None:
        raise ValueError(
            "For MMR search, you must specify an embedding function on" "creation."
        )
    embedding = self._embedding_function.embed_query(query)
    docs = self.max_marginal_relevance_search_by_vector(
        embedding, k, fetch_k, lambda_mul=lambda_mult, filter=filter
    )
    return docs
```
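A minimal sketch of why the misspelling goes unnoticed rather than raising a TypeError (the function below is a stand-in for max_marginal_relevance_search_by_vector, not the real Chroma code): because the callee also accepts **kwargs, the misspelled lambda_mul keyword is silently absorbed, and lambda_mult keeps its 0.5 default on every call — which would explain the identical results regardless of the value passed.

```python
from typing import Any

def search_by_vector(embedding: list, k: int = 4, fetch_k: int = 20,
                     lambda_mult: float = 0.5, **kwargs: Any) -> float:
    # Stand-in: just report the lambda_mult value actually received.
    return lambda_mult

# Buggy call site: the misspelled keyword lands in **kwargs,
# so lambda_mult silently stays at its 0.5 default.
print(search_by_vector([0.1, 0.2], 3, 20, lambda_mul=0.9))   # -> 0.5
# Fixed call site: the value is forwarded as intended.
print(search_by_vector([0.1, 0.2], 3, 20, lambda_mult=0.9))  # -> 0.9
```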
@hwchase17 🤔 any thoughts here? I didn't write this but happy to help.
Chroma.from_documents() takes Document objects as a parameter, not text and metadata separately. You have to use Chroma.from_texts() for your use case, because you are providing texts and metadata separately.
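For example, with texts and metadata as separate lists (a sketch only; embeddings_function and PERSIST_DIRECTORY are placeholders for whatever you already use, as in the original report):

```python
from langchain.vectorstores import Chroma

texts = ["first document", "second document"]
metadatas = [{"source": str(i)} for i in range(len(texts))]

# from_texts accepts raw strings plus a parallel list of metadata dicts;
# from_documents would instead expect a list of Document objects.
db = Chroma.from_texts(
    texts,
    embeddings_function(),  # placeholder: your embedding function
    metadatas=metadatas,
    persist_directory=PERSIST_DIRECTORY,  # placeholder path
)
retriever = db.as_retriever(search_type="mmr", search_kwargs={"k": 3, "lambda_mult": 1})
```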
In the end, I decided to use something else to work around it ... but I still think the last line of this Python code has a typo :)