langchain
AzureSearch.avector_search_with_score() triggers "TypeError: 'AsyncSearchItemPaged' object is not iterable" when calling _results_to_documents()
Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a similar question and didn't find it.
- [X] I am sure that this is a bug in LangChain rather than my code.
- [X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
embeddings = AzureOpenAIEmbeddings(
    azure_endpoint=azure_endpoint,
    openai_api_version=openai_api_version,
    openai_api_key=openai_api_key,
    openai_api_type=openai_api_type,
    deployment=deployment,
    chunk_size=1)
vectorstore = AzureSearch(
    azure_search_endpoint=azure_search_endpoint,
    azure_search_key=azure_search_key,
    index_name=index_name,
    embedding_function=embeddings.embed_query,
)
system_message_prompt = SystemMessagePromptTemplate.from_template(
    system_prompt)
human_message_prompt = HumanMessagePromptTemplate.from_template(
    human_template)
chat_prompt = ChatPromptTemplate.from_messages(
    [system_message_prompt, human_message_prompt])
doc_chain = load_qa_chain(
    conversation_llm, chain_type="stuff", prompt=chat_prompt, callback_manager=default_manager
)
conversation_chain = ConversationalRetrievalChain(
    retriever=vectorstore.as_retriever(search_type="similarity_score_threshold", k=rag_top_k,
                                       search_kwargs={"score_threshold": rag_score_threshold}),
    combine_docs_chain=doc_chain,
    question_generator=question_generator,
    return_source_documents=True,
    callback_manager=default_manager,
    rephrase_question=False,
    memory=memory,
    max_tokens_limit=max_retrieval_tokens,
)
# The async retrieval path is what fails: this call raises the TypeError shown below.
result = await conversation_chain.ainvoke({"question": question, "chat_history": chat_history})
Error Message and Stack Trace (if applicable)
TypeError("'AsyncSearchItemPaged' object is not iterable")
Traceback (most recent call last):
  File "/Users/crobert/Code/Higher Bar AI/almitra-pilot-be/venv/lib/python3.10/site-packages/langchain/chains/base.py", line 208, in ainvoke
    await self._acall(inputs, run_manager=run_manager)
  File "/Users/crobert/Code/Higher Bar AI/almitra-pilot-be/venv/lib/python3.10/site-packages/langchain/chains/conversational_retrieval/base.py", line 212, in _acall
    docs = await self._aget_docs(new_question, inputs, run_manager=_run_manager)
  File "/Users/crobert/Code/Higher Bar AI/almitra-pilot-be/venv/lib/python3.10/site-packages/langchain/chains/conversational_retrieval/base.py", line 410, in _aget_docs
    docs = await self.retriever.ainvoke(
  File "/Users/crobert/Code/Higher Bar AI/almitra-pilot-be/venv/lib/python3.10/site-packages/langchain_core/retrievers.py", line 280, in ainvoke
    raise e
  File "/Users/crobert/Code/Higher Bar AI/almitra-pilot-be/venv/lib/python3.10/site-packages/langchain_core/retrievers.py", line 273, in ainvoke
    result = await self._aget_relevant_documents(
  File "/Users/crobert/Code/Higher Bar AI/almitra-pilot-be/venv/lib/python3.10/site-packages/langchain_community/vectorstores/azuresearch.py", line 1590, in _aget_relevant_documents
    await self.vectorstore.asimilarity_search_with_relevance_scores(
  File "/Users/crobert/Code/Higher Bar AI/almitra-pilot-be/venv/lib/python3.10/site-packages/langchain_community/vectorstores/azuresearch.py", line 663, in asimilarity_search_with_relevance_scores
    result = await self.avector_search_with_score(query, k=k, **kwargs)
  File "/Users/crobert/Code/Higher Bar AI/almitra-pilot-be/venv/lib/python3.10/site-packages/langchain_community/vectorstores/azuresearch.py", line 750, in avector_search_with_score
    return _results_to_documents(results)
  File "/Users/crobert/Code/Higher Bar AI/almitra-pilot-be/venv/lib/python3.10/site-packages/langchain_community/vectorstores/azuresearch.py", line 1623, in _results_to_documents
    docs = [
TypeError: 'AsyncSearchItemPaged' object is not iterable
Description
This commit for issue #24064 caused a regression in async support. After that commit, avector_search_with_score() calls _asimple_search(), which uses async with self.async_client, and then tries to call _results_to_documents() with the results. That triggers a "TypeError: 'AsyncSearchItemPaged' object is not iterable" because it uses AsyncSearchItemPaged on a closed HTTP connection (the connection was closed at the end of the with block in _asimple_search()).
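For what it's worth, the exact error text can be reproduced without Azure at all. The sketch below uses a hypothetical stand-in class, FakeAsyncPaged (not part of any SDK), that only supports async iteration. Feeding it to a plain list comprehension, like the one the traceback points at in _results_to_documents(), raises the same kind of TypeError, while an async comprehension works. This is only an observation about Python's iteration protocol; I have not confirmed it is the whole story inside langchain_community.

```python
import asyncio


class FakeAsyncPaged:
    """Hypothetical stand-in for an async pager: it defines __aiter__ only."""

    def __init__(self, items):
        self._items = list(items)

    def __aiter__(self):
        return self._gen()

    async def _gen(self):
        for item in self._items:
            yield item


def sync_consume(results):
    # Plain (synchronous) list comprehension, as in the failing helper.
    return [r for r in results]


async def async_consume(results):
    # Async comprehension: the form an async pager actually supports.
    return [r async for r in results]


def try_sync():
    """Return the TypeError message from sync iteration, or None if it worked."""
    try:
        sync_consume(FakeAsyncPaged([1, 2]))
    except TypeError as exc:
        return str(exc)
    return None


error_message = try_sync()
async_items = asyncio.run(async_consume(FakeAsyncPaged([1, 2])))
print(error_message)
print(async_items)
```

So a fix probably needs both things: iterate with async for, and do it while the client is still open.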
The original async PR #22075 seemed to have the right idea: the async results need to be handled within the with block. Looking at that code, it looks like it should probably work. However, if I roll back to 0.2.7, I run into the "KeyError('content_vector')" that triggered issue #24064. For the moment, I've gotten things running by overriding AzureSearch as follows:
import json
from typing import Any, List, Optional, Tuple

import numpy as np
from langchain_community.vectorstores.azuresearch import (
    FIELDS_CONTENT,
    FIELDS_CONTENT_VECTOR,
    FIELDS_METADATA,
    AzureSearch,
)
from langchain_core.documents import Document


class ExtendedAzureSearch(AzureSearch):
    """Extended AzureSearch class with patch to fix async support."""

    async def _asimple_search_docs(
        self,
        embedding: List[float],
        text_query: str,
        k: int,
        *,
        filters: Optional[str] = None,
        **kwargs: Any,
    ) -> List[Tuple[Document, float]]:
        """Perform vector or hybrid search in the Azure search index.

        Args:
            embedding: A vector embedding to search in the vector space.
            text_query: A full-text search query expression;
                Use "*" or omit this parameter to perform only vector search.
            k: Number of documents to return.
            filters: Filtering expression.

        Returns:
            Matching documents with scores
        """
        from azure.search.documents.models import VectorizedQuery

        async with self.async_client as async_client:
            results = await async_client.search(
                search_text=text_query,
                vector_queries=[
                    VectorizedQuery(
                        vector=np.array(embedding, dtype=np.float32).tolist(),
                        k_nearest_neighbors=k,
                        fields=FIELDS_CONTENT_VECTOR,
                    )
                ],
                filter=filters,
                top=k,
                **kwargs,
            )
            # Consume the async pager while the client is still open.
            docs = [
                (
                    Document(
                        page_content=result.pop(FIELDS_CONTENT),
                        metadata=json.loads(result[FIELDS_METADATA])
                        if FIELDS_METADATA in result
                        else {
                            key: value
                            for key, value in result.items()
                            if key != FIELDS_CONTENT_VECTOR
                        },
                    ),
                    float(result["@search.score"]),
                )
                async for result in results
            ]
        return docs

    # AP-254 - This version of avector_search_with_score() calls _asimple_search_docs() instead of _asimple_search()
    # followed by _results_to_documents(results) because _asimple_search() uses `async with self.async_client`, which
    # closes the paging connection on return, which makes it so the results are not available for
    # _results_to_documents() (triggering "TypeError: 'AsyncSearchItemPaged' object is not iterable").
    async def avector_search_with_score(
        self,
        query: str,
        k: int = 4,
        filters: Optional[str] = None,
        **kwargs: Any,
    ) -> List[Tuple[Document, float]]:
        """Return docs most similar to query.

        Args:
            query (str): Text to look up documents similar to.
            k (int, optional): Number of Documents to return. Defaults to 4.
            filters (str, optional): Filtering expression. Defaults to None.

        Returns:
            List[Tuple[Document, float]]: List of Documents most similar
                to the query and score for each
        """
        embedding = await self._aembed_query(query)
        return await self._asimple_search_docs(
            embedding, "", k, filters=filters, **kwargs
        )
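To illustrate why the workaround reads the results inside the async with block, here is a minimal sketch with a hypothetical FakeAsyncClient (not the Azure SDK). The fake is written so that its pages can only be read while the context is open, and it raises RuntimeError otherwise by construction. What the real client does after it is closed is an assumption I have not verified; the sketch only shows the ordering pattern.

```python
import asyncio

# Made-up rows shaped loosely like search hits; values are illustrative only.
FAKE_ROWS = [
    {"content": "doc 0", "@search.score": 0.9},
    {"content": "doc 1", "@search.score": 0.8},
]


class FakeAsyncClient:
    """Hypothetical client whose results are readable only while it is open."""

    def __init__(self):
        self.is_open = False

    async def __aenter__(self):
        self.is_open = True
        return self

    async def __aexit__(self, *exc_info):
        self.is_open = False

    async def search(self):
        # Returns a lazy async iterator; nothing is fetched until iteration.
        return self._pages()

    async def _pages(self):
        for row in FAKE_ROWS:
            if not self.is_open:
                raise RuntimeError("client closed before results were read")
            yield row


async def search_inside_context(client):
    async with client as c:
        results = await c.search()
        # The comprehension runs before __aexit__, so the client is still open.
        return [(r["content"], float(r["@search.score"])) async for r in results]


async def search_outside_context(client):
    async with client as c:
        results = await c.search()
    # The context has already exited here, so the fake refuses to yield.
    return [(r["content"], float(r["@search.score"])) async for r in results]


inside = asyncio.run(search_inside_context(FakeAsyncClient()))
print(inside)
```

Calling search_outside_context() with the same fake raises the RuntimeError, which is the ordering problem the override above is meant to avoid.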
System Info
System Information
OS: Darwin
OS Version: Darwin Kernel Version 23.5.0: Wed May 1 20:12:58 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6000
Python Version: 3.10.9 (v3.10.9:1dd9be6584, Dec 6 2022, 14:37:36) [Clang 13.0.0 (clang-1300.0.29.30)]
Package Information
langchain_core: 0.2.9
langchain: 0.2.11
langchain_community: 0.2.10
langsmith: 0.1.81
langchain_aws: 0.1.7
langchain_openai: 0.1.8
langchain_text_splitters: 0.2.2
langchainplus_sdk: 0.0.21
langgraph: 0.1.14