langchain
$contain is not a valid operator in SelfQueryRetrieval
### System Info
0.0.228
### Who can help?
@hwchase17
### Information
- [ ] The official example notebooks/scripts
- [ ] My own modified scripts
### Related Components
- [ ] LLMs/Chat Models
- [ ] Embedding Models
- [ ] Prompts / Prompt Templates / Prompt Selectors
- [ ] Output Parsers
- [ ] Document Loaders
- [X] Vector Stores / Retrievers
- [ ] Memory
- [ ] Agents / Agent Executors
- [ ] Tools / Toolkits
- [ ] Chains
- [ ] Callbacks/Tracing
- [ ] Async
### Reproduction
Code to reproduce:
```python
from langchain.chains import RetrievalQAWithSourcesChain
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.chat_models import AzureChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.vectorstores import Pinecone

# The OPENAI_* constants and get_text() are not shown in this snippet.
embeddings = OpenAIEmbeddings(
    deployment=OPENAI_EMBEDDING_DEPLOYMENT_NAME,
    model=OPENAI_EMBEDDING_MODEL_NAME,
    chunk_size=1,
)

# create new index
# pinecone.create_index("langchain-self-retriever-demo", dimension=1536)
vectorstore = Pinecone.from_existing_index(
    index_name="cubigo",
    embedding=embeddings,
    namespace="vwProfilesMetadata",
)

# Metadata fields the self-query retriever is allowed to filter on
metadata_field_info = [
    AttributeInfo(
        name="FirstName",
        description="The first name of the resident",
        type="string",
    ),
    AttributeInfo(
        name="LastName",
        description="The last name of the resident",
        type="string",
    ),
    AttributeInfo(
        name="Gender",
        description="The gender of the resident",
        type="string",
    ),
    AttributeInfo(
        name="Birthdate",
        description="The birthdate of the resident or the date the resident was born",
        type="Date",
    ),
    AttributeInfo(
        name="Birthplace",
        description="The birthplace of the resident or the place the resident was born",
        type="string",
    ),
    AttributeInfo(
        name="Hometown",
        description="The town or city where the resident grew up",
        type="string",
    ),
]

document_content_description = (
    "The content of the document describes "
    "a resident of the facility, each document is a resident and it "
    "has all the information about the resident like FirstName, "
    "LastName, RoleName, Gender, PhoneNumber, CellPhoneNumber, Address, "
    "Birthdate, Birthplace, Hometown, Education, CollegeName, PastOccupations, "
    "Veteran, NameOfSpouse, ReligiousPreferences, SpokenLanguages, "
    "ActiveLiveDescription, RetiredLiveDescription, Accomplishments, AnniversaryDate, "
    "YourTypicalDay, TalentsAndHobbies, InterestCategories, OtherInterestCategories, "
    "FavoriteActor, FavoriteActress, FavoriteAnimal, FavoriteArtist, FavoriteAuthor, "
    "FavoriteBandMusicalArtist, FavoriteBook, FavoriteClimate, FavoriteColor, FavoriteCuisine, "
    "FavoriteDance, FavoriteDessert, FavoriteDrink, FavoriteFood, FavoriteFruit, FavoriteFutureTravelDestination, "
    "FavoriteGame, FavoriteMovie, FavoritePastTravelDestination, FavoriteSeasonOfTheYear, FavoriteSong, FavoriteSport, "
    "FavoriteSportsTeam, FavoriteTvShow, FavoriteVegetable"
)

user_input = get_text()

llm = AzureChatOpenAI(
    openai_api_base=OPENAI_DEPLOYMENT_ENDPOINT,
    openai_api_version=OPENAI_API_VERSION,
    deployment_name=OPENAI_DEPLOYMENT_NAME,
    openai_api_key=OPENAI_API_KEY,
    openai_api_type=OPENAI_API_TYPE,
    model_name=OPENAI_MODEL_NAME,
    temperature=0,
)

retriever = SelfQueryRetriever.from_llm(
    llm,
    vectorstore,
    document_content_description,
    metadata_field_info,
    verbose=True,
    enable_limit=True,
)

# response = retriever.get_relevant_documents(user_input)
chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
)

if user_input:
    response = chain({"question": user_input})
```
Exception:
```
ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'content-type': 'application/json', 'date': 'Tue, 11 Jul 2023 11:04:33 GMT', 'x-envoy-upstream-service-time': '0', 'content-length': '68', 'server': 'envoy'})
HTTP response body: {"code":3,"message":"$contain is not a valid operator","details":[]}
```
The question I am asking:
"Who is interested in baking?"
If I ask "Who likes baking?" instead, there are no errors.
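The error body suggests that the metadata filter sent to Pinecone contained a `$contain` operator. As a rough sketch in plain Python (not part of LangChain or the Pinecone client), a filter can be checked before it is sent. The helper name `find_unsupported_operators` and the example filters are hypothetical, and the operator set is an assumption based on the comparators listed later in this thread plus Pinecone's `$and`/`$or`.

```python
from typing import Any, List

# Assumed set of operators Pinecone accepts in metadata filters.
COMPARISON_OPERATORS = {"$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin"}
LOGICAL_OPERATORS = {"$and", "$or"}
KNOWN_OPERATORS = COMPARISON_OPERATORS | LOGICAL_OPERATORS


def find_unsupported_operators(filter_: Any) -> List[str]:
    """Walk a filter dict recursively and collect unknown "$" keys."""
    found: List[str] = []
    if isinstance(filter_, dict):
        for key, value in filter_.items():
            is_operator = isinstance(key, str) and key.startswith("$")
            if is_operator and key not in KNOWN_OPERATORS:
                found.append(key)
            found.extend(find_unsupported_operators(value))
    elif isinstance(filter_, list):
        for item in filter_:
            found.extend(find_unsupported_operators(item))
    return found


# Hypothetical filter of the kind that would trigger the 400 error above.
bad_filter = {"InterestCategories": {"$contain": "baking"}}
print(find_unsupported_operators(bad_filter))  # ['$contain']
```

A check like this only makes the failure easier to spot. It does not stop the query constructor from producing the comparator in the first place.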
### Expected behavior
Should get a clear response or no answer.
Answer generated by a 🤖
Answer
I found related issues that might be helpful. I did my best to summarize the solutions, but I recommend looking at the related issues yourself.
Open Issues
SelfQuering Retrieval no support $contain operator
The suggested solution is: The issue has been addressed in the fix #3892. The user is advised to check if the problem persists after the next release.
This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
This still needs a fix.
Hi, @levalencia. I'm Dosu, and I'm helping the LangChain team manage our backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, the issue you reported is related to the `$contain` operator not being recognized in the `SelfQueryRetrieval` class. You mentioned that you encountered an error when using this operator in your code. Another user, "dosu-beta", suggested checking issue #7157, where it was mentioned that the issue has been addressed in fix #3892. However, you responded that the issue still needs a fix.
Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on this issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LangChain project. Let us know if you have any further questions or concerns.
This issue is still relevant.
Still the same issue. When will there be a fix?
Unfortunately, Pinecone doesn't allow the `contain` or `like` comparators, see here. To fix this, you need to manually construct your query constructor prompt. Instead of using `SelfQueryRetriever.from_llm` to construct your retriever, use the following code (you can find similar info here):
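```python
from langchain.chains.query_constructor.base import (
    StructuredQueryOutputParser,
    get_query_constructor_prompt,
)
from langchain.chains.query_constructor.ir import Comparator
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.retrievers.self_query.pinecone import PineconeTranslator
from langchain_openai import ChatOpenAI  # assumes the langchain-openai package is installed

# Replace with a description of your own documents.
document_content_description = "Brief summary of a movie"

# Define allowed comparators list. The prompt expects LangChain's Comparator
# enum members; PineconeTranslator emits them as the "$"-prefixed operators.
allowed_comparators = [
    Comparator.EQ,   # $eq: Equal to (number, string, boolean)
    Comparator.NE,   # $ne: Not equal to (number, string, boolean)
    Comparator.GT,   # $gt: Greater than (number)
    Comparator.GTE,  # $gte: Greater than or equal to (number)
    Comparator.LT,   # $lt: Less than (number)
    Comparator.LTE,  # $lte: Less than or equal to (number)
    Comparator.IN,   # $in: In array (string or number)
    Comparator.NIN,  # $nin: Not in array (string or number)
]

# metadata_field_info and vectorstore are the objects from the issue's reproduction code.
constructor_prompt = get_query_constructor_prompt(
    document_content_description,
    metadata_field_info,
    allowed_comparators=allowed_comparators,
)

query_model = ChatOpenAI(
    # model='gpt-3.5-turbo-0125',
    model='gpt-4-0125-preview',
    temperature=0,
    streaming=True,
)

output_parser = StructuredQueryOutputParser.from_components()
query_constructor = constructor_prompt | query_model | output_parser

retriever = SelfQueryRetriever(
    query_constructor=query_constructor,
    vectorstore=vectorstore,
    structured_query_translator=PineconeTranslator(),
)
```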
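To see why restricting the comparators helps: the `$contain` in the error suggests that a comparator name such as `contain` reaches Pinecone with a `$` prefix. The sketch below, in plain Python with no LangChain imports, derives the allowed list by filtering a generic comparator list against the operators Pinecone accepts. The names `to_pinecone_operator` and `comparators_for_pinecone` are hypothetical, and the prefixing rule is an assumption inferred from the error message, not a statement about LangChain internals.

```python
from typing import List

# Generic comparator names a query constructor might be told about (assumed list).
GENERIC_COMPARATORS = ["eq", "ne", "gt", "gte", "lt", "lte", "contain", "like", "in", "nin"]

# Operators listed in this thread as accepted by Pinecone.
PINECONE_OPERATORS = {"$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin"}


def to_pinecone_operator(comparator: str) -> str:
    """Assumed mapping: prefix the comparator name with '$'."""
    return f"${comparator}"


def comparators_for_pinecone(comparators: List[str]) -> List[str]:
    """Keep only comparators whose '$'-prefixed form Pinecone accepts."""
    return [c for c in comparators if to_pinecone_operator(c) in PINECONE_OPERATORS]


print(comparators_for_pinecone(GENERIC_COMPARATORS))
# ['eq', 'ne', 'gt', 'gte', 'lt', 'lte', 'in', 'nin']
```

With `contain` and `like` left out of the list given to the prompt, the model is not offered them as options, so a question like "Who is interested in baking?" is less likely to be turned into a filter Pinecone rejects.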
Seems to me that the documentation should be updated to reflect the importance of the allowed comparators for different vector stores.