$contain is not a valid operator in SelfQueryRetrieval

Open levalencia opened this issue 11 months ago • 2 comments

System Info

0.0.228

Who can help?

@hwchase17

Information

  • [ ] The official example notebooks/scripts
  • [ ] My own modified scripts

Related Components

  • [ ] LLMs/Chat Models
  • [ ] Embedding Models
  • [ ] Prompts / Prompt Templates / Prompt Selectors
  • [ ] Output Parsers
  • [ ] Document Loaders
  • [X] Vector Stores / Retrievers
  • [ ] Memory
  • [ ] Agents / Agent Executors
  • [ ] Tools / Toolkits
  • [ ] Chains
  • [ ] Callbacks/Tracing
  • [ ] Async

Reproduction

Code to reproduce:


    embeddings = OpenAIEmbeddings(deployment=OPENAI_EMBEDDING_DEPLOYMENT_NAME,
                                  model=OPENAI_EMBEDDING_MODEL_NAME,
                                  chunk_size=1)
    # create new index
    #pinecone.create_index("langchain-self-retriever-demo", dimension=1536)
    vectorstore = Pinecone.from_existing_index(index_name="cubigo", embedding=embeddings, namespace="vwProfilesMetadata")
   
    metadata_field_info = [
        AttributeInfo(
            name="FirstName",
            description="The first name of the resident",
            type="string",
        ),
        AttributeInfo(
            name="LastName",
            description="The last name of the resident",
            type="string",
        ),
        AttributeInfo(
            name="Gender",
            description="The gender of the resident",
            type="string",
        ),
        AttributeInfo(
            name="Birthdate", 
            description="The birthdate of the resident or the date the resident was born", 
            type="Date"
        ),
        AttributeInfo(
            name="Birthplace", 
            description="The birthplace of the resident or the place the resident was born", 
            type="string"
        ),
        AttributeInfo(
            name="Hometown", 
            description="The town or city where the resident grew up", 
            type="string"
        )
    ]
    document_content_description = "The content of the document describes " \
        "a resident of the facility, each document is a resident and it " \
        "has all the information about the resident like FirstName," \
        "LastName, RoleName, Gender, PhoneNumber, CellPhoneNumber, Address, " \
        "Birthdate, Birthplace, Hometown, Education, CollegeName, PastOccupations, " \
        "Veteran, NameOfSpouse, ReligiousPreferences, SpokenLanguages, " \
        "ActiveLiveDescription, RetiredLiveDescription, Accomplishments, AnniversaryDate, " \
        "YourTypicalDay, TalentsAndHobbies, InterestCategories, OtherInterestCategories," \
        "FavoriteActor, FavoriteActress, FavoriteAnimal, FavoriteArtist, FavoriteAuthor, " \
        "FavoriteBandMusicalArtist, FavoriteBook, FavoriteClimate, FavoriteColor, FavoriteCuisine, " \
        "FavoriteDance, FavoriteDessert, FavoriteDrink, FavoriteFood, FavoriteFruit, FavoriteFutureTravelDestination, " \
        "FavoriteGame, FavoriteMovie, FavoritePastTravelDestination, FavoriteSeasonOfTheYear, FavoriteSong, FavoriteSport, " \
        "FavoriteSportsTeam, FavoriteTvShow, FavoriteVegetable"
    user_input = get_text()
          
    llm = AzureChatOpenAI(
        openai_api_base=OPENAI_DEPLOYMENT_ENDPOINT,
        openai_api_version=OPENAI_API_VERSION ,
        deployment_name=OPENAI_DEPLOYMENT_NAME,
        openai_api_key=OPENAI_API_KEY,
        openai_api_type = OPENAI_API_TYPE ,
        model_name=OPENAI_MODEL_NAME,
        temperature=0)
    
    retriever = SelfQueryRetriever.from_llm(
        llm, vectorstore, document_content_description, metadata_field_info, verbose=True, enable_limit=True
    )

    #response = retriever.get_relevant_documents(user_input)
    
    chain = RetrievalQAWithSourcesChain.from_chain_type(llm=llm, 
                                    chain_type="stuff", 
                                    retriever=retriever,
                                    return_source_documents=True)
    
    if user_input:
        response = chain({"question": user_input})

Exception:

ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'content-type': 'application/json', 'date': 'Tue, 11 Jul 2023 11:04:33 GMT', 'x-envoy-upstream-service-time': '0', 'content-length': '68', 'server': 'envoy'})
HTTP response body: {"code":3,"message":"$contain is not a valid operator","details":[]}

The question I am asking:
Who is interested in baking?

If I instead ask "Who likes baking?", there are no errors.
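
A plausible reading of the difference, not confirmed anywhere in this thread: for "interested in", the query constructor emits a `contain` comparison that ends up as `$contain` in the Pinecone filter, while "likes" produces a comparison Pinecone accepts, or no filter at all. The sketch below is plain Python, not LangChain code. The helper `to_pinecone_filter`, the choice of the `TalentsAndHobbies` attribute, and the `$`-prefix convention are all assumptions made for illustration, inferred from the error message.

```python
# Hypothetical stand-in for the translation step; this is not LangChain's code.
def to_pinecone_filter(comparator: str, attribute: str, value: str) -> dict:
    """Build a Pinecone-style metadata filter by prefixing the comparator with '$'."""
    return {attribute: {f"${comparator}": value}}


# Assumed filter for "Who is interested in baking?"; Pinecone rejects "$contain".
rejected = to_pinecone_filter("contain", "TalentsAndHobbies", "baking")

# Assumed shape of a comparison that Pinecone would accept.
accepted = to_pinecone_filter("eq", "TalentsAndHobbies", "baking")

print(rejected)
print(accepted)
```

If this reading is right, the failure depends on the wording of the question only because the wording changes which comparator the LLM picks.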

Expected behavior

I should get a clear response or no answer, not an exception.

levalencia avatar Jul 11 '23 11:07 levalencia

Answer generated by a 🤖

Answer

I found related issues that might be helpful. I did my best to summarize the solutions, but I recommend looking at the related issues yourself.

Open Issues

SelfQuering Retrieval no support $contain operator

The suggested solution is: The issue has been addressed in the fix #3892. The user is advised to check if the problem persists after the next release.


This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

dosubot[bot] avatar Jul 11 '23 12:07 dosubot[bot]

still needs a fix

levalencia avatar Jul 11 '23 13:07 levalencia

Hi, @levalencia. I'm Dosu, and I'm helping the LangChain team manage our backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue you reported is related to the $contain operator not being recognized in the SelfQueryRetrieval class. You mentioned that you encountered an error when using this operator in your code. Another user, "dosu-beta", suggested checking issue #7157, where it was mentioned that the issue has been addressed in fix #3892. However, you responded that the issue still needs a fix.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on this issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.

Thank you for your understanding and contribution to the LangChain project. Let us know if you have any further questions or concerns.

dosubot[bot] avatar Oct 10 '23 16:10 dosubot[bot]

This issue is still relevant.

analyticanna avatar Oct 23 '23 17:10 analyticanna

Still the same issue. When will there be a fix?

shadow66603 avatar Feb 23 '24 09:02 shadow66603

Unfortunately, Pinecone doesn't allow the contain or like comparators (see here). To fix this, you need to construct the query constructor prompt manually so that it only offers comparators Pinecone supports. Instead of using SelfQueryRetriever.from_llm to construct your retriever, use the following code (you can find similar info here):

from langchain.chains.query_constructor.base import (
    StructuredQueryOutputParser,
    get_query_constructor_prompt,
)
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.retrievers.self_query.pinecone import PineconeTranslator
from langchain_openai import ChatOpenAI

# metadata_field_info and vectorstore are defined as in the reproduction above.
document_content_description = "Brief summary of a movie"

# Define allowed comparators list
allowed_comparators = [
    "$eq",  # Equal to (number, string, boolean)
    "$ne",  # Not equal to (number, string, boolean)
    "$gt",  # Greater than (number)
    "$gte",  # Greater than or equal to (number)
    "$lt",  # Less than (number)
    "$lte",  # Less than or equal to (number)
    "$in",  # In array (string or number)
    "$nin",  # Not in array (string or number)
]

# Build a prompt that only advertises the comparators listed above
constructor_prompt = get_query_constructor_prompt(
    document_content_description,
    metadata_field_info,
    allowed_comparators=allowed_comparators,
)

query_model = ChatOpenAI(
    # model='gpt-3.5-turbo-0125',
    model='gpt-4-0125-preview',
    temperature=0,
    streaming=True,
)

# Chain: prompt -> LLM -> structured query
output_parser = StructuredQueryOutputParser.from_components()
query_constructor = constructor_prompt | query_model | output_parser

retriever = SelfQueryRetriever(
    query_constructor=query_constructor,
    vectorstore=vectorstore,
    structured_query_translator=PineconeTranslator(),
)

Seems to me that the documentation should be updated to reflect the importance of the allowed comparators for different vector stores.
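
As a complementary safeguard, a filter can be checked against the allowed list before it is sent. The sketch below is plain Python and assumes the filter is a nested dict in Pinecone's style. `find_unsupported_operators`, the sample filters, and their values are hypothetical; the comparator set is the list above plus the logical operators `$and` and `$or`.

```python
from typing import Any, List

# Comparators from the allowed list above, plus Pinecone's logical operators.
SUPPORTED_OPERATORS = {
    "$eq", "$ne", "$gt", "$gte", "$lt", "$lte", "$in", "$nin",
    "$and", "$or",
}


def find_unsupported_operators(node: Any) -> List[str]:
    """Return every '$'-prefixed key in a nested filter that is not supported."""
    found: List[str] = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key.startswith("$") and key not in SUPPORTED_OPERATORS:
                found.append(key)
            # Recurse into the value to reach nested comparisons.
            found.extend(find_unsupported_operators(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_unsupported_operators(item))
    return found


# Hypothetical filters for illustration only.
bad_filter = {
    "$and": [
        {"Gender": {"$eq": "F"}},
        {"TalentsAndHobbies": {"$contain": "baking"}},
    ]
}
good_filter = {"Hometown": {"$in": ["Ghent", "Bruges"]}}

print(find_unsupported_operators(bad_filter))
print(find_unsupported_operators(good_filter))
```

A check like this only surfaces the problem earlier with a clearer message; restricting the comparators in the query constructor prompt remains the actual fix.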

EdIzaguirre avatar Mar 22 '24 18:03 EdIzaguirre