
Challenge when using LangChain for customer review analysis.

Open Jeru2023 opened this issue 1 year ago • 5 comments

Issue you'd like to raise.

Context: I'm trying to chat with my dataset of customer reviews from a restaurant chain. I would like the LLM to produce a summary for every single store individually. I found it difficult to generate the expected output using any type of chain, so as an alternative I preprocess my dataset before ingestion.

I save the reviews as one text file per store (there are around 20 stores, so 20 text files in total). Then I embedded the 20 files into one vectordb, code as below:

chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=model,
    chain_type="stuff",
    retriever=db.as_retriever(),
    chain_type_kwargs=chain_type_kwargs,
    reduce_k_below_max_tokens=True
)
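The preprocessing step can be sketched like this. This is a hypothetical helper (the function name and the `(store_name, review_text)` input shape are assumptions, not from the original code); it only does the grouping and file writing, before any embedding:

```python
from collections import defaultdict
from pathlib import Path

def write_reviews_per_store(reviews, out_dir):
    """Group (store_name, review_text) pairs and write one .txt file per store."""
    grouped = defaultdict(list)
    for store, text in reviews:
        grouped[store].append(text)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for store, texts in sorted(grouped.items()):
        path = out / f"{store}.txt"
        path.write_text("\n".join(texts))  # one file per store, one review per line
        paths.append(path)
    return paths
```

Each resulting file can then be loaded and embedded into the vector DB as a separate document.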

My prompt is something like "make a summary of customer reviews per store", but summaries were generated for only 4 stores. I guess only 4 documents were returned as context? Is there any way, with one single prompt, to have the LLM generate summaries for all 20 stores? Thanks.

Suggestion:

No response

Jeru2023 avatar May 16 '23 08:05 Jeru2023

I think Steercode actually provides the correct answer in this case. Here is the answer it has generated:


It seems that the issue you're facing is that the LLM is only generating summaries for a limited number of stores. To ensure that the LLM generates summaries for all 20 stores, you can adjust the number of documents returned by the retriever by modifying the k parameter passed to RetrievalQAWithSourcesChain.from_chain_type(). By default the retriever returns only 4 documents, which matches the 4 store summaries you're seeing. Here's how you can modify your code to set the k value:

chain_type_kwargs = {
    "k": 20,  # Set the number of documents to be retrieved
    # Add any other necessary kwargs
}

chain = RetrievalQAWithSourcesChain.from_chain_type(
    llm=model,
    chain_type="stuff",
    retriever=db.as_retriever(),
    chain_type_kwargs=chain_type_kwargs,
    reduce_k_below_max_tokens=True
)

By setting k to 20, you're telling the retriever to return the top 20 relevant documents, which should include all the stores in your case. This should help you generate summaries for all the 20 stores with a single prompt.

votrumar avatar May 16 '23 08:05 votrumar

Thanks again @votrumar, sounds like a reasonable solution.

But unfortunately I got an error: ValidationError: 1 validation error for StuffDocumentsChain k extra fields not permitted (type=value_error.extra)

And I fed this error to Steercode; it looks like the k param is not supported in StuffDocumentsChain?

user: ValidationError: 1 validation error for StuffDocumentsChain k extra fields not permitted (type=value_error.extra)

It seems like you're getting a validation error for the StuffDocumentsChain class, indicating that there are extra fields not permitted. This error is likely caused by passing a field that is not recognized by the model. To resolve this issue, you should double-check the parameters you're passing to the StuffDocumentsChain class and make sure they are valid. Here's a brief overview of the properties and methods in the StuffDocumentsChain class:

Properties:

- llm_chain: The language model chain used for prediction.
- document_prompt: The prompt template for combining documents.
- document_variable_name: The variable name for the document in the prompt.
- document_separator: The separator used to separate documents in the combined prompt.

Methods:

- get_default_document_variable_name: Get the default document variable name if not provided, and validate it against the input variables of the llm_chain.
- _get_inputs: Generates a dictionary of input variables for the LLM Chain prompt from a list of documents and additional keyword arguments.
- prompt_length: Get the prompt length by formatting the prompt for a given list of documents and additional keyword arguments.
- combine_docs: Combine multiple documents into a single prompt and pass it to the LLM Chain for prediction.
- _chain_type: Returns the type of the documents chain as the string 'stuff_documents_chain'.

Please ensure that the parameters you're passing to the StuffDocumentsChain class match the expected properties. If you can provide more information about how you're using the StuffDocumentsChain class, I can help you further.

Jeru2023 avatar May 16 '23 11:05 Jeru2023

I just had a quick check of the LangChain API reference; I should probably say the k param is not supported in RetrievalQAWithSourcesChain.

Jeru2023 avatar May 16 '23 11:05 Jeru2023

You are right, you have to specify k when you are constructing the retriever (so not in chain_type_kwargs), i.e. via db.as_retriever(search_kwargs={"k": 20}).
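A minimal sketch of that fix, assuming `db` is your existing vector store (`make_retriever` is a hypothetical helper name; `as_retriever(search_kwargs=...)` is the standard vector-store API):

```python
def make_retriever(db, k=20):
    """Build a retriever that returns the top-k documents.

    k belongs on the retriever (via search_kwargs), not in
    chain_type_kwargs, which StuffDocumentsChain rejects as an
    extra field.
    """
    return db.as_retriever(search_kwargs={"k": k})
```

You would then pass `retriever=make_retriever(db)` to `RetrievalQAWithSourcesChain.from_chain_type(...)` and drop `k` from `chain_type_kwargs` entirely.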

votrumar avatar May 16 '23 12:05 votrumar

@votrumar Thanks a lot. Are you the owner of Steercode? It's a cool app; I will introduce this website in my upcoming conference presentation.

Jeru2023 avatar May 16 '23 14:05 Jeru2023

Yes, I am one of the creators :) I am glad you like it!

votrumar avatar May 16 '23 18:05 votrumar

Hi, @Jeru2023! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

Based on my understanding, you were having trouble generating summaries for all 20 stores when using LangChain for customer review analysis. You mentioned that you tried preprocessing your dataset and embedding the files into one vectordb, but only 4 out of 20 stores had summaries generated. Votrumar suggested adjusting the number of documents returned by the retriever by modifying the k parameter in the RetrievalQAWithSourcesChain.from_chain_type() function. However, you encountered an error, and Votrumar suggested double-checking the parameters passed to the StuffDocumentsChain class. You confirmed that the k parameter is not supported in RetrievalQAWithSourcesChain and should instead be set on the retriever itself.

Before we close this issue, could you please let us know if this issue is still relevant to the latest version of the LangChain repository? If it is, please comment on this issue to let us know. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your understanding and cooperation!

dosubot[bot] avatar Sep 12 '23 16:09 dosubot[bot]