langchainjs

json loader control string separation problem

Open aidec opened this issue 1 year ago • 2 comments

I am testing loading my article data (JSON) with the JSON loader and querying it through RetrievalQAChain. Sometimes an error occurs, and on inspection the cause is that the string passed to OpenAI is too long. Is it possible to control how the JSON loader splits the string automatically, based on character count, or to control the number of strings passed to OpenAI?

aidec avatar Apr 02 '23 12:04 aidec

Hey @aidec, I could work on that. Do you have an example of a too-large JSON that we could use for testing?

Can you provide the code you are using?

MaximeThoonsen avatar Apr 07 '23 16:04 MaximeThoonsen

Hello, thank you for your reply. This is my test data.

https://drive.google.com/file/d/1XUZyfUUiHnokNTGn4CDvUMMGhR2-xRLO/view?usp=sharing

I loaded this test data and split it with the JSON loader successfully. However, when I ran the chain, OpenAI returned an error. Based on the error message, I suspect that the string passed was too long (my data is in Chinese).

Later, I tried to split it on my own and was able to use it normally. However, I'm not sure if this method of splitting is correct.

https://drive.google.com/file/d/1fWX-NoouQSQ4wIRGK8DmvvtrkXx1oM9Z/view?usp=sharing
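
For reference, manual splitting by character count along the lines described above could look roughly like the sketch below. This is only an illustration under assumptions: `splitByCharCount` and `takeWithinBudget` are hypothetical helper names, not part of LangChain, and the chunk sizes are arbitrary. The model's limit is counted in tokens rather than characters, so a character budget is only a rough proxy, and Chinese text may need a smaller budget than English.

```typescript
// Hypothetical helper (not a LangChain API): split text into chunks of at most
// `chunkSize` characters, with `overlap` characters shared between neighbours.
function splitByCharCount(text: string, chunkSize: number, overlap = 0): string[] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (overlap < 0 || overlap >= chunkSize) {
    throw new Error("overlap must be in [0, chunkSize)");
  }
  // Array.from iterates by Unicode code point, so surrogate pairs are not cut in half.
  const chars = Array.from(text);
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < chars.length; start += step) {
    chunks.push(chars.slice(start, start + chunkSize).join(""));
    // Stop once the current chunk has reached the end of the text.
    if (start + chunkSize >= chars.length) break;
  }
  return chunks;
}

// Hypothetical helper: keep leading chunks while their combined length stays
// within `maxChars`, to cap how much text is sent in a single request.
function takeWithinBudget(chunks: string[], maxChars: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const chunk of chunks) {
    const len = Array.from(chunk).length;
    if (used + len > maxChars) break;
    kept.push(chunk);
    used += len;
  }
  return kept;
}
```

Each field loaded from the JSON file would be passed through `splitByCharCount` before being embedded, and the retrieved chunks through `takeWithinBudget` before being sent to the model. LangChain also ships text splitters (for example `RecursiveCharacterTextSplitter` with `chunkSize` and `chunkOverlap` options) that could be applied to the loader's output instead of a hand-written helper.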

aidec avatar Apr 07 '23 17:04 aidec

Hi, @aidec! I'm here to help the LangChain team manage their backlog and I wanted to let you know that we are marking this issue as stale.

Based on my understanding, you were experiencing an issue with the JSON loader and RetrievalQAChain, where the string passed to OpenAI is sometimes too long, causing an error. You requested a way to control the automatic splitting of the string based on character count or the number of strings passed to OpenAI. MaximeThoonsen has offered to work on the issue and has requested an example of a too large JSON for testing, which you have provided.

Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your contribution and we appreciate your understanding. Let us know if you have any further questions or concerns.

dosubot[bot] avatar Aug 17 '23 17:08 dosubot[bot]