🐛 Bug Report: Large zip breaking stream endpoint

Open pabik opened this issue 1 year ago • 4 comments

📜 Description

The stream endpoint doesn't provide an answer when a file embedded in the zip archive is long.

👟 Reproduction steps

  1. Upload a zip file
  2. Try chatting with docs_tester.zip

👍 Expected behavior

DocsGPT should provide an answer.

👎 Actual Behavior with Screenshots

No answer; the stream endpoint breaks. [image]

💻 Operating system

MacOS

What browsers are you seeing the problem on?

Chrome

🤖 What development environment are you experiencing this bug on?

Docker

🔒 Did you set the correct environment variables in the right path? List the environment variable names (not values please!)

No response

📃 Provide any additional context for the Bug.

No response

📖 Relevant log output

No response

👀 Have you spent some time to check if this bug has been raised before?

  • [X] I checked and didn't find similar issue

🔗 Are you willing to submit PR?

None

🧑‍⚖️ Code of Conduct

  • [X] I agree to follow this project's Code of Conduct

pabik avatar Feb 21 '24 14:02 pabik

Looks like it happens because the file is not being chunked properly, or at all, when answering, resulting in context token overload.

dartpain avatar Jun 07 '24 13:06 dartpain

Hi, I would like to work on this issue.

nayelimdejesus avatar Jun 10 '24 17:06 nayelimdejesus

When you upload a big zip file, what answer should it provide?

nayelimdejesus avatar Jun 15 '24 18:06 nayelimdejesus

Just shouldn't break. Basically make sure that it doesn't error out. Try running it with the file attached.

dartpain avatar Jun 17 '24 13:06 dartpain

I am interested in working on this issue.

jayantp2003 avatar Oct 10 '24 21:10 jayantp2003

I was playing around with the zip file and a couple of different files, and I found that it's not an issue related to chunking of code; there is some issue with the RstParser class. When I changed the file extensions to text files, it worked fine.

[image] [image]

Currently checking the RstParser class to figure out the changes required.

jayantp2003 avatar Oct 10 '24 23:10 jayantp2003

The issue is with the implementation of the RST parser: in each file, it looks for a header and the text below it, but the zip file we are testing on contains just a single file with no header, so it is not chunked. This header-and-text breakdown also seems to be an issue for the markdown parser. The file should be chunked based on tokens or bytes, and the tuple implementation also needs to be updated.
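
To illustrate the failure mode, here is a minimal sketch of header-based splitting. It is a hypothetical simplification, not DocsGPT's actual RstParser code; the name `split_by_headers` and the underline pattern are assumptions made for the example. A file with headers yields one (header, text) tuple per section, while a file with no header comes back as a single tuple holding the entire content, however long it is.

```python
import re

# Hypothetical simplification for illustration only; NOT DocsGPT's RstParser.
# An RST-style header is assumed to be a text line followed by an underline
# of '=', '-' or '~' characters that is at least as long as the text.
HEADER_UNDERLINE = re.compile(r"^(=+|-+|~+)\s*$")


def split_by_headers(text):
    """Return a list of (header, body) tuples.

    header is None for content that appears before any header.
    """
    lines = text.splitlines()
    sections = []
    header = None
    body = []
    i = 0
    while i < len(lines):
        is_header = (
            i + 1 < len(lines)
            and lines[i].strip()
            and HEADER_UNDERLINE.match(lines[i + 1])
            and len(lines[i + 1].strip()) >= len(lines[i].strip())
        )
        if is_header:
            # Close the section collected so far, if there is one.
            if header is not None or body:
                sections.append((header, "\n".join(body).strip()))
            header = lines[i].strip()
            body = []
            i += 2  # skip the header text and its underline
        else:
            body.append(lines[i])
            i += 1
    if header is not None or body:
        sections.append((header, "\n".join(body).strip()))
    return sections


with_headers = "Intro\n=====\nhello\n\nUsage\n-----\nworld\n"
no_headers = "word " * 5000

print(len(split_by_headers(with_headers)))  # one tuple per header
print(len(split_by_headers(no_headers)))    # a single tuple with everything
```

With no header to split on, the whole file becomes one chunk, which is consistent with the context overload described earlier in the thread.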

jayantp2003 avatar Oct 11 '24 10:10 jayantp2003

Yeah, seems like that's the issue. Let's add another token size handler to it, maybe?
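
A token size handler of this kind could look roughly like the sketch below. It is only an assumption about the shape of such a handler, not the code from the PR: the names `split_by_token_limit` and `apply_token_limit` are hypothetical, and whitespace-separated words stand in for real tokenizer tokens. It takes the (header, text) tuples the parser already produces and splits any text that exceeds the limit.

```python
# Hypothetical sketch; function names and the default limit are assumptions.
# Whitespace-separated words are used as a stand-in for real tokens, so the
# counts only approximate what an actual tokenizer would report.


def split_by_token_limit(text, max_tokens=500):
    """Split text into pieces of at most max_tokens whitespace-separated words."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


def apply_token_limit(sections, max_tokens=500):
    """Re-chunk (header, body) tuples so that no body exceeds max_tokens."""
    chunks = []
    for header, body in sections:
        # Keep header-only sections by emitting one empty piece for them.
        pieces = split_by_token_limit(body, max_tokens) or [""]
        for piece in pieces:
            chunks.append((header, piece))
    return chunks


# A header-less file arrives as one oversized tuple and leaves as several.
oversized = [(None, "word " * 1200)]
chunks = apply_token_limit(oversized, max_tokens=500)
print([len(body.split()) for _, body in chunks])
```

Note that splitting on whitespace discards the original line breaks inside a chunk; a real handler would probably want to use the model's tokenizer and preserve formatting.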

dartpain avatar Oct 11 '24 10:10 dartpain

Hey, I have updated the code and created a PR. Could you review and approve it? I am new to open source contributions and do not know how the process works after making a PR. Open to feedback.

jayantp2003 avatar Oct 11 '24 11:10 jayantp2003

@dartpain Could you review my changes, provide feedback, and approve them if the implementation seems correct?

jayantp2003 avatar Oct 11 '24 17:10 jayantp2003