DocsGPT 🐛 Bug Report: Large zip breaking stream endpoint

📜 Description

The stream endpoint doesn't return an answer when a file embedded in the zip archive is long.

👟 Reproduction steps

Upload a zip file
Try chatting with docs_tester.zip

👍 Expected behavior

DocsGPT should provide an answer.

👎 Actual Behavior with Screenshots

No answer, stream endpoint breaks.

💻 Operating system

macOS

What browsers are you seeing the problem on?

Chrome

🤖 What development environment are you experiencing this bug on?

Docker

🔒 Did you set the correct environment variables in the right path? List the environment variable names (not values please!)

No response

📃 Provide any additional context for the Bug.

No response

📖 Relevant log output

No response

👀 Have you spent some time to check if this bug has been raised before?

[X] I checked and didn't find similar issue

🔗 Are you willing to submit PR?

None

🧑‍⚖️ Code of Conduct

[X] I agree to follow this project's Code of Conduct

Feb 21 '24 14:02 pabik

Looks like it happens because the file is not being chunked properly, or at all, when answering, which results in a context token overload.

Jun 07 '24 13:06 dartpain

Hi, I would like to work on this issue.

Jun 10 '24 17:06 nayelimdejesus

When you upload a big zip file, what answer should it provide?

Jun 15 '24 18:06 nayelimdejesus

Just shouldn't break. Basically make sure that it doesn't error out. Try running it with the file attached.

Jun 17 '24 13:06 dartpain

I am interested in working on this issue.

Oct 10 '24 21:10 jayantp2003

I was playing around with the zip file and a couple of different files. I found that it's not an issue related to the chunking code; there is some issue with the RstParser class. When I changed the file extensions to text files, it worked fine.

Currently checking the RstParser class to figure out the changes required.

Oct 10 '24 23:10 jayantp2003

The issue is with the implementation of the RST parser. In each file, it looks for a header and the text below it, but the zip file we are testing on contains just a single file with no header, so it is not being chunked. This header-and-text breakdown also seems to be an issue for the Markdown parser. The file should be chunked based on tokens or bytes, and the tuple implementation also needs to be updated.
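
To illustrate the idea, here is a minimal sketch, not the actual DocsGPT parser code. All names (`split_by_headers`, `split_by_tokens`, `chunk_document`) are hypothetical, it uses Markdown-style `#` headers instead of RST for brevity, and it approximates tokens by whitespace-separated words rather than a real tokenizer. It shows how a file with no header collapses into one huge section, and how a size-based fallback keeps every chunk bounded.

```python
import re
from typing import List, Tuple

# Hypothetical stand-in for header-based parsing (Markdown-style "# Header").
HEADER_RE = re.compile(r"^#+[ \t]+(.*)$", re.MULTILINE)


def split_by_headers(content: str) -> List[Tuple[str, str]]:
    """Split content into (header, text) tuples, one per header section."""
    matches = list(HEADER_RE.finditer(content))
    if not matches:
        # No header found: the whole file becomes a single section.
        # This is the case that produces one oversized chunk.
        return [("", content)]
    sections = []
    preamble = content[:matches[0].start()].strip()
    if preamble:
        sections.append(("", preamble))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(content)
        sections.append((m.group(1).strip(), content[m.end():end].strip()))
    return sections


def split_by_tokens(text: str, max_tokens: int) -> List[str]:
    """Split text into pieces of at most max_tokens words.

    Assumption: words approximate tokens; a real tokenizer would differ.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


def chunk_document(content: str, max_tokens: int = 500) -> List[Tuple[str, str]]:
    """Header-based split first, then a size cap applied to every section."""
    chunks = []
    for header, text in split_by_headers(content):
        for piece in split_by_tokens(text, max_tokens):
            chunks.append((header, piece))
    return chunks
```

With this shape, a header-less file still yields several bounded (header, text) tuples with an empty header, so the downstream tuple handling would only need to tolerate an empty header string.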

Oct 11 '24 10:10 jayantp2003

Yeah, seems like that's the issue. Let's add another token size handler to it, maybe?

Oct 11 '24 10:10 dartpain

Hey, I have updated the code and created a PR. Can you review it and approve? I am new to open source contributions and do not know how it works after making a PR. Open to feedback.

Oct 11 '24 11:10 jayantp2003

@dartpain Can you review my changes, provide feedback, and approve if the implementation seems correct?

Oct 11 '24 17:10 jayantp2003
