Multi-user or async prompt requests crash the app

Open srikanthmalla opened this issue 1 year ago • 6 comments

Hi, I tried the UI, and when multiple users send a prompt at the same time, the app crashes. Initially I thought it was an issue with Flask, so I tried Waitress (prompted by the WSGI production warning shown when running the UI app), but the problem persisted. Now I think the LangChain usage in the localGPT API app may not be able to handle async requests. Please let me know if this is correct. If so, are there any suggestions?

Thank you, Srikanth

srikanthmalla avatar Jan 16 '24 18:01 srikanthmalla

You will need to create a queue in the API to manage incoming requests, so that each request waits for the previous one to complete before it is served. If you have multiple GPUs, you will need to load the model once on each GPU and then add those model instances to your queue.
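
A minimal sketch of that idea, not the code from the attachment or PR in this thread: requests go into a `queue.Queue`, and one worker thread per loaded model instance pulls from it, so each instance only ever handles one prompt at a time. The names `InferenceQueue`, `fake_model`, and `inference_queue` are hypothetical; `fake_model` stands in for whatever callable actually runs the chain in the API app.

```python
import queue
import threading
from concurrent.futures import Future


class InferenceQueue:
    """Serializes prompts so each model instance handles one request at a time."""

    def __init__(self, model_fns):
        # model_fns: one callable per loaded model (e.g. one per GPU).
        self._jobs = queue.Queue()
        for model_fn in model_fns:
            threading.Thread(
                target=self._worker, args=(model_fn,), daemon=True
            ).start()

    def _worker(self, model_fn):
        # Each worker owns exactly one model and processes jobs sequentially.
        while True:
            prompt, future = self._jobs.get()
            try:
                future.set_result(model_fn(prompt))
            except Exception as exc:
                # Report the failure to the caller instead of killing the worker.
                future.set_exception(exc)
            finally:
                self._jobs.task_done()

    def submit(self, prompt):
        # Called from any request-handler thread; returns immediately.
        future = Future()
        self._jobs.put((prompt, future))
        return future


def fake_model(prompt):
    # Placeholder (assumption): replace with the real model/chain call.
    return f"answer to: {prompt}"


# One entry in the list per loaded model; a single entry gives strict
# one-at-a-time processing.
inference_queue = InferenceQueue([fake_model])

# Inside a request handler, block until this request's turn has completed.
answer = inference_queue.submit("What is localGPT?").result()
```

With a single model, `concurrent.futures.ThreadPoolExecutor(max_workers=1)` gives the same one-at-a-time behavior with less code; the explicit queue is shown here because it extends naturally to one worker per GPU.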

LeafmanZ avatar Jan 22 '24 04:01 LeafmanZ

Hi, do you have example code showing how to implement your suggestion?

fenry46 avatar Jan 23 '24 03:01 fenry46

API_queue.docx I am sharing the code adjustments to support the API queue

KerenK-EXRM avatar Jan 30 '24 23:01 KerenK-EXRM

@Keren-Data-Scientist-Tranning can you create a PR for this? I would love to add this functionality.

PromtEngineer avatar Feb 01 '24 00:02 PromtEngineer

sure.

KerenK-EXRM avatar Feb 01 '24 16:02 KerenK-EXRM

@PromtEngineer PR added for the API queue.

KerenK-EXRM avatar Feb 03 '24 14:02 KerenK-EXRM