localGPT
Multi-user or async prompt requests crash the app
Hi, I tried the UI, and when multiple users send a prompt at the same time, the app crashes. Initially I thought it was an issue with Flask, so I tried waitress (based on the WSGI production warning shown when running the UI app), but the problem persisted. Now I suspect that the LangChain usage in the localGPT API app can't handle async requests. Please let me know if this is correct, and if so, whether you have any suggestions.
Thank you, Srikanth
You will need to create a queue in the API to manage incoming requests, so that you wait for one request to complete before serving the next. If you have multiple GPUs, you will need to load the model once on each GPU and then add those workers to your queue.
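A minimal sketch of that queue idea, using only the standard library (not the actual localGPT code — `run_chain` here is a hypothetical stand-in for whatever LangChain chain the API calls; swap in the real one). Each Flask handler would push its prompt onto the queue and block on a `Future`, while a single worker thread invokes the model strictly one request at a time:

```python
import queue
import threading
from concurrent.futures import Future

def run_chain(prompt: str) -> str:
    # Hypothetical placeholder for the real LangChain QA chain call.
    return f"answer to: {prompt}"

# Each queued item pairs a prompt with a Future that will carry its answer.
request_queue: "queue.Queue[tuple[str, Future]]" = queue.Queue()

def worker() -> None:
    # Single consumer: the model is only ever invoked serially,
    # which avoids concurrent calls into the chain.
    while True:
        prompt, fut = request_queue.get()
        try:
            fut.set_result(run_chain(prompt))
        except Exception as exc:
            fut.set_exception(exc)
        finally:
            request_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_prompt(prompt: str) -> str:
    # Call this from each Flask request handler; it blocks until
    # the worker has finished this request's turn.
    fut: Future = Future()
    request_queue.put((prompt, fut))
    return fut.result()
```

For multiple GPUs, you would start one worker thread per GPU, each holding its own loaded model, all consuming from the same queue.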
Hi, do you have an example of the code for the suggestion you gave?
API_queue.docx
I am sharing the code adjustments to support the API queue.
@Keren-Data-Scientist-Tranning can you create a PR for this? Would love to add this functionality.
Sure.
@PromtEngineer PR added for the API queue.