localGPT
Multi-user or async prompt requests crash the app
Hi, I tried the UI, and when multiple users send a prompt at the same time, the app crashes. Initially I thought it was an issue with Flask, so I tried waitress (based on the WSGI production warning shown when running the UI app), but the problem persisted. Now I suspect that the LangChain usage in the localGPT API app can't handle async requests. Please let me know if this is correct, and if so, whether you have any suggestions.
Thank you, Srikanth
You will need to create a queue in the API to manage incoming requests, so that you wait for one request to complete before serving the next. If you have multiple GPUs, you will need to load the model once on each GPU and then add those workers to your queue.
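A minimal sketch of that queue idea, using only the standard library (not the actual localGPT code — `run_chain` here is a hypothetical stand-in for whatever LangChain chain the API calls; swap in the real one). Each Flask handler would push its prompt onto the queue and block on a `Future`, while a single worker thread invokes the model strictly one request at a time:

```python
import queue
import threading
from concurrent.futures import Future

def run_chain(prompt: str) -> str:
    # Hypothetical placeholder for the real LangChain QA chain call.
    return f"answer to: {prompt}"

# Each queued item pairs a prompt with a Future that will carry its answer.
request_queue: "queue.Queue[tuple[str, Future]]" = queue.Queue()

def worker() -> None:
    # Single consumer: the model is only ever invoked serially,
    # which avoids concurrent calls into the chain.
    while True:
        prompt, fut = request_queue.get()
        try:
            fut.set_result(run_chain(prompt))
        except Exception as exc:
            fut.set_exception(exc)
        finally:
            request_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_prompt(prompt: str) -> str:
    # Call this from each Flask request handler; it blocks until
    # the worker has finished this request's turn.
    fut: Future = Future()
    request_queue.put((prompt, fut))
    return fut.result()
```

For multiple GPUs, you would start one worker thread per GPU, each holding its own loaded model, all consuming from the same queue.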
Hi, do you have an example of the code for the suggestion you gave?
API_queue.docx
I am sharing the code adjustments to support the API queue.
@Keren-Data-Scientist-Tranning can you create a PR for this? Would love to add this functionality.
Sure.
@PromtEngineer PR added for the API queue.