sukima
Use isolated workers for heavy operations (e.g. inference)
The app becomes completely unresponsive while it processes requests that involve transformers. Offloading such heavy tasks to separate workers through a task queue like Celery would greatly improve the end-user experience and make the webapp much more scalable. I recommend RabbitMQ as the broker and Redis as the result backend for Celery, since those seem to be the most widely used.
It looks like the GPTHF module is also regenerating logits for every request, and the behavior of the app suggests that it is multi-threaded. Both may contribute to the app becoming completely unresponsive.