Queue-Worker System
Thank you for the great package. I'm interested in hosting an LLM on GKE.
For our existing ML applications, we usually implement a queue-worker system (e.g. RQ (redis-queue) or Celery with a Redis broker) to handle long-running background tasks. Does ray-llm have a similar feature implemented under the hood, or do I need to set it up myself?
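For context, here's a minimal sketch of the kind of setup we usually run. This assumes RQ; the task body is a placeholder standing in for real LLM inference:

```python
# tasks.py -- the long-running job; executed out-of-process by `rq worker`.
import time

def generate_text(prompt: str) -> str:
    time.sleep(5)  # placeholder for a slow LLM inference call
    return f"output for: {prompt!r}"
```

```python
# enqueue.py -- producer side: submit the job and return immediately.
from redis import Redis
from rq import Queue

from tasks import generate_text

queue = Queue(connection=Redis(host="localhost", port=6379))
job = queue.enqueue(generate_text, "Summarize this document.", job_timeout=600)
print(job.id)  # the web tier later polls job.get_status() / job.result
```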
Hi @AIApprentice101, we don't have that functionality in ray-llm; you'll have to set it up yourself.
For the Redis solution, do you see any issues or pain points, or is it more about the integration effort?
@sihanwang41 Thank you for your reply. I saw there's an RFC related to integrating a queuing system into Ray Serve: https://github.com/ray-project/ray/issues/32292. So I was wondering whether that's something Ray-LLM would consider, especially given that LLM inference usually takes a long time to run.
In the meantime, we can set up the queuing system ourselves.
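If we roll our own for now, something like the following is what I have in mind: a worker that pops requests off a Redis list and forwards them to the deployed model endpoint. The route, key names, and payload shape here are assumptions for illustration, not ray-llm's actual API:

```python
# worker.py -- hypothetical bridge between a Redis queue and a deployed
# model endpoint. Route and payload shape are assumed, not ray-llm's API.
import json

import requests
from redis import Redis

redis_conn = Redis(host="localhost", port=6379)
ENDPOINT = "http://localhost:8000/generate"  # assumed Serve route

while True:
    # BLPOP blocks until a request is pushed onto the "llm:requests" list.
    _key, raw = redis_conn.blpop("llm:requests")
    request = json.loads(raw)
    resp = requests.post(ENDPOINT, json={"prompt": request["prompt"]}, timeout=600)
    # Store the result where the caller can poll for it by request id.
    redis_conn.set(f"llm:result:{request['id']}", resp.text)
```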