feat: Add vLLM V1 support to the Unsloth model service
Migrate the Unsloth model service to also support vLLM V1, which has some performance improvements and is the future of vLLM development.
Unsloth Zoo currently has a few limitations that prevent V1 support. In general, it does not yet support V1's collective RPC pattern. The collective RPC call that fetches the weight IPC handles fails with CUDA error: invalid argument. In addition, the collective RPC calls do not check whether their results are coroutines, so they fail when called from AsyncLLM instances.
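To illustrate the coroutine issue described above, here is a minimal sketch of one way a caller could handle both cases. It uses only the Python standard library. The names `resolve_rpc_result`, `call_rpc`, `FakeSyncEngine`, and `FakeAsyncEngine` are hypothetical stand-ins invented for this example; they are not Unsloth Zoo or vLLM code, and the real fix may look different.

```python
import asyncio
import inspect
from typing import Any


async def resolve_rpc_result(result: Any) -> Any:
    """Return `result`, awaiting it first if it is a coroutine or other awaitable."""
    if inspect.isawaitable(result):
        return await result
    return result


# Hypothetical stand-ins for a synchronous and an asynchronous engine.
# They only mimic the "plain value vs. coroutine" difference; they are not vLLM classes.
class FakeSyncEngine:
    def collective_rpc(self, method: str) -> list:
        return [f"{method}:worker0"]


class FakeAsyncEngine:
    async def collective_rpc(self, method: str) -> list:
        return [f"{method}:worker0"]


async def call_rpc(engine: Any, method: str) -> list:
    # Calling an async method returns a coroutine object; a sync method returns the value.
    # resolve_rpc_result normalizes both cases to the final value.
    return await resolve_rpc_result(engine.collective_rpc(method))
```

With a check like this, the same call site works whether the engine returns its result directly or as a coroutine. It does not address the CUDA error on the weight IPC handles, which is a separate problem.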
I'm not seeing any chatter on the Unsloth side about working towards this. How hard would it be to do it ourselves?
Hard to say, could take a while.
We'll probably end up closing this if decoupling vLLM & Unsloth works out.