feat: Add vLLM V1 support w/Unsloth model service

Open bradhilton opened this issue 8 months ago • 4 comments

Migrate the Unsloth model service to also support vLLM V1, which brings performance improvements and is where future vLLM development is focused.

Jul 01 '25 23:07 bradhilton

A few current limitations in Unsloth Zoo prevent V1 support. In general, Unsloth Zoo does not yet support V1's collective RPC pattern. The collective RPC call to get the weight IPC handles failed with "CUDA error: invalid argument". Also, the collective RPC calls do not check whether their results are coroutines, so they fail when called from AsyncLLM instances.
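To make the coroutine problem concrete, here is a minimal sketch of one way a caller could handle both cases. It is not Unsloth Zoo or vLLM code: `SyncEngine`, `AsyncEngine`, `call_and_resolve`, and the method name `get_weight_ipc_handles` are hypothetical stand-ins that I am assuming for illustration. The only point is that the result of the call is awaited when, and only when, it is awaitable.

```python
import asyncio
import inspect
from typing import Any, Callable


async def call_and_resolve(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    """Call fn and await its result only if that result is awaitable."""
    result = fn(*args, **kwargs)
    if inspect.isawaitable(result):
        # An async engine hands back a coroutine; resolve it before use.
        result = await result
    return result


class SyncEngine:
    """Hypothetical stand-in for an engine whose RPC call returns directly."""

    def collective_rpc(self, method: str) -> list:
        return [f"{method}:worker0"]


class AsyncEngine:
    """Hypothetical stand-in for an engine whose RPC call is a coroutine."""

    async def collective_rpc(self, method: str) -> list:
        await asyncio.sleep(0)  # simulate yielding to the event loop
        return [f"{method}:worker0"]


def demo() -> tuple:
    """Run the same helper against both stand-in engines."""

    async def main() -> tuple:
        # "get_weight_ipc_handles" is an assumed, illustrative method name.
        sync_result = await call_and_resolve(
            SyncEngine().collective_rpc, "get_weight_ipc_handles"
        )
        async_result = await call_and_resolve(
            AsyncEngine().collective_rpc, "get_weight_ipc_handles"
        )
        return sync_result, async_result

    return asyncio.run(main())
```

Without the `inspect.isawaitable` check, code written for the synchronous path would try to index or iterate over a coroutine object instead of the worker results. This sketch does not address the CUDA IPC handle error, which is a separate problem.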

Jul 01 '25 23:07 bradhilton

I'm not seeing any chatter on the Unsloth side about working towards this. How hard would it be to do it ourselves?

Jul 02 '25 13:07 corbt

Hard to say, could take a while.

Jul 02 '25 20:07 bradhilton

We'll probably end up closing this if decoupling vLLM and Unsloth works out.

Jul 12 '25 22:07 bradhilton