
feat: Add vLLM V1 support w/Unsloth model service

Open bradhilton opened this issue 8 months ago • 4 comments

Migrate the Unsloth model service to also support vLLM V1, which offers performance improvements and is the future of vLLM development.

bradhilton avatar Jul 01 '25 23:07 bradhilton

Unsloth Zoo currently has a few limitations that prevent V1 support. Generally, Unsloth Zoo does not yet support V1's collective RPC pattern. The collective RPC call to get the weight IPC handles failed with "CUDA error: invalid argument". Also, the collective RPC calls do not check whether their results are coroutines, so they fail when called from AsyncLLM instances.
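
For the coroutine issue specifically, a minimal sketch of the kind of check that seems to be missing might look like the following. This is not Unsloth Zoo or vLLM code: `SyncEngine`, `AsyncEngine`, `resolve_rpc_result`, `call_rpc`, and the `"get_handles"` method name are hypothetical stand-ins, and the only assumption carried over from above is that an async engine returns a coroutine from its collective RPC call while a sync engine returns the value directly.

```python
import asyncio
import inspect
from typing import Any


async def resolve_rpc_result(result: Any) -> Any:
    """Return a plain value whether `result` is a value or an awaitable."""
    if inspect.isawaitable(result):
        # Async engines hand back a coroutine that still has to be awaited.
        return await result
    return result


class SyncEngine:
    """Hypothetical stand-in for an engine with a synchronous RPC call."""

    def collective_rpc(self, method: str) -> list:
        return [f"{method}:worker0"]


class AsyncEngine:
    """Hypothetical stand-in for an engine whose RPC call is a coroutine."""

    async def collective_rpc(self, method: str) -> list:
        await asyncio.sleep(0)  # simulate yielding to the event loop
        return [f"{method}:worker0"]


async def call_rpc(engine: Any, method: str) -> list:
    # Works for both engine kinds because the result is normalized first.
    return await resolve_rpc_result(engine.collective_rpc(method))


def run_demo() -> tuple:
    sync_result = asyncio.run(call_rpc(SyncEngine(), "get_handles"))
    async_result = asyncio.run(call_rpc(AsyncEngine(), "get_handles"))
    return sync_result, async_result
```

With a check like this, the same calling code gets a list back from either engine kind instead of an un-awaited coroutine object. It would not address the CUDA error when fetching the IPC handles, which looks like a separate problem.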

bradhilton avatar Jul 01 '25 23:07 bradhilton

I'm not seeing any chatter on the Unsloth side about working towards this. How hard would it be to do it ourselves?

corbt avatar Jul 02 '25 13:07 corbt

Hard to say, could take a while.

bradhilton avatar Jul 02 '25 20:07 bradhilton

Probably will end up closing this if decoupling vLLM & Unsloth works out.

bradhilton avatar Jul 12 '25 22:07 bradhilton