
Update to latest vLLM upstream and Support vLLM on CPU

Open xwu99 opened this issue 10 months ago • 9 comments

  • Update models to pydantic v2, as the latest vLLM has adopted v2 models instead of v1 (a rough sketch of the kind of change involved is shown below)
  • Fix the AutoscalingConfig model, as it comes from Ray Serve, which is still based on pydantic v1
  • Add CPU model YAML files for Llama 2 7B
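
A rough, hypothetical sketch of the kind of pydantic v1 → v2 change involved (the model and field names below are illustrative, not the actual RayLLM models):

```python
# Illustrative pydantic v2 usage; not taken from the RayLLM codebase.
from pydantic import BaseModel, field_validator


class SamplingParamsModel(BaseModel):
    temperature: float = 1.0
    max_tokens: int = 256

    # v2: @field_validator replaces the v1 @validator decorator.
    @field_validator("temperature")
    @classmethod
    def _non_negative(cls, v: float) -> float:
        if v < 0:
            raise ValueError("temperature must be >= 0")
        return v


# v2 also renames .parse_obj()/.dict() to .model_validate()/.model_dump().
params = SamplingParamsModel.model_validate({"temperature": 0.7, "max_tokens": 128})
print(params.model_dump())
```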

xwu99 avatar Apr 23 '24 07:04 xwu99

Were you able to run this locally? Does it work? I'm just looking forward to seeing how to update this project to support the latest vLLM.

XBeg9 avatar Apr 24 '24 17:04 XBeg9

Were you able to run this locally? Does it work? I'm just looking forward to seeing how to update this project to support the latest vLLM.

I am working on this. Several packages (ray, vllm, pydantic, openai, etc.) have been updated since the last release of RayLLM. Hopefully I'll get it working soon.

xwu99 avatar Apr 25 '24 00:04 xwu99

Hey all,

I also have similar updates on a fork; however, I've struggled to get feedback from the maintainers on how to proceed here. I similarly updated RayLLM to pydantic v2 due to vLLM's migration to v2 proper (not using the v1 back-compat layer). The challenge this introduces is that it makes these changes incompatible with Ray, because Ray is still using the v1 compat layer. See: https://github.com/ray-project/ray/issues/43908 (I haven't had a chance to go back and gather the further specifics requested to help convince the core Ray team to reconsider the pydantic upgrade).
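
To make the incompatibility concrete, here is a minimal, illustrative snippet (assumed behavior; the exact error varies by pydantic version) of why a v2-proper model and a v1-compat model don't compose:

```python
# Illustration only: mixing pydantic v2 models (vLLM) with pydantic.v1 models (Ray).
from pydantic import BaseModel as V2BaseModel
from pydantic.v1 import BaseModel as V1BaseModel


class VllmStyleConfig(V2BaseModel):
    """Stands in for a vLLM config class that now uses pydantic v2 proper."""
    max_num_seqs: int = 256


try:
    class RayStyleDeployment(V1BaseModel):
        """Stands in for a Ray-side model still on the v1 compat layer."""
        engine: VllmStyleConfig  # v1 has no validator for a v2 model class
except Exception as exc:
    print(f"Cannot nest a v2 model inside a v1 model: {exc!r}")
```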

There are numerous other signature changes stemming from the tight coupling of Ray and vLLM, so whilst you may get RayLLM working directly with vLLM, I wonder what the mileage will be on getting this contribution accepted if it excludes Ray support.

Just food for thought. :)

lynkz-matt-psaltis avatar Apr 28 '24 07:04 lynkz-matt-psaltis

Hey all,

I also have similar updates on a fork; however, I've struggled to get feedback from the maintainers on how to proceed here. I similarly updated RayLLM to pydantic v2 due to vLLM's migration to v2 proper (not using the v1 back-compat layer). The challenge this introduces is that it makes these changes incompatible with Ray, because Ray is still using the v1 compat layer. See: ray-project/ray#43908 (I haven't had a chance to go back and gather the further specifics requested to help convince the core Ray team to reconsider the pydantic upgrade).

There are numerous other signature changes stemming from the tight coupling of Ray and vLLM, so whilst you may get RayLLM working directly with vLLM, I wonder what the mileage will be on getting this contribution accepted if it excludes Ray support.

Just food for thought. :)

No need for Ray to upgrade. I just upgraded AutoscalingConfig to v2 here. Previously I used pydantic.v1, but I found that the latest FastAPI has issues supporting pydantic.v1.
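
For reference, a hedged sketch of the approach: re-declare an equivalent pydantic v2 model locally and hand Ray Serve a plain dict, so the v1 and v2 model classes never have to mix (field names are illustrative and may not match Ray Serve's AutoscalingConfig exactly):

```python
# Hypothetical local replacement for the Ray Serve AutoscalingConfig import.
from pydantic import BaseModel, Field


class AutoscalingConfigV2(BaseModel):
    min_replicas: int = Field(default=1, ge=0)
    max_replicas: int = Field(default=2, ge=1)
    target_num_ongoing_requests_per_replica: float = Field(default=1.0, gt=0)


cfg = AutoscalingConfigV2(min_replicas=1, max_replicas=4)
# Ray Serve deployments accept a plain dict for autoscaling_config, so the
# v1-based class never needs to be constructed on our side.
serve_options = {"autoscaling_config": cfg.model_dump()}
print(serve_options)
```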

xwu99 avatar Apr 28 '24 07:04 xwu99

@xwu99 Your comment says vLLM is installed separately from source for now, but I don't see it being installed anywhere?

marov avatar May 04 '24 01:05 marov

@xwu99 Your comment says vLLM is installed separately from source for now, but I don't see it being installed anywhere?

You just need to follow the official vLLM installation guide.
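
Once vLLM is built from source with its CPU backend per that guide, a minimal smoke test could look something like the sketch below. The `device` argument and CPU support depend on the vLLM version, so treat this as an assumption-laden example rather than a recipe:

```python
# Sketch: assumes a vLLM build with the CPU backend and a Llama 2 checkpoint
# available locally or via HF; the device argument may differ across versions.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf", device="cpu")
outputs = llm.generate(["What is Ray Serve?"], SamplingParams(max_tokens=32))
for out in outputs:
    print(out.outputs[0].text)
```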

xwu99 avatar May 05 '24 09:05 xwu99

@xwu99 I saw worker_use_ray=False; does that mean your implementation cannot support model parallelism, i.e. world_size > 1?

depenglee1707 avatar May 06 '24 01:05 depenglee1707

@xwu99 I saw worker_use_ray=False; does that mean your implementation cannot support model parallelism, i.e. world_size > 1?

vLLM for CPU does not support tensor parallelism yet. This PR should be revised later to support both CPU and GPU; right now it's only adapted for CPU.
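
To illustrate the distinction (argument names depend on the vLLM version, and worker_use_ray has been deprecated in newer releases, so this is only a sketch):

```python
# Sketch of the engine-args difference discussed above; not RayLLM code.
from vllm.engine.arg_utils import AsyncEngineArgs

# CPU backend today: a single in-process worker, no Ray workers, no TP.
cpu_args = AsyncEngineArgs(
    model="meta-llama/Llama-2-7b-chat-hf",
    worker_use_ray=False,
    tensor_parallel_size=1,
)

# GPU path with tensor parallelism: vLLM spawns Ray workers, so world_size > 1
# goes hand in hand with worker_use_ray=True.
gpu_args = AsyncEngineArgs(
    model="meta-llama/Llama-2-7b-chat-hf",
    worker_use_ray=True,
    tensor_parallel_size=2,
)
```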

xwu99 avatar May 06 '24 01:05 xwu99

@xwu99 I saw worker_use_ray=False; does that mean your implementation cannot support model parallelism, i.e. world_size > 1?

vLLM for CPU does not support tensor parallelism yet. This PR should be revised later to support both CPU and GPU; right now it's only adapted for CPU.

Great, thanks for the clarification. I also tried to upgrade vLLM to the latest version, but for GPU, and found it's not easy work. The main problem is that vLLM requires the driver process to also have GPU capability.

depenglee1707 avatar May 06 '24 01:05 depenglee1707