Bump vllm to v0.4.2
Hi @kebe7jun, thanks for submitting the PR.
Can you elaborate on the specific models or features you're interested in that require this version upgrade?
CC @oandreeva-nv @tanmayv25
Started internal CI: 14897042. @kebe7jun, by any chance, have you already submitted the Triton CLA?
I need the Llama 3 optimizations from 0.4.1, as well as Phi-3-mini support from 0.4.2. See: https://github.com/vllm-project/vllm/releases
I have already signed the CLA, and I have had a PR merged before.
Started internal CI: 14897042 ... tritonclient.utils.InferenceServerException: [StatusCode.INVALID_ARGUMENT] load failed for model 'vllm_opt': version 1 is at UNAVAILABLE state: Internal: AttributeError: module 'pynvml' has no attribute 'nvmlDeviceGetP2PStatus'
Looks like this might need a different version of pynvml or something. CC @oandreeva-nv @pskiran1 @tanmayv25
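For reference, a quick check like the one below can show what is going on. This is illustrative only, not part of the PR, and it assumes the likely cause: both the legacy `pynvml` distribution and NVIDIA's `nvidia-ml-py` install a module named `pynvml`, so whichever was installed last wins.

```python
# Quick diagnostic (illustrative, not from this PR): print which distribution is
# providing the "pynvml" module on sys.path and whether it exposes the symbol
# that is failing in the CI log above.
import importlib.metadata
import pynvml

print("pynvml module loaded from:", pynvml.__file__)
print("has nvmlDeviceGetP2PStatus:", hasattr(pynvml, "nvmlDeviceGetP2PStatus"))

for dist in ("pynvml", "nvidia-ml-py"):
    try:
        print(dist, importlib.metadata.version(dist))
    except importlib.metadata.PackageNotFoundError:
        print(dist, "not installed")
```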
@rmccorm4, yes, I was working on this. In the latest vLLM they've added pynvml to the requirements, so it conflicts with whatever we re-install in the multi-GPU tests and fails them as a result. I did some initial tests removing pynvml from the tests, but it was still failing. I didn't get a chance to debug further as I got re-assigned.
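If it helps, one option (just a sketch, assuming the failure is the shadowed-module issue above and that the right place to catch it is the multi-GPU test setup) is to fail fast with an explicit message before loading the model:

```python
# Hypothetical guard for the multi-GPU test setup (not from this PR): verify the
# pynvml module on sys.path exposes the call vLLM needs, and raise a clear error
# instead of hitting the AttributeError at model load time.
import pynvml

REQUIRED_SYMBOL = "nvmlDeviceGetP2PStatus"

if not hasattr(pynvml, REQUIRED_SYMBOL):
    raise RuntimeError(
        f"pynvml module at {pynvml.__file__} is missing {REQUIRED_SYMBOL}; "
        "the pynvml re-installed by the test likely shadowed the nvidia-ml-py "
        "package pulled in by vLLM's requirements. Reinstall nvidia-ml-py (or "
        "drop the test-side pynvml install) before starting the server."
    )
```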
vLLM backend PR: https://github.com/triton-inference-server/vllm_backend/pull/43 Latest internal CI: 15397809
@kebe7jun, please rebase your branch with the latest main. Thank you.
@pskiran1, this branch has no conflicts, so a rebase is unnecessary IMO. Feel free to merge this PR along with yours.