Bump vllm to v0.4.2
Hi @kebe7jun, thanks for submitting the PR.
Can you elaborate on the specific models or features you're interested in that require this version upgrade?
CC @oandreeva-nv @tanmayv25
Started internal CI: 14897042. @kebe7jun, by any chance, have you already submitted the Triton CLA?
I need the Llama 3 optimizations from 0.4.1, as well as Phi-3-mini support from 0.4.2. See: https://github.com/vllm-project/vllm/releases
I have already signed the CLA, and I have had a PR merged before.
Started internal CI: 14897042 ... tritonclient.utils.InferenceServerException: [StatusCode.INVALID_ARGUMENT] load failed for model 'vllm_opt': version 1 is at UNAVAILABLE state: Internal: AttributeError: module 'pynvml' has no attribute 'nvmlDeviceGetP2PStatus'
Looks like this might need a different version of pynvml or something. CC @oandreeva-nv @pskiran1 @tanmayv25
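For reference, a quick check like the one below can show what is going on. This is illustrative only, not part of the PR, and it assumes the likely cause: both the legacy `pynvml` distribution and NVIDIA's `nvidia-ml-py` install a module named `pynvml`, so whichever was installed last wins.

```python
# Quick diagnostic (illustrative, not from this PR): print which distribution is
# providing the "pynvml" module on sys.path and whether it exposes the symbol
# that is failing in the CI log above.
import importlib.metadata
import pynvml

print("pynvml module loaded from:", pynvml.__file__)
print("has nvmlDeviceGetP2PStatus:", hasattr(pynvml, "nvmlDeviceGetP2PStatus"))

for dist in ("pynvml", "nvidia-ml-py"):
    try:
        print(dist, importlib.metadata.version(dist))
    except importlib.metadata.PackageNotFoundError:
        print(dist, "not installed")
```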
@rmccorm4, yes, I was working on this. In the latest vLLM they've added pynvml to the requirements, so it conflicts with whatever we re-install in the multi-GPU tests and fails them as a result. I did some initial tests removing pynvml from the tests, but it was still failing. I didn't get a chance to debug further as I got re-assigned.
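If it helps, one option (just a sketch, assuming the failure is the shadowed-module issue above and that the right place to catch it is the multi-GPU test setup) is to fail fast with an explicit message before loading the model:

```python
# Hypothetical guard for the multi-GPU test setup (not from this PR): verify the
# pynvml module on sys.path exposes the call vLLM needs, and raise a clear error
# instead of hitting the AttributeError at model load time.
import pynvml

REQUIRED_SYMBOL = "nvmlDeviceGetP2PStatus"

if not hasattr(pynvml, REQUIRED_SYMBOL):
    raise RuntimeError(
        f"pynvml module at {pynvml.__file__} is missing {REQUIRED_SYMBOL}; "
        "the pynvml re-installed by the test likely shadowed the nvidia-ml-py "
        "package pulled in by vLLM's requirements. Reinstall nvidia-ml-py (or "
        "drop the test-side pynvml install) before starting the server."
    )
```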
vLLM backend PR: https://github.com/triton-inference-server/vllm_backend/pull/43 Latest internal CI: 15397809
@kebe7jun, please rebase your branch with the latest main. Thank you.
@pskiran1, this branch has no conflicts, so a rebase is unnecessary IMO. Feel free to merge this PR along with yours.