
Enabling vllm in llama2-70b

Open jdwillard19 opened this issue 7 months ago • 3 comments

I am trying to use the vLLM flag in this reference implementation, but as I understand it, I have to manually spin up vLLM outside of running the benchmark. The Dockerfile, however, does not include any vLLM package installation. I was wondering which version of vLLM was used to test this. Thanks!

jdwillard19 · May 28 '25 20:05
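For context, spinning up a vLLM server with the OpenAI-compatible API outside the benchmark might look roughly like the following; the model path and port are illustrative assumptions, not values from the repository:

```python
# Minimal sketch: launch a vLLM OpenAI-compatible server as a separate
# process before running the benchmark. The model path and port are
# illustrative assumptions.
import subprocess

server = subprocess.Popen([
    "python", "-m", "vllm.entrypoints.openai.api_server",
    "--model", "meta-llama/Llama-2-70b-chat-hf",  # assumed checkpoint
    "--port", "8000",
])

# ... run the benchmark against http://localhost:8000/v1, then:
# server.terminate()
```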

We didn't use vLLM when creating Llama2-70B; feel free to use any version that works.

nvzhihanj · May 30 '25 20:05

Thanks. I saw that it is an available flag in the reference code, but it doesn't seem to work. Maybe it should be removed?

jdwillard19 · Jun 02 '25 17:06

vLLM support via an OpenAI-compatible API was added by NeuralMagic: https://github.com/mlcommons/inference/blob/master/language/llama2-70b/SUT_API.py

arjunsuresh · Jun 02 '25 17:06
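As a rough illustration of what talking to vLLM through the OpenAI-compatible API involves (the server URL, model id, and generation parameters below are assumptions for the sketch, not code taken from SUT_API.py):

```python
# Minimal sketch of querying a vLLM OpenAI-compatible endpoint, in the
# spirit of what SUT_API.py does. URL, model id, and generation
# parameters are illustrative assumptions.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",  # assumed server address
    json={
        "model": "meta-llama/Llama-2-70b-chat-hf",  # assumed model id
        "prompt": "What is the capital of France?",
        "max_tokens": 64,
        "temperature": 0.0,
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```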