Enabling vLLM in llama2-70b
I am trying to use the vLLM flag in this reference implementation, but as I understand it, I have to manually spin up a vLLM server outside of the benchmark run. However, the Dockerfile does not include any vLLM package installation. I was wondering which version of vLLM was used to test this. Thanks!
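For anyone following along: once a vLLM OpenAI-compatible server has been started manually (e.g. via `python -m vllm.entrypoints.openai.api_server`), a quick sanity check like the sketch below can confirm the endpoint is reachable before launching the benchmark. The URL and port are assumptions for illustration, not values from the reference docs.

```python
# Minimal sanity check for a manually started vLLM server.
# Assumes the server was launched separately, e.g.:
#   python -m vllm.entrypoints.openai.api_server --model <model-path> --port 8000
# The base URL below is an assumption -- adjust to your setup.
import requests

BASE_URL = "http://localhost:8000"  # assumed default vLLM port

resp = requests.get(f"{BASE_URL}/v1/models", timeout=10)
resp.raise_for_status()
print("Models served:", [m["id"] for m in resp.json()["data"]])
```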
We didn't use vLLM when creating the Llama2-70B reference - feel free to use any version that works.
Thanks. I saw it is an available flag in the reference code, but it doesn't seem to work. Maybe it should be removed?
vLLM support via an OpenAI-compatible API was added by NeuralMagic.
https://github.com/mlcommons/inference/blob/master/language/llama2-70b/SUT_API.py
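In other words, SUT_API.py talks to an already-running vLLM server over its OpenAI-compatible endpoint rather than importing the vllm package, which is why the Dockerfile doesn't install it. A minimal sketch of that client pattern is below; the endpoint, API key, model name, and prompt are placeholder assumptions, not values taken from SUT_API.py.

```python
# Sketch of querying a vLLM server through the OpenAI-compatible API,
# the same pattern SUT_API.py relies on. All values below are assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed vLLM server address
    api_key="EMPTY",                      # vLLM ignores the key by default
)

completion = client.completions.create(
    model="meta-llama/Llama-2-70b-chat-hf",  # must match the served model
    prompt="Summarize: MLPerf measures ML system performance.",
    max_tokens=64,
)
print(completion.choices[0].text)
```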