Sam Stoelinga
Update: I spoke too early. It works for some configs, but I'm having issues with xformers and missing flashinfer. Still working on fixing that. I was able to get it to...
I got 0.8.2 working with flashinfer as well:

```
substratusai/vllm-gh200:v0.8.2
```

Example docker run:

```
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  --ipc=host -e...
```
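For anyone who wants a fuller starting point, here is a minimal sketch of what a complete invocation could look like, assuming the image uses the standard vLLM OpenAI-compatible server entrypoint (so model arguments are passed directly to `docker run`). The model name and token below are placeholders, not from my original command:

```
# Sketch only: serve a model with the GH200 image.
# HF token and model name are placeholders; adjust for your setup.
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -p 8000:8000 \
  --ipc=host \
  -e HF_TOKEN=<your-huggingface-token> \
  substratusai/vllm-gh200:v0.8.2 \
  --model meta-llama/Llama-3.1-8B-Instruct
```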
I built a new image and did basic tests: `substratusai/vllm-gh200:v0.8.3`. Please give it a try.
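If you want a quick smoke test after starting the container, something like the following should work, assuming the server is exposed on localhost:8000 and serves the standard vLLM OpenAI-compatible API (the model name is a placeholder, use whatever you launched the container with):

```
# List the models the server is serving.
curl http://localhost:8000/v1/models

# Send a small completion request as a sanity check.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "prompt": "Hello", "max_tokens": 16}'
```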