Aaron Pham
@marijnbent what is your batch size and requests configuration?
Hey, can you try again? I think this should be fixed by now.
Maybe try upgrading vLLM.
Do you have 6 GPUs? I will check this.
GPTQ is now supported with vLLM and the latest OpenLLM version. You can test it with vLLM, as I haven't updated the PyTorch code path in a while now. You should...
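If it helps, here is a minimal sketch of exercising a GPTQ model directly through vLLM's Python API; the model id is only an illustration, swap in whatever you are actually serving:

```python
# Minimal sketch: load a GPTQ-quantized model with vLLM and run one prompt.
# The model id below is illustrative, not a recommendation.
from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/Llama-2-7B-Chat-GPTQ", quantization="gptq")
params = SamplingParams(max_tokens=64, temperature=0.7)

outputs = llm.generate(["What does GPTQ quantization do?"], params)
print(outputs[0].outputs[0].text)
```

If that works standalone, the same model should be usable through the vLLM backend in OpenLLM.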
cc @XunchaoZ, might be worth taking a look into this.
Hmm, can you try the vLLM backend if you have a GPU?
Hi there, thanks for creating the issue. Do you have vLLM available locally?
Sounds like an issue orthogonal to OpenLLM?
I will take a look into incremental detokenization for the PyTorch backend.
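For context, this is roughly what incremental detokenization means; a minimal sketch assuming a Hugging Face tokenizer, not the actual OpenLLM code path. The idea is to re-decode the growing token prefix and emit only the new suffix, so merged or multi-byte tokens render correctly while streaming:

```python
# Minimal sketch of incremental detokenization (illustrative, not OpenLLM's code).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

def stream_decode(token_ids):
    """Yield only the newly decoded text for each additional token."""
    emitted = ""
    for end in range(1, len(token_ids) + 1):
        # Re-decode the whole prefix so tokens that merge into one character
        # (or span multiple bytes) come out right, then emit just the delta.
        text = tokenizer.decode(token_ids[:end], skip_special_tokens=True)
        new_text = text[len(emitted):]
        if new_text:
            emitted = text
            yield new_text

# Example: stream the pieces of a short sequence of token ids.
ids = tokenizer.encode("Hello, streaming world!")
for piece in stream_decode(ids):
    print(repr(piece))
```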