Aaron Pham

Results 420 comments of Aaron Pham

@marijnbent what is your batch size and requests configuration?

Hey, can you try again? I think this should be fixed by now.

Do you have 6 GPUs? I will look into this.

GPTQ is now supported with vLLM and the latest OpenLLM version. You can test it with the vLLM backend, as I haven't updated the PyTorch code path in a while. You should...
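For context, a minimal sketch of trying a GPTQ model on the vLLM backend. The model ID, flag names, port, and endpoint path here are assumptions and may differ across OpenLLM versions; check `openllm start --help` for your installed release.

```shell
# Sketch only: flags and model ID are assumptions, not confirmed for your version.
# Serve a GPTQ-quantized model using the vLLM backend:
openllm start TheBloke/Llama-2-7B-GPTQ --backend vllm --quantize gptq

# From another shell, send a test prompt (default port assumed to be 3000):
curl -X POST http://localhost:3000/v1/generate \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Hello", "llm_config": {"max_new_tokens": 16}}'
```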

cc @XunchaoZ, this might be worth taking a look at.

Hmm, can you try with the vLLM backend if you have a GPU?

Hi there, thanks for creating the issue. Do you have vLLM available locally?

I will look into incremental detokenization for the PyTorch backend.