Aaron Pham
@marijnbent what is your batch size and requests configuration?
Hey, can you try again? I think this should be fixed by now.
Maybe try upgrading vLLM.
Do you have 6 GPUs? I will check this.
GPTQ is now supported with vLLM and the latest OpenLLM version. You can test it with vLLM, as I haven't updated the PyTorch code path in a while now. You should...
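If it helps, here is a minimal sketch of exercising a GPTQ model directly through vLLM's Python API; the model id is only an illustration, swap in whatever you are actually serving:

```python
# Minimal sketch: load a GPTQ-quantized model with vLLM and run one prompt.
# The model id below is illustrative, not a recommendation.
from vllm import LLM, SamplingParams

llm = LLM(model="TheBloke/Llama-2-7B-Chat-GPTQ", quantization="gptq")
params = SamplingParams(max_tokens=64, temperature=0.7)

outputs = llm.generate(["What does GPTQ quantization do?"], params)
print(outputs[0].outputs[0].text)
```

If that works standalone, the same model should be usable through the vLLM backend in OpenLLM.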
cc @XunchaoZ, might be worth taking a look into this.
Hmm, can you try the vLLM backend if you have a GPU?
Hi there, thanks for creating the issue. Do you have vLLM available locally?
Sounds like an issue orthogonal to OpenLLM?
I will take a look into incremental detokenization for the PyTorch backend.
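For context, this is roughly what incremental detokenization means; a minimal sketch assuming a Hugging Face tokenizer, not the actual OpenLLM code path. The idea is to re-decode the growing token prefix and emit only the new suffix, so merged or multi-byte tokens render correctly while streaming:

```python
# Minimal sketch of incremental detokenization (illustrative, not OpenLLM's code).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

def stream_decode(token_ids):
    """Yield only the newly decoded text for each additional token."""
    emitted = ""
    for end in range(1, len(token_ids) + 1):
        # Re-decode the whole prefix so tokens that merge into one character
        # (or span multiple bytes) come out right, then emit just the delta.
        text = tokenizer.decode(token_ids[:end], skip_special_tokens=True)
        new_text = text[len(emitted):]
        if new_text:
            emitted = text
            yield new_text

# Example: stream the pieces of a short sequence of token ids.
ids = tokenizer.encode("Hello, streaming world!")
for piece in stream_decode(ids):
    print(repr(piece))
```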