Yuchao Zhang
Llama 3 should already be supported via the template https://github.com/npuichigo/openai_trtllm/blob/main/templates/history_template_llama3.liquid. To get the model, please refer to https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#llama-v3-updates
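For orientation only, a rough sketch of what a Llama 3 history template in liquid can look like; the variable names (`messages`, `role`, `content`) are assumptions here, so check the linked history_template_llama3.liquid for the real template:

```
{% comment %}Sketch only; not the actual template from the repo.{% endcomment %}
<|begin_of_text|>{% for message in messages %}<|start_header_id|>{{ message.role }}<|end_header_id|>

{{ message.content }}<|eot_id|>{% endfor %}<|start_header_id|>assistant<|end_header_id|>

```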
It's the `ensemble` model if the structure looks like https://github.com/triton-inference-server/tensorrtllm_backend/tree/v0.9.0/all_models/inflight_batcher_llm
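Roughly, the linked v0.9.0 layout contains these model directories (check your own model repository, names may differ slightly between releases):

```
all_models/inflight_batcher_llm/
├── ensemble/          # chains the three steps below into one callable model
├── preprocessing/     # tokenization (python backend)
├── tensorrt_llm/      # the TensorRT-LLM engine itself
├── postprocessing/    # de-tokenization (python backend)
└── tensorrt_llm_bls/  # BLS-based alternative to the ensemble
```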
It's not planned yet, but I think it's trivial to adapt the code for your use case.
Can you explain how to call the vLLM-based Triton backend? For example, the gRPC interface and the parameters needed to call the service.
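Not part of the original thread, but a minimal sketch of what such a gRPC call can look like, assuming the model was deployed with the stock triton-inference-server/vllm_backend tensor names (`text_input`, `stream`, `sampling_parameters` in; `text_output` out) under the hypothetical model name `vllm_model` on localhost:8001; adjust names and sampling parameters to your deployment:

```
import json
import queue

import numpy as np
import tritonclient.grpc as grpcclient

responses = queue.Queue()

def callback(result, error):
    # Every response (or error) from the decoupled model lands here.
    responses.put(error if error is not None else result)

client = grpcclient.InferenceServerClient(url="localhost:8001")

inputs = [
    grpcclient.InferInput("text_input", [1], "BYTES"),
    grpcclient.InferInput("stream", [1], "BOOL"),
    grpcclient.InferInput("sampling_parameters", [1], "BYTES"),
]
inputs[0].set_data_from_numpy(np.array(["What is Triton Inference Server?"], dtype=object))
inputs[1].set_data_from_numpy(np.array([False], dtype=bool))  # False -> one final response
inputs[2].set_data_from_numpy(
    np.array([json.dumps({"temperature": 0.7, "max_tokens": 64})], dtype=object)
)

outputs = [grpcclient.InferRequestedOutput("text_output")]

# The vllm backend is decoupled, so use the streaming gRPC API even for one-shot calls.
client.start_stream(callback=callback)
client.async_stream_infer(model_name="vllm_model", inputs=inputs, outputs=outputs)
client.stop_stream()  # waits for outstanding responses before returning

while not responses.empty():
    item = responses.get()
    if isinstance(item, Exception):
        raise item
    print(item.as_numpy("text_output")[0].decode())
```

With `stream` set to True instead, the callback receives multiple partial responses as generation progresses.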
Could you set RUST_LOG to debug and attach the debug output here? https://github.com/npuichigo/openai_trtllm/blob/8e33ce19cac8b9803ce525b88475585d670fe01b/src/routes/chat.rs#L100
I tested with codellama and it indeed has no space between words.

```
$ python openai_completion.py
class SimpleTransformer(nn.Module):
    @classmethod
    def add_args(cls, parser):
        return parser

    @classmethod
    def from_args(cls, args):
$ python...
```
@charllll The inflight_batcher_llm_client only calls the `tensorrt_llm` model from Triton instead of the `ensemble` model and does [de-tokenization](https://github.com/triton-inference-server/tensorrtllm_backend/blob/da59830baf762a2026c10535ac6459d0cb45e990/inflight_batcher_llm/client/inflight_batcher_llm_client.py#L826) itself. It seems to be related to https://github.com/triton-inference-server/tensorrtllm_backend/issues/332
@Narsil when can we have a new version which includes the proxy setting?
Any update on this?