juney-nvidia comments

117 comments of juney-nvidia

Provide an interface similar to OpenAI API

@Pevernow Can you elaborate on your request? Thanks, June

Provide an interface similar to OpenAI API

Sorry for the late reply; I was caught up with other things.

> Users want something like this https://github.com/lm-sys/FastChat/blob/main/docs/openai_api.md, so they can switch their apps from OpenAI models to TRT-LLM models...

Support for Zephyr 7B model

Thanks for your suggestion. I'll add it to the list of requested models, and we will keep you posted. Juney

Issue reproducing tokenized input for BERT

@Muhtasham Can you share the exact sequence of commands to reproduce the issue, including how you built the engine? Thanks, June

NVIDIA AMMO documentation

@RalphMao do you have any comments on this ask? :)

Support SDXL and its distributed inference

@Zars19 Thanks for contributing to TensorRT-LLM! @nv-guomingz, can you help take care of this? :) Thanks, June

Use first bad_words as extra parameters, and implement min-p

@pathorn Hi Pathorn, thanks for your interest in submitting the MR to TRT-LLM. The current process for merging community MRs into TRT-LLM is:

- After the contributor finishes the implementation...

[Bug] When I use tensorrt_llm_bls, the first token takes a very long time.

@wjj19950828 Hi, can you follow [this](https://github.com/triton-inference-server/tensorrtllm_backend/issues/270) template to provide concrete steps to reproduce your issue? Then our engineers can help investigate. June

Starting Triton fails to load libtriton_tensorrtllm on aarch64.

@matichon-vultureprime Thanks for reporting this. ARM support in TRT-LLM is currently still in an experimental phase, so it may contain issues. When ARM support is stable enough, we will...

test: [TRTLLM-4334] Create 1.0 criteria scope from API stability references

Thanks for preparing the MR, @syuoni! June