Kaiyu Xie
> Hi,
>
> I cannot understand your meaning; please provide more details.
>
> There is no training record (e.g. loss) while training on my own dataset. During visualization,...
@MustafaFayez Not sure which version you used, but if you're using v0.7.1, `--max_batch_size` defaults to 8 (see [here](https://github.com/NVIDIA/TensorRT-LLM/blob/v0.7.1/examples/llama/build.py#L167)). I saw that you did not specify `--max_batch_size` while trying...
Hi @Bhuvanesh09, thanks very much for your great work. The update including your changes has been merged into the main branch (see #1168), and we've credited you as the...
@Coder-nlper Please share the commands you used to build the engines and run the benchmarks so that we can check whether the comparison is apples-to-apples. Thanks.
Hi @CoderHam, the changes are integrated in https://github.com/NVIDIA/TensorRT-LLM/pull/1688 and we've credited you as a co-author, so I'm closing this PR now. Thanks a lot!
@siddhatiwari The fix is included in PR https://github.com/NVIDIA/TensorRT-LLM/pull/1639; please verify again with the latest main branch. Thanks!
`gptManagerBenchmark` does not support specifying the sampling strategy yet; it uses the default `top_p` and `top_k`, which are `top_p=0.0` and `top_k=1` (i.e., greedy decoding).
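To illustrate why those defaults make the benchmark deterministic, here is a minimal, hypothetical sketch of top-k / top-p sampling (not TensorRT-LLM's actual implementation): with `top_k=1` and `top_p=0.0`, the sampler always returns the argmax of the logits.

```python
import numpy as np

def sample_next_token(logits, top_k=1, top_p=0.0, rng=None):
    """Illustrative top-k / top-p sampling sketch (hypothetical helper).

    With top_k=1 and top_p=0.0 -- the defaults described above -- this
    reduces to greedy decoding, i.e. a plain argmax over the logits.
    """
    rng = rng or np.random.default_rng()
    # Softmax (shifted by the max logit for numerical stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if top_k == 1:
        # Greedy path: deterministic, no randomness involved.
        return int(np.argmax(probs))
    # Keep only the top_k most likely tokens.
    top_idx = np.argsort(probs)[::-1][:top_k]
    top_probs = probs[top_idx]
    if top_p > 0.0:
        # Nucleus filtering: keep the smallest prefix whose mass >= top_p.
        cum = np.cumsum(top_probs)
        cutoff = int(np.searchsorted(cum, top_p)) + 1
        top_idx, top_probs = top_idx[:cutoff], top_probs[:cutoff]
    # Renormalize and sample from the surviving candidates.
    top_probs /= top_probs.sum()
    return int(rng.choice(top_idx, p=top_probs))

# With the defaults, the result is deterministic:
logits = np.array([0.1, 2.5, -1.0, 0.7])
print(sample_next_token(logits))  # → 1 (the argmax)
```

This is why runs of the benchmark with the default sampling settings are repeatable token-for-token.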
1. Scripts in `python` benchmark the Python runtime of TensorRT-LLM, while `cpp` contains scripts to benchmark the C++ runtime, which supports benchmarking both static batching and inflight batching...
@sleepwalker2017 Sorry, the documentation under the benchmark directory for LoRA is outdated; we will fix it. Please refer to the documentation [here](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#run-llama-with-lora) and try the commands there, which should be up to date.
> @QiJune I noticed that this change did not land in the TRT-LLM 0.9.0 release tag. Can you provide an ETA?

Hi @Lokiiiiii, thanks a lot for your contribution...