Kaiyu Xie
> Hi,
>
> I cannot understand your meaning; please provide more details.
>
> There is no training record (e.g. loss) while training on my own dataset. During visualization,...
@MustafaFayez Not sure which version you used, but if you're using v0.7.1, `--max_batch_size` defaults to 8 (see [here](https://github.com/NVIDIA/TensorRT-LLM/blob/v0.7.1/examples/llama/build.py#L167)). I saw that you did not specify `--max_batch_size` while trying...
Hi @Bhuvanesh09, thanks very much for your great work. The update including your changes has been merged into the main branch (see #1168), and we've credited you as the...
@Coder-nlper Please share the commands you used to build the engines and run the benchmarks so that we can check whether the comparison is apples-to-apples. Thanks.
Hi @CoderHam, the changes are integrated in https://github.com/NVIDIA/TensorRT-LLM/pull/1688 and we've credited you as a co-author, so I'm closing this PR now. Thanks a lot!
@siddhatiwari The fix is included in PR https://github.com/NVIDIA/TensorRT-LLM/pull/1639; please verify again with the latest main branch. Thanks!
`gptManagerBenchmark` does not support specifying the sampling strategy yet; it uses the default `top_p` and `top_k`, which are `top_p=0.0` and `top_k=1` (i.e., greedy decoding).
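To illustrate why those defaults make the benchmark deterministic, here is a minimal, hypothetical sketch of top-k / top-p sampling (not TensorRT-LLM's actual implementation): with `top_k=1` and `top_p=0.0`, the sampler always returns the argmax of the logits.

```python
import numpy as np

def sample_next_token(logits, top_k=1, top_p=0.0, rng=None):
    """Illustrative top-k / top-p sampling sketch (hypothetical helper).

    With top_k=1 and top_p=0.0 -- the defaults described above -- this
    reduces to greedy decoding, i.e. a plain argmax over the logits.
    """
    rng = rng or np.random.default_rng()
    # Softmax (shifted by the max logit for numerical stability).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if top_k == 1:
        # Greedy path: deterministic, no randomness involved.
        return int(np.argmax(probs))
    # Keep only the top_k most likely tokens.
    top_idx = np.argsort(probs)[::-1][:top_k]
    top_probs = probs[top_idx]
    if top_p > 0.0:
        # Nucleus filtering: keep the smallest prefix whose mass >= top_p.
        cum = np.cumsum(top_probs)
        cutoff = int(np.searchsorted(cum, top_p)) + 1
        top_idx, top_probs = top_idx[:cutoff], top_probs[:cutoff]
    # Renormalize and sample from the surviving candidates.
    top_probs /= top_probs.sum()
    return int(rng.choice(top_idx, p=top_probs))

# With the defaults, the result is deterministic:
logits = np.array([0.1, 2.5, -1.0, 0.7])
print(sample_next_token(logits))  # → 1 (the argmax)
```

This is why runs of the benchmark with the default sampling settings are repeatable token-for-token.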
1. Scripts in `python` benchmark the Python runtime of TensorRT-LLM, while `cpp` contains scripts to benchmark the C++ runtime, which supports benchmarking both static batching and inflight batching...
@sleepwalker2017 Sorry, the documentation under the benchmark directory for LoRA is outdated; we will fix it. Please refer to the documentation [here](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/llama#run-llama-with-lora) and try the commands there, which should be up to date.
> @QiJune I noticed that this change did not land in the TRT-LLM 0.9.0 release tag. Can you provide an ETA?

Hi @Lokiiiiii, thanks a lot for your contribution...