Vincenzo di Cicco

5 comments of Vincenzo di Cicco

Sure, here are the steps to reproduce on a fresh lit-gpt clone:

```bash
# download & convert model
python scripts/download.py --repo_id TinyLlama/TinyLlama-1.1B-intermediate-step-955k-token-2T
python scripts/convert_hf_checkpoint.py --checkpoint_dir checkpoints/TinyLlama/TinyLlama-1.1B-intermediate-step-955k-token-2T/
# prepare (tiny) train...
```

Disabling `use_custom_all_reduce` solved the issue, many thanks! Does this mean there is a bug in the custom all_reduce plugin, and if so, do you think it will be fixed? Furthermore,...
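For anyone who lands here later, this is roughly the rebuild that worked for me; a minimal sketch assuming the 0.10/0.11-era `trtllm-build` CLI (the checkpoint/engine paths are placeholders, and the flag may not exist in other releases):

```bash
# Rebuild the engine with the custom all-reduce plugin disabled,
# falling back to NCCL for the cross-GPU reductions.
# Paths are placeholders.
trtllm-build \
  --checkpoint_dir ./tllm_checkpoint_2gpu \
  --output_dir ./tllm_engine_2gpu \
  --use_custom_all_reduce disable
```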

Hi @nv-guomingz, disabling `use_custom_all_reduce` solved the reported issue with TRT-LLM 0.11.0, so this specific issue can be closed. As a last question: I saw that in recent versions of TRT-LLM...

@vonchenplus thanks for the answer. In my case I'm not using a _very_ low temperature; it is `0.1`.
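For context, the generation setup looks roughly like this; a sketch assuming the stock TRT-LLM example runner and its `--temperature` flag (engine/tokenizer paths and the prompt are placeholders):

```bash
# Generate with temperature 0.1: low, but not the near-zero/greedy
# setting that the linked discussion is about.
python examples/run.py \
  --engine_dir ./tllm_engine \
  --tokenizer_dir ./tokenizer \
  --input_text "Hello" \
  --temperature 0.1 \
  --max_output_len 128
```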

@PerkzZheng @Tracin thanks! I can confirm that disabling `use_fp8_context_fmha` solves the issue. If we run tests on Llama 7B, I will update here. Regarding the decrease in quality of...
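For reference, the workaround is a one-flag change at build time; a sketch assuming the same `trtllm-build` CLI as above (checkpoint/engine paths are placeholders):

```bash
# Build the FP8 engine with FP8 context FMHA disabled as a workaround;
# the context phase then falls back to the non-FP8 fused attention kernel.
trtllm-build \
  --checkpoint_dir ./tllm_checkpoint_fp8 \
  --output_dir ./tllm_engine_fp8 \
  --use_fp8_context_fmha disable
```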