brb-nv
This MR adds support for consuming a pre-quantized FP8 checkpoint from ModelOpt for nemotron-mini-4b-instruct.

```
$ rm -rf nemotron_mini_4b_tllm_fp8_ckpt/ && python examples/gpt/convert_checkpoint.py --model_dir nemotron-mini-4b-instruct_vfp8-fp8-bf16-export/ --output_dir nemotron_mini_4b_tllm_fp8_ckpt/ --dtype bfloat16
$...
```
This MR adds support for Phi-4-mini and Phi-4-multimodal models.
This MR adds unit tests to validate Eagle support for models with untrained Eagle heads. These are sanity tests intended to catch blatant issues such as missing...
This MR adds tests validating FP8, LoRA, and Medusa support for the following models:
1) Codestral-22B-v0.1
2) Ministral-8B-Instruct-2410
3) Mistral-Small-24B-Base-2501
## Description

This MR integrates helix parallelism, an experimental feature, into TRTLLM.

**_Background:_**

- Helix parallelism is a decode-only context parallelism method. Hence, it's used in a disaggregated setting where only...