
feat: Support prequantized fp8 ckpt for nemotron-mini-4b-instruct

Open brb-nv opened this issue 9 months ago • 12 comments

This MR adds support for consuming the pre-quantized FP8 checkpoint exported by ModelOpt for nemotron-mini-4b-instruct.

$ rm -rf nemotron_mini_4b_tllm_fp8_ckpt/ && python examples/gpt/convert_checkpoint.py --model_dir nemotron-mini-4b-instruct_vfp8-fp8-bf16-export/ --output_dir nemotron_mini_4b_tllm_fp8_ckpt/ --dtype bfloat16
$ rm -rf nemotron_mini_4b_tllm_fp8_eng/ && trtllm-build --checkpoint_dir nemotron_mini_4b_tllm_fp8_ckpt/ --output_dir nemotron_mini_4b_tllm_fp8_eng/
$ python examples/run.py --engine_dir nemotron_mini_4b_tllm_fp8_eng/ --max_output_len 20 --tokenizer_dir nemotron-mini-4b-instruct_vfp8-fp8-bf16-export/ --input_text "Hello, how are you?"
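
For anyone who wants to script the three steps above (convert, build, run), here is a minimal Python sketch. The paths and flags are copied from the commands in this description; the helper names `build_commands` and `run_pipeline` are hypothetical and not part of TensorRT-LLM. It assumes you run it from the repository root with the ModelOpt export directory already present, and it defaults to a dry run that only prints the commands.

```python
import shlex
import shutil
import subprocess

# Directory names copied from the commands in this PR description.
EXPORT_DIR = "nemotron-mini-4b-instruct_vfp8-fp8-bf16-export/"
CKPT_DIR = "nemotron_mini_4b_tllm_fp8_ckpt/"
ENGINE_DIR = "nemotron_mini_4b_tllm_fp8_eng/"


def build_commands(export_dir=EXPORT_DIR, ckpt_dir=CKPT_DIR, engine_dir=ENGINE_DIR,
                   prompt="Hello, how are you?", max_output_len=20):
    """Return the convert, build, and run commands as argument lists (hypothetical helper)."""
    convert = ["python", "examples/gpt/convert_checkpoint.py",
               "--model_dir", export_dir,
               "--output_dir", ckpt_dir,
               "--dtype", "bfloat16"]
    build = ["trtllm-build",
             "--checkpoint_dir", ckpt_dir,
             "--output_dir", engine_dir]
    run = ["python", "examples/run.py",
           "--engine_dir", engine_dir,
           "--max_output_len", str(max_output_len),
           "--tokenizer_dir", export_dir,
           "--input_text", prompt]
    return [convert, build, run]


def run_pipeline(commands, stale_dirs=(CKPT_DIR, ENGINE_DIR), dry_run=True):
    """Print each command; when dry_run is False, clear stale outputs and execute them in order."""
    if not dry_run:
        # Equivalent of the `rm -rf` prefixes in the original commands.
        for path in stale_dirs:
            shutil.rmtree(path, ignore_errors=True)
    for cmd in commands:
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_pipeline(build_commands(), dry_run=True)
```

Passing `dry_run=False` deletes the old checkpoint and engine directories before running, so only do that when those outputs are safe to discard.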

brb-nv avatar Mar 24 '25 22:03 brb-nv

/bot run

brb-nv avatar Mar 25 '25 04:03 brb-nv

PR_Github #371 [ run ] triggered by Bot

niukuo avatar Mar 25 '25 04:03 niukuo

PR_Github #371 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #336 completed with status: 'FAILURE'

niukuo avatar Mar 25 '25 06:03 niukuo

/bot run

brb-nv avatar Mar 25 '25 07:03 brb-nv

PR_Github #390 [ run ] triggered by Bot

niukuo avatar Mar 25 '25 07:03 niukuo

PR_Github #390 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #347 completed with status: 'SUCCESS'

niukuo avatar Mar 25 '25 10:03 niukuo

/bot run

brb-nv avatar Mar 25 '25 15:03 brb-nv

PR_Github #449 [ run ] triggered by Bot

niukuo avatar Mar 25 '25 15:03 niukuo

PR_Github #449 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #384 completed with status: 'FAILURE'

niukuo avatar Mar 25 '25 17:03 niukuo

/bot run

brb-nv avatar Mar 25 '25 17:03 brb-nv

PR_Github #457 [ run ] triggered by Bot

niukuo avatar Mar 25 '25 18:03 niukuo

PR_Github #457 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #392 completed with status: 'SUCCESS'

niukuo avatar Mar 25 '25 21:03 niukuo

/bot run

brb-nv avatar Apr 01 '25 00:04 brb-nv

/bot run

brb-nv avatar Apr 01 '25 00:04 brb-nv

PR_Github #808 [ run ] triggered by Bot

tensorrt-cicd avatar Apr 01 '25 00:04 tensorrt-cicd

PR_Github #810 [ run ] triggered by Bot

tensorrt-cicd avatar Apr 01 '25 00:04 tensorrt-cicd

PR_Github #808 [ run ] completed with state ABORTED

tensorrt-cicd avatar Apr 01 '25 00:04 tensorrt-cicd

/bot run

brb-nv avatar Apr 01 '25 02:04 brb-nv

PR_Github #828 [ run ] triggered by Bot

tensorrt-cicd avatar Apr 01 '25 02:04 tensorrt-cicd

PR_Github #810 [ run ] completed with state ABORTED

tensorrt-cicd avatar Apr 01 '25 02:04 tensorrt-cicd

PR_Github #828 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #669 completed with status: 'SUCCESS'

tensorrt-cicd avatar Apr 01 '25 04:04 tensorrt-cicd

/bot reuse-pipeline --comment "The test is added in post-merge CI, locally verified the test passed"

syuoni avatar Apr 01 '25 06:04 syuoni

PR_Github #866 [ reuse-pipeline ] triggered by Bot

tensorrt-cicd avatar Apr 01 '25 06:04 tensorrt-cicd

PR_Github #866 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #828 for commit 363e481

tensorrt-cicd avatar Apr 01 '25 06:04 tensorrt-cicd