feat: Support pre-quantized FP8 checkpoint for nemotron-mini-4b-instruct
This MR adds support for consuming the pre-quantized FP8 checkpoint exported by ModelOpt for nemotron-mini-4b-instruct.
Commands to convert the checkpoint, build the engine, and run inference:
$ rm -rf nemotron_mini_4b_tllm_fp8_ckpt/ && python examples/gpt/convert_checkpoint.py --model_dir nemotron-mini-4b-instruct_vfp8-fp8-bf16-export/ --output_dir nemotron_mini_4b_tllm_fp8_ckpt/ --dtype bfloat16
$ rm -rf nemotron_mini_4b_tllm_fp8_eng/ && trtllm-build --checkpoint_dir nemotron_mini_4b_tllm_fp8_ckpt/ --output_dir nemotron_mini_4b_tllm_fp8_eng/
$ python examples/run.py --engine_dir nemotron_mini_4b_tllm_fp8_eng/ --max_output_len 20 --tokenizer_dir nemotron-mini-4b-instruct_vfp8-fp8-bf16-export/ --input_text "Hello, how are you?"
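For convenience, the three steps above can be wrapped in one script. This is only a sketch: the `step` and `fp8_smoke_test` helpers and the `HF_EXPORT_DIR`, `CKPT_DIR`, `ENGINE_DIR`, and `DRY_RUN` variables are hypothetical names introduced here, not part of the repo. It uses only the commands and flags listed above, and assumes it is run from the repository root. By default it only prints the commands; set `DRY_RUN=0` to actually execute them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Paths copied from the commands above; override via environment if needed.
HF_EXPORT_DIR="${HF_EXPORT_DIR:-nemotron-mini-4b-instruct_vfp8-fp8-bf16-export/}"
CKPT_DIR="${CKPT_DIR:-nemotron_mini_4b_tllm_fp8_ckpt/}"
ENGINE_DIR="${ENGINE_DIR:-nemotron_mini_4b_tllm_fp8_eng/}"
# Hypothetical switch: 1 (default) prints commands only, 0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command shell-quoted, then run it unless in dry-run mode.
step() {
  printf '+'
  printf ' %q' "$@"
  printf '\n'
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Hypothetical wrapper around the three steps: convert, build, run.
fp8_smoke_test() {
  # 1. Convert the ModelOpt FP8 export into a TensorRT-LLM checkpoint.
  step rm -rf "$CKPT_DIR"
  step python examples/gpt/convert_checkpoint.py \
    --model_dir "$HF_EXPORT_DIR" --output_dir "$CKPT_DIR" --dtype bfloat16

  # 2. Build the engine from the converted checkpoint.
  step rm -rf "$ENGINE_DIR"
  step trtllm-build --checkpoint_dir "$CKPT_DIR" --output_dir "$ENGINE_DIR"

  # 3. Run a short generation against the built engine.
  step python examples/run.py --engine_dir "$ENGINE_DIR" --max_output_len 20 \
    --tokenizer_dir "$HF_EXPORT_DIR" --input_text "Hello, how are you?"
}

fp8_smoke_test
```

Because of `set -e`, a failure in the convert or build step stops the script before the run step when `DRY_RUN=0`.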
/bot run
PR_Github #371 [ run ] triggered by Bot
PR_Github #371 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #336 completed with status: 'FAILURE'
/bot run
PR_Github #390 [ run ] triggered by Bot
PR_Github #390 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #347 completed with status: 'SUCCESS'
/bot run
PR_Github #449 [ run ] triggered by Bot
PR_Github #449 [ run ] completed with state FAILURE
/LLM/main/L0_MergeRequest_PR pipeline #384 completed with status: 'FAILURE'
/bot run
PR_Github #457 [ run ] triggered by Bot
PR_Github #457 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #392 completed with status: 'SUCCESS'
/bot run
/bot run
PR_Github #808 [ run ] triggered by Bot
PR_Github #810 [ run ] triggered by Bot
PR_Github #808 [ run ] completed with state ABORTED
/bot run
PR_Github #828 [ run ] triggered by Bot
PR_Github #810 [ run ] completed with state ABORTED
PR_Github #828 [ run ] completed with state SUCCESS
/LLM/main/L0_MergeRequest_PR pipeline #669 completed with status: 'SUCCESS'
/bot reuse-pipeline --comment "The test is added in post-merge CI, locally verified the test passed"
PR_Github #866 [ reuse-pipeline ] triggered by Bot
PR_Github #866 [ reuse-pipeline ] completed with state SUCCESS
Reusing PR_Github #828 for commit 363e481