TensorRT-LLM
Why is the FLOPS number higher than the standard specification?
System Info
H20 * 1
Who can help?
No response
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
python3 convert_checkpoint.py --model_dir /TensorRT-LLM/Llama-2-7b-hf \
    --output_dir /TensorRT-LLM/examples/llama/tllm_checkpoint_1gpu_fp16 \
    --dtype float16

trtllm-build --checkpoint_dir /TensorRT-LLM/examples/llama/tllm_checkpoint_1gpu_fp16 \
    --output_dir /TensorRT-LLM/examples/llama/tmp/llama/7B/trt_engines/fp16/1-gpu \
    --gemm_plugin float16 \
    --max_input_len=4086 \
    --max_output_len=4086 \
    --max_batch_size 16
What do fp8 TFLOPS and bfloat16 TFLOPS mean? Do they refer to the total FLOPs over my whole build process, rather than FLOPs per second?
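In other words, is the figure a total amount of work or a rate? A total would scale with the number of tokens processed; here is a rough sketch of that arithmetic (the parameter count, token count, and peak rate are assumed values for illustration, not measurements):

# Back-of-envelope only; all values below are assumptions for illustration.
params = 7e9          # Llama-2-7B parameter count
tokens = 4086         # matches --max_input_len above

# Total work for one forward pass over `tokens` tokens (a FLOP *count*):
total_flops = 2 * params * tokens          # ~5.7e13 FLOPs

# A datasheet "TFLOPS" figure is a *rate* (FLOPs per second); an assumed
# peak of 148 TFLOP/s would bound the runtime from below:
peak_rate = 148e12                          # FLOP/s (hypothetical spec value)
ideal_seconds = total_flops / peak_rate

print(f"total work : {total_flops:.2e} FLOPs")
print(f"ideal time : {ideal_seconds * 1e3:.1f} ms at {peak_rate / 1e12:.0f} TFLOP/s peak")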
Expected behavior
Actual behavior
Additional notes
Could you share which branch you are using? I don't see such info in the latest TRT LLM.
It should be the latest main branch.
These are the numbers from the hardware spec.
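For reference, such per-precision figures are peak throughput rates from the GPU datasheet, not a count of work accumulated during the build. A minimal sketch of how a spec number like this is typically used, e.g. to gauge utilization from a measured decoding throughput (the peak rate and throughput below are placeholder assumptions, not TensorRT-LLM output):

# All numbers are placeholder assumptions for illustration.
peak_bf16_rate = 148e12       # FLOP/s, hypothetical value from the GPU datasheet
params = 7e9                  # Llama-2-7B parameter count
measured_tokens_per_s = 1500  # hypothetical measured generation throughput

# Achieved rate, using the ~2 * params FLOPs-per-token rule of thumb:
achieved_rate = 2 * params * measured_tokens_per_s     # FLOP/s

utilization = achieved_rate / peak_bf16_rate
print(f"achieved : {achieved_rate / 1e12:.1f} TFLOP/s")
print(f"MFU      : {utilization:.1%} of the spec peak")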