TensorRT-LLM
Why is the FLOPS number higher than the standard specification?
System Info
H20 * 1
Who can help?
No response
Information
- [ ] The official example scripts
- [ ] My own modified scripts
Tasks
- [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
python3 convert_checkpoint.py --model_dir /TensorRT-LLM/Llama-2-7b-hf \
    --output_dir /TensorRT-LLM/examples/llama/tllm_checkpoint_1gpu_fp16 \
    --dtype float16

trtllm-build --checkpoint_dir /TensorRT-LLM/examples/llama/tllm_checkpoint_1gpu_fp16 \
    --output_dir /TensorRT-LLM/examples/llama/tmp/llama/7B/trt_engines/fp16/1-gpu \
    --gemm_plugin float16 \
    --max_input_len=4086 \
    --max_output_len=4086 \
    --max_batch_size 16
What do fp8 TFLOPS and bfloat16 TFLOPS mean? Do they refer to the total FLOPs over my whole build process, rather than FLOPs per second?
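In other words, is the figure a total amount of work or a rate? A total would scale with the number of tokens processed; here is a rough sketch of that arithmetic (the parameter count, token count, and peak rate are assumed values for illustration, not measurements):

# Back-of-envelope only; all values below are assumptions for illustration.
params = 7e9          # Llama-2-7B parameter count
tokens = 4086         # matches --max_input_len above

# Total work for one forward pass over `tokens` tokens (a FLOP *count*):
total_flops = 2 * params * tokens          # ~5.7e13 FLOPs

# A datasheet "TFLOPS" figure is a *rate* (FLOPs per second); an assumed
# peak of 148 TFLOP/s would bound the runtime from below:
peak_rate = 148e12                          # FLOP/s (hypothetical spec value)
ideal_seconds = total_flops / peak_rate

print(f"total work : {total_flops:.2e} FLOPs")
print(f"ideal time : {ideal_seconds * 1e3:.1f} ms at {peak_rate / 1e12:.0f} TFLOP/s peak")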
Expected behavior
Actual behavior
Additional notes
Could you share which branch you are using? I don't see such info in the latest TRT LLM.
It should be the latest main branch.
These are the numbers from the hardware spec.
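For reference, such per-precision figures are peak throughput rates from the GPU datasheet, not a count of work accumulated during the build. A minimal sketch of how a spec number like this is typically used, e.g. to gauge utilization from a measured decoding throughput (the peak rate and throughput below are placeholder assumptions, not TensorRT-LLM output):

# All numbers are placeholder assumptions for illustration.
peak_bf16_rate = 148e12       # FLOP/s, hypothetical value from the GPU datasheet
params = 7e9                  # Llama-2-7B parameter count
measured_tokens_per_s = 1500  # hypothetical measured generation throughput

# Achieved rate, using the ~2 * params FLOPs-per-token rule of thumb:
achieved_rate = 2 * params * measured_tokens_per_s     # FLOP/s

utilization = achieved_rate / peak_bf16_rate
print(f"achieved : {achieved_rate / 1e12:.1f} TFLOP/s")
print(f"MFU      : {utilization:.1%} of the spec peak")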