
Why is the FLOPS number higher than the standard specification?

Open YiandLi opened this issue 10 months ago • 2 comments

System Info

H20 * 1

Who can help?

No response

Information

  • [ ] The official example scripts
  • [ ] My own modified scripts

Tasks

  • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • [ ] My own task or dataset (give details below)

Reproduction

python3 convert_checkpoint.py --model_dir /TensorRT-LLM/Llama-2-7b-hf \
    --output_dir /TensorRT-LLM/examples/llama/tllm_checkpoint_1gpu_fp16 \
    --dtype float16

trtllm-build --checkpoint_dir /TensorRT-LLM/examples/llama/tllm_checkpoint_1gpu_fp16 \
    --output_dir /TensorRT-LLM/examples/llama/tmp/llama/7B/trt_engines/fp16/1-gpu \
    --gemm_plugin float16 \
    --max_input_len=4086 \
    --max_output_len=4086 \
    --max_batch_size 16

[screenshot of the trtllm-build log showing fp8 TFLOPS and bfloat16 TFLOPS] What do fp8 TFLOPS and bfloat16 TFLOPS mean? Do they mean the total FLOPs during my build process, rather than FLOPs per second?
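(Editor's note: TFLOPS here is a rate, 10^12 floating-point operations per second, not a count accumulated over the build. A minimal back-of-the-envelope sketch of the distinction, using the common 2 × N_params estimate for forward-pass FLOPs per token; the parameter count and the peak value below are illustrative assumptions, not values read from the build log.)

# Illustrative numbers only; not taken from the trtllm-build output above.
n_params = 7e9                  # Llama-2-7B parameter count (approximate)
flops_per_token = 2 * n_params  # rough forward-pass FLOPs per generated token
peak_tflops = 148               # assumed dense BF16 tensor-core peak, in TFLOP/s (a rate)

# At 100% utilization of that peak rate, the theoretical token throughput is:
print(peak_tflops * 1e12 / flops_per_token, "tokens/s upper bound")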

Expected behavior

actual behavior

additional notes

YiandLi avatar Apr 26 '24 09:04 YiandLi

Could you share which branch you use? I don't see such info in the latest TRT-LLM.

byshiue avatar Apr 30 '24 03:04 byshiue

It should be the latest main branch.

YiandLi avatar May 07 '24 03:05 YiandLi

These are the numbers from the hardware spec.

byshiue avatar May 09 '24 03:05 byshiue
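(Editor's note: per the reply above, the fp8/bfloat16 TFLOPS printed by trtllm-build are the GPU's peak throughput figures taken from the hardware spec, not a measurement of the build itself. A hedged sketch of how such a peak is typically used, comparing a measured decoding throughput against it; every value below is a placeholder to replace with your own benchmark numbers and your card's datasheet figure.)

# Placeholder values for illustration; substitute measured and datasheet numbers.
measured_tokens_per_sec = 2500.0   # e.g. from a benchmark run of the built engine
n_params = 7e9                     # Llama-2-7B parameter count (approximate)
achieved_tflops = measured_tokens_per_sec * 2 * n_params / 1e12
peak_tflops = 148.0                # assumed BF16 tensor-core peak from the spec sheet

print(f"achieved ~{achieved_tflops:.1f} TFLOP/s, "
      f"~{100 * achieved_tflops / peak_tflops:.1f}% of the assumed peak")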