gpt-fast
Bandwidth achieved for INT8 is much smaller than FP16
I am running CodeLlama 7B on a single AMD MI210 GPU. With FP16 the reported achieved bandwidth is 700 GB/s, but with INT8 it drops to 197 GB/s. Why is the achieved bandwidth lower with INT8? @kit1980 @msaroufim @yifuwang @huntzhan Thanks, Yao Fehlis (AMD)
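For context on how this metric can behave counterintuitively: achieved bandwidth in a decode-bound benchmark is roughly (bytes of weights read per token) × (tokens/s). INT8 halves the bytes read per token, so unless tokens/s doubles, the reported bandwidth number falls even when the run gets faster. The sketch below illustrates this arithmetic; the model sizes and token rates are illustrative assumptions, not measurements from the report above.

```python
def achieved_bandwidth_gb_s(model_bytes: float, tokens_per_sec: float) -> float:
    """Achieved bandwidth, assuming the full set of weights is streamed
    from memory once per generated token (the decode-bound approximation)."""
    return model_bytes * tokens_per_sec / 1e9

# Rough weight sizes for a 7B model: ~14 GB in FP16, ~7 GB in INT8.
# Token rates here are hypothetical, chosen only to show the effect.
fp16_bw = achieved_bandwidth_gb_s(14e9, tokens_per_sec=50.0)
int8_bw = achieved_bandwidth_gb_s(7e9, tokens_per_sec=50.0)

# With identical tokens/s, INT8 reports exactly half the bandwidth,
# because half as many bytes move per token.
print(fp16_bw, int8_bw)
```

In other words, a lower achieved-bandwidth figure for INT8 does not by itself mean the INT8 run is slower; tokens/s is the number to compare directly, and a large bandwidth gap like 700 vs. 197 GB/s can also indicate that the INT8 kernels are not saturating memory bandwidth on this hardware.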