gpt-fast
Bandwidth achieved for INT8 is much smaller than FP16
I am running CodeLlama 7B on a single AMD MI210 GPU. With FP16 the reported achieved bandwidth is 700 GB/s, but with INT8 it drops to 197 GB/s. Why is the achieved bandwidth lower with INT8? @kit1980 @msaroufim @yifuwang @huntzhan Thanks, Yao Fehlis (AMD)
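For context on how this metric can behave counterintuitively: achieved bandwidth in a decode-bound benchmark is roughly (bytes of weights read per token) × (tokens/s). INT8 halves the bytes read per token, so unless tokens/s doubles, the reported bandwidth number falls even when the run gets faster. The sketch below illustrates this arithmetic; the model sizes and token rates are illustrative assumptions, not measurements from the report above.

```python
def achieved_bandwidth_gb_s(model_bytes: float, tokens_per_sec: float) -> float:
    """Achieved bandwidth, assuming the full set of weights is streamed
    from memory once per generated token (the decode-bound approximation)."""
    return model_bytes * tokens_per_sec / 1e9

# Rough weight sizes for a 7B model: ~14 GB in FP16, ~7 GB in INT8.
# Token rates here are hypothetical, chosen only to show the effect.
fp16_bw = achieved_bandwidth_gb_s(14e9, tokens_per_sec=50.0)
int8_bw = achieved_bandwidth_gb_s(7e9, tokens_per_sec=50.0)

# With identical tokens/s, INT8 reports exactly half the bandwidth,
# because half as many bytes move per token.
print(fp16_bw, int8_bw)
```

In other words, a lower achieved-bandwidth figure for INT8 does not by itself mean the INT8 run is slower; tokens/s is the number to compare directly, and a large bandwidth gap like 700 vs. 197 GB/s can also indicate that the INT8 kernels are not saturating memory bandwidth on this hardware.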