Yao Fehlis
INT8 quantization works fine, but INT4 does not.
I am running CodeLlama 7B on a single AMD MI210: with FP16 the achieved bandwidth is 700 GB/s, but with INT8 it drops to 197 GB/s...
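For context, "achieved bandwidth" in benchmarks of this style is usually computed as weight bytes moved per token times tokens generated per second, so a lower number for INT8 means the quantized path is running far below the speedup its smaller weights should allow. A minimal sketch of that metric (the function name and the tokens/s figures are illustrative, not the poster's actual measurements):

```python
def achieved_bandwidth_gbs(param_count: float, bytes_per_param: int,
                           tokens_per_sec: float) -> float:
    """Achieved memory bandwidth in GB/s: bytes of weights read per
    generated token, times tokens generated per second."""
    return param_count * bytes_per_param * tokens_per_sec / 1e9

# Hypothetical throughput numbers for a 7B-parameter model:
fp16 = achieved_bandwidth_gbs(7e9, 2, 50)  # FP16 weights, 50 tok/s -> 700.0 GB/s
int8 = achieved_bandwidth_gbs(7e9, 1, 28)  # INT8 weights, 28 tok/s -> 196.0 GB/s
print(fp16, int8)
```

With this metric, INT8 halves the bytes per token, so matching FP16's 700 GB/s would require roughly doubling tokens/s; an INT8 result of 197 GB/s instead implies tokens/s barely changed.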
I am using AMD MI210s. After the models load, the subsequent steps are extremely slow (see screenshot); it turns out the compilation time is 270 seconds. Could you please help...