Yao Fehlis
INT8 quantization works fine, but INT4 does not.
I am running CodeLlama 7B on a single AMD MI210: with FP16 the achieved bandwidth is 700 GB/s, but with INT8 it drops to 197 GB/s...
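For context, "achieved bandwidth" in benchmarks of this style is usually computed as weight bytes moved per token times tokens generated per second, so a lower number for INT8 means the quantized path is running far below the speedup its smaller weights should allow. A minimal sketch of that metric (the function name and the tokens/s figures are illustrative, not the poster's actual measurements):

```python
def achieved_bandwidth_gbs(param_count: float, bytes_per_param: int,
                           tokens_per_sec: float) -> float:
    """Achieved memory bandwidth in GB/s: bytes of weights read per
    generated token, times tokens generated per second."""
    return param_count * bytes_per_param * tokens_per_sec / 1e9

# Hypothetical throughput numbers for a 7B-parameter model:
fp16 = achieved_bandwidth_gbs(7e9, 2, 50)  # FP16 weights, 50 tok/s -> 700.0 GB/s
int8 = achieved_bandwidth_gbs(7e9, 1, 28)  # INT8 weights, 28 tok/s -> 196.0 GB/s
print(fp16, int8)
```

With this metric, INT8 halves the bytes per token, so matching FP16's 700 GB/s would require roughly doubling tokens/s; an INT8 result of 197 GB/s instead implies tokens/s barely changed.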
I am using AMD MI210s. After the models load, the subsequent steps are extremely slow (see screenshot); it turns out the compilation time is 270 seconds. Could you please help...