Issues by Yao Fehlis (3 results)

INT8 quantization works fine, but INT4 quantization fails (see screenshot). ![Capture](https://github.com/pytorch-labs/gpt-fast/assets/106262476/ac10df53-860e-4da9-b51e-1ad17e3fe3c4)
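For context, here is a minimal sketch of what the two weight-only modes do conceptually. This is illustrative, not gpt-fast's actual implementation: gpt-fast's INT4 path additionally packs the quantized weights for a fused matmul kernel, and missing backend support for that kernel is a common reason INT4 fails on hardware where INT8 works.

```python
import torch

def quantize_int8_per_channel(w: torch.Tensor):
    # Symmetric weight-only INT8: one scale per output channel.
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def quantize_int4_grouped(w: torch.Tensor, groupsize: int = 32):
    # Symmetric weight-only INT4: one scale per group of `groupsize`
    # input features, since 4 bits per channel alone are too coarse.
    out_features, in_features = w.shape
    assert in_features % groupsize == 0
    wg = w.reshape(out_features, in_features // groupsize, groupsize)
    scale = wg.abs().amax(dim=-1, keepdim=True) / 7.0
    q = torch.clamp(torch.round(wg / scale), -8, 7).to(torch.int8)
    return q.reshape(out_features, in_features), scale.squeeze(-1)

w = torch.randn(8, 64)
q8, s8 = quantize_int8_per_channel(w)
q4, s4 = quantize_int4_grouped(w, groupsize=32)
print("int8 max error:", (q8.float() * s8 - w).abs().max().item())
```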

I am running CodeLlama 7B on one AMD MI210: with FP16, the achieved bandwidth is 700 GB/s; however, with INT8 it drops to 197 GB/s...
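To sanity-check these numbers: gpt-fast reports achieved bandwidth as roughly model size in bytes times tokens per second, so at equal decoding speed INT8 should report about half the FP16 figure (~350 GB/s), not 197 GB/s. A quick back-of-envelope in Python (parameter count approximate):

```python
# Decode speed implied by the reported bandwidth numbers, assuming
# bandwidth ~= model_size_bytes * tokens_per_second.
params = 7e9  # CodeLlama 7B, approximate

for name, bytes_per_param, bw_gb_s in [("fp16", 2, 700), ("int8", 1, 197)]:
    model_gb = params * bytes_per_param / 1e9
    tok_s = bw_gb_s / model_gb
    print(f"{name}: ~{model_gb:.0f} GB weights -> ~{tok_s:.0f} tok/s")

# fp16: ~14 GB weights -> ~50 tok/s
# int8: ~7 GB weights  -> ~28 tok/s
# INT8 is genuinely slower per token, which points at the INT8 matmul
# kernels on this backend rather than at the bandwidth metric itself.
```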

I am using AMD MI210s. After loading the models, the following steps are extremely slow (see screenshot); it turns out the compilation time is 270 seconds. Could you please help...
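The 270 s is a one-time torch.compile (Inductor) warm-up on the first call; steady-state decode speed should be measured afterwards. A minimal sketch with a stand-in model (not gpt-fast's code) to separate the two costs:

```python
import time
import torch

# Tiny stand-in layer; gpt-fast compiles its decode step similarly.
model = torch.nn.Linear(4096, 4096, device="cuda", dtype=torch.float16)
step = torch.compile(model, mode="reduce-overhead")
x = torch.randn(1, 4096, device="cuda", dtype=torch.float16)

def timed(fn):
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    fn()
    torch.cuda.synchronize()
    return time.perf_counter() - t0

print("first call (includes compilation):", timed(lambda: step(x)))
print("second call (compiled, cached):   ", timed(lambda: step(x)))
```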