We used AWQ to quantize a model with the same architecture as LLaMA2. After quantization, VRAM usage during loading was only 6567M, but it reached 32223M when...
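
For reference, here is a minimal sketch of how the two figures could be reproduced with AutoAWQ and PyTorch; the checkpoint path, prompt, and generation settings are placeholders, and the original report does not say which loader was actually used, so treat this as an assumption rather than the exact setup.

```python
# Sketch: compare VRAM right after loading an AWQ checkpoint vs. peak VRAM
# during one generation. Paths and prompts below are placeholders.
import torch
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/awq-quantized-llama2-like-model"  # placeholder path

# Load the quantized weights onto the GPU.
model = AutoAWQForCausalLM.from_quantized(model_path, fuse_layers=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Note: these readings track the PyTorch allocator only; nvidia-smi's
# per-process number (as quoted in the report) will be somewhat higher.
torch.cuda.synchronize()
print(f"VRAM after loading: {torch.cuda.memory_allocated() / 1024**2:.0f} MiB")

# Run a single generation and record the peak allocation it causes.
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda")
with torch.no_grad():
    model.generate(**inputs, max_new_tokens=64)

torch.cuda.synchronize()
print(f"Peak VRAM during generation: {torch.cuda.max_memory_allocated() / 1024**2:.0f} MiB")
```

In general, the gap between the load-time reading and the peak during generation comes from KV-cache and activation buffers allocated on top of the quantized weights, which may or may not account for the jump reported here.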