We used AWQ to quantize a model with the same architecture as LLaMA2. After quantization, the VRAM usage during loading was only 6567M, but the VRAM usage reached 32223M when...
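The report does not say which AWQ toolchain or loading code was used, so the following is only a minimal sketch of a typical quantize-and-load flow, assuming the AutoAWQ library; the model paths and quantization config values are hypothetical placeholders, not the reporter's actual setup.

```python
# Sketch only: assumes the AutoAWQ toolchain; paths and config are illustrative.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/llama2-like-model"      # hypothetical source checkpoint
quant_path = "path/to/llama2-like-model-awq"  # hypothetical output directory

quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Quantize: calibrate and pack the weights into 4-bit groups.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# Load the quantized checkpoint. The VRAM measured right after loading mostly
# reflects the packed 4-bit weights; VRAM during generation additionally holds
# activations and the KV cache, which grow with batch size and sequence length.
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
```

The gap between load-time and runtime VRAM in a setup like this usually comes from those runtime buffers rather than from the quantized weights themselves, though the truncated report does not show which stage triggered the 32223M reading.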