We used AWQ to quantize a model with the same architecture as LLaMA2. After quantization, the VRAM usage during loading was only 6567M, but the VRAM usage reached 32223M when...
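The report does not say which AWQ toolchain or loading code was used, so the following is only a minimal sketch of a typical quantize-and-load flow, assuming the AutoAWQ library; the model paths and quantization config values are hypothetical placeholders, not the reporter's actual setup.

```python
# Sketch only: assumes the AutoAWQ toolchain; paths and config are illustrative.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/llama2-like-model"      # hypothetical source checkpoint
quant_path = "path/to/llama2-like-model-awq"  # hypothetical output directory

quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Quantize: calibrate and pack the weights into 4-bit groups.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# Load the quantized checkpoint. The VRAM measured right after loading mostly
# reflects the packed 4-bit weights; VRAM during generation additionally holds
# activations and the KV cache, which grow with batch size and sequence length.
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
```

The gap between load-time and runtime VRAM in a setup like this usually comes from those runtime buffers rather than from the quantized weights themselves, though the truncated report does not show which stage triggered the 32223M reading.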