QQQ OOM
https://github.com/HandH1998/QQQ/blob/e307d9f00b90309069733890f28eefe9886bb6f2/QQQ/gptq/models/llama.py#L56C5-L58C36
When quantizing with the default commands, all calibration data is cached on a single GPU (see the linked lines above), which triggers an Out-of-Memory (OOM) error. Is there a way to optimize this process to avoid the OOM?
Running the quantization on the CPU instead requires roughly 400 GB of memory for a 20B model, and it is also far too slow.
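
For reference, here is a minimal sketch of the layer-by-layer offloading pattern I have in mind: keep all cached activations on the CPU and move only the current decoder layer plus one calibration batch to the GPU at a time. This is not QQQ's actual API; it assumes a LLaMA-style model, and `quantize_layer` is a hypothetical placeholder for the per-layer GPTQ step.

```python
import torch

@torch.no_grad()
def quantize_layerwise(model, calib_batches, device="cuda"):
    """Sequential, layer-by-layer quantization with CPU offloading.

    calib_batches: list of CPU tensors, each shaped (batch, seq_len, hidden).
    """
    # All cached activations live on the CPU; only one batch visits the GPU.
    hidden = [x.cpu() for x in calib_batches]

    for layer in model.model.layers:       # LLaMA-style decoder stack
        layer.to(device)                   # one layer on the GPU at a time

        outs = []
        for h in hidden:
            out = layer(h.to(device))      # real decoder layers also need
            if isinstance(out, tuple):     # attention masks / position ids
                out = out[0]
            outs.append(out.cpu())         # offload the output immediately
        hidden = outs                      # becomes the next layer's input

        quantize_layer(layer)              # hypothetical per-layer GPTQ step

        layer.cpu()                        # release GPU memory for the next layer
        torch.cuda.empty_cache()
```

With this scheme the peak GPU footprint is one layer plus one batch of activations, instead of the whole calibration set, at the cost of extra host-to-device transfers.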