QQQ OOM
https://github.com/HandH1998/QQQ/blob/e307d9f00b90309069733890f28eefe9886bb6f2/QQQ/gptq/models/llama.py#L56C5-L58C36
When quantizing with the default commands, all calibration data is cached on a single GPU (see the linked lines above), which triggers an Out-of-Memory (OOM) error. Is there a way to optimize this process to avoid the OOM?
Running the quantization on the CPU instead requires roughly 400 GB of memory for a 20B model, and it is also far too slow.
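
For reference, here is a minimal sketch of the layer-by-layer offloading pattern I have in mind: keep all cached activations on the CPU and move only the current decoder layer plus one calibration batch to the GPU at a time. This is not QQQ's actual API; it assumes a LLaMA-style model, and `quantize_layer` is a hypothetical placeholder for the per-layer GPTQ step.

```python
import torch

@torch.no_grad()
def quantize_layerwise(model, calib_batches, device="cuda"):
    """Sequential, layer-by-layer quantization with CPU offloading.

    calib_batches: list of CPU tensors, each shaped (batch, seq_len, hidden).
    """
    # All cached activations live on the CPU; only one batch visits the GPU.
    hidden = [x.cpu() for x in calib_batches]

    for layer in model.model.layers:       # LLaMA-style decoder stack
        layer.to(device)                   # one layer on the GPU at a time

        outs = []
        for h in hidden:
            out = layer(h.to(device))      # real decoder layers also need
            if isinstance(out, tuple):     # attention masks / position ids
                out = out[0]
            outs.append(out.cpu())         # offload the output immediately
        hidden = outs                      # becomes the next layer's input

        quantize_layer(layer)              # hypothetical per-layer GPTQ step

        layer.cpu()                        # release GPU memory for the next layer
        torch.cuda.empty_cache()
```

With this scheme the peak GPU footprint is one layer plus one batch of activations, instead of the whole calibration set, at the cost of extra host-to-device transfers.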