GPTQ-triton
Needs more VRAM than normal GPTQ CUDA version?
Thanks, I wanted to try your Triton version, but I only have 8 GB of VRAM.
The GPTQ CUDA version works (7B model), while your version (the ppl script) crashes with a CUDA OOM error.
Is that to be expected, or can it be solved?