GPTQ-triton
Needs more VRAM than normal GPTQ CUDA version?
Thanks, I wanted to try your Triton version, but I only have 8 GB of VRAM.
The GPTQ CUDA version works (7B model), while your version (the ppl script) crashes with a CUDA OOM error.
Is that to be expected, or can it be solved?