Quantization takes a very long time

Open timohear opened this issue 5 months ago • 3 comments

With TGI or LoRAX, EETQ quantization takes several minutes (e.g. 10 minutes for Mixtral) every time the launcher is run.

For reference, bitsandbytes NF4 quantization takes 1 minute.

Is there any way to store the EETQ-quantized model and load it directly?

timohear, Feb 03 '24 11:02