EETQ
Quantization takes a very long time
When using TGI or LoRAX, EETQ quantization takes several minutes (e.g. 10 minutes for Mixtral) every time the launcher is run.
For reference, bitsandbytes NF4 quantization takes 1 minute.
Is there any way to store the EETQ-quantized model and load it directly?