
Inference memory usage issues

Open AlpinDale opened this issue 1 year ago • 1 comments

It seems as if the model is being loaded in FP16. I've also noticed that QuantizedWeight8bit is imported in run.py but never actually used. Is it intended for runtime quantization of FP16 weights, or is it needed for the released 8-bit weights but left unused due to an oversight?
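For context, here is a minimal sketch of what scale-based 8-bit dequantization typically looks like. The class below is a hypothetical stand-in, not the repo's actual QuantizedWeight8bit implementation; field names and the per-tensor scale scheme are assumptions:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class QuantizedWeight8bit:
    # Hypothetical stand-in for the class imported in run.py.
    weight: np.ndarray  # int8 quantized values
    scales: np.ndarray  # fp32 scale(s); here a single per-tensor scale

def dequantize(q: QuantizedWeight8bit) -> np.ndarray:
    # Multiply the int8 values by their scale to recover approximate fp16 weights.
    return (q.weight.astype(np.float32) * q.scales).astype(np.float16)

# Round-trip a small weight matrix: quantize, then dequantize.
w = np.array([[0.5, -1.0], [0.25, 2.0]], dtype=np.float16)
scale = np.abs(w).max() / 127.0
q = QuantizedWeight8bit(weight=np.round(w / scale).astype(np.int8), scales=scale)
print(np.allclose(dequantize(q), w, atol=scale))  # True: error bounded by one scale step
```

If the checkpoint really does ship 8-bit weights, a step like this would have to run somewhere (either at load time or inside the forward pass) for the unused import to make sense.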

AlpinDale avatar Mar 17 '24 20:03 AlpinDale

I noticed it too while trying to test it out.

drhamza123 avatar Mar 17 '24 21:03 drhamza123