grok-1
Inference memory usage issues
It seems the model is being loaded in FP16. I've also noticed that QuantizedWeight8bit is imported in run.py but never actually used. Is it meant for runtime quantization of FP16 weights, or is it needed for the released 8-bit weights but left unused due to an oversight?
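For context, 8-bit weight quantization schemes like this typically store an int8 tensor alongside floating-point scale factors and dequantize on the fly at load or compute time. Below is a minimal sketch of what a container like QuantizedWeight8bit could look like and how dequantization would work; this is my own illustration under assumed semantics, not the repo's actual code, and the real checkpoint's scale layout (per-channel vs. per-block) may differ:

```python
from typing import NamedTuple
import jax.numpy as jnp

class QuantizedWeight8bit(NamedTuple):
    weight: jnp.ndarray  # int8 tensor of quantized values
    scales: jnp.ndarray  # floating-point scale factors (assumed layout)

def dequantize(qw: QuantizedWeight8bit, dtype=jnp.bfloat16) -> jnp.ndarray:
    # Cast the int8 values up to the target dtype and rescale.
    # Hypothetical: the released checkpoint may apply scales differently.
    return qw.weight.astype(dtype) * qw.scales.astype(dtype)

# Usage sketch: per-row scales broadcast against a 2x2 int8 weight.
q = QuantizedWeight8bit(
    weight=jnp.array([[10, -3], [7, 1]], dtype=jnp.int8),
    scales=jnp.array([[0.05], [0.02]], dtype=jnp.float32),
)
w = dequantize(q)  # bfloat16 approximation of the original weights
```

If the released weights are already 8-bit, loading them without a path like this would explain ending up with FP16-sized (or larger) memory usage at inference time.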
I noticed it too while trying to test it out.