grok-1
Inference memory usage issues
It seems the model is being loaded in FP16. I've also noticed that QuantizedWeight8bit is imported in run.py but never actually used. Is it meant for runtime quantization of FP16 weights, or is it needed for the released 8-bit weights but left unused due to an oversight?
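For context, 8-bit weight quantization schemes like this typically store an int8 tensor alongside floating-point scale factors and dequantize on the fly at load or compute time. Below is a minimal sketch of what a container like QuantizedWeight8bit could look like and how dequantization would work; this is my own illustration under assumed semantics, not the repo's actual code, and the real checkpoint's scale layout (per-channel vs. per-block) may differ:

```python
from typing import NamedTuple
import jax.numpy as jnp

class QuantizedWeight8bit(NamedTuple):
    weight: jnp.ndarray  # int8 tensor of quantized values
    scales: jnp.ndarray  # floating-point scale factors (assumed layout)

def dequantize(qw: QuantizedWeight8bit, dtype=jnp.bfloat16) -> jnp.ndarray:
    # Cast the int8 values up to the target dtype and rescale.
    # Hypothetical: the released checkpoint may apply scales differently.
    return qw.weight.astype(dtype) * qw.scales.astype(dtype)

# Usage sketch: per-row scales broadcast against a 2x2 int8 weight.
q = QuantizedWeight8bit(
    weight=jnp.array([[10, -3], [7, 1]], dtype=jnp.int8),
    scales=jnp.array([[0.05], [0.02]], dtype=jnp.float32),
)
w = dequantize(q)  # bfloat16 approximation of the original weights
```

If the released weights are already 8-bit, loading them without a path like this would explain ending up with FP16-sized (or larger) memory usage at inference time.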
I noticed it too while trying to test it out.