Q-GaLore
[suggestion] How about training using q5_k or q6_k quantization?
I wonder how fast it would be to train a model from scratch using f16 for the output and embedding tensors and q5_k or q6_k for the other tensors.
My quants on Hugging Face use this technique, and they show less degradation.
https://huggingface.co/spaces/RobertSinclair/README