BitNet (b1.58) support
First of all, thanks. We need more ramps.
I was curious what you think of BitNet, and whether llm.c is a place where experimenting with it could be facilitated. The papers were extremely promising and got a lot of traction, and while there have been a few (small-scale) reproductions, there isn't an easy ramp to start experimenting with it.
Papers
- BitNet: Scaling 1-bit Transformers for Large Language Models
- (BitNet b1.58) The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
- The Era of 1-bit LLMs: Training Tips, Code and FAQ
I don't think we have it on the current roadmap; Andrej can chime in. We have a lot of stuff on the backlog before we get here, including potentially supporting fp8, ZeRO stage 2, etc.
The problem with BitNet (b1.58) training is that it still keeps FP16/BF16 latent weights for the gradient updates, so memory consumption during training does not decrease. Anyway, getting support for it would be great! If combined with FP8 training it could bring improvements.
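For reference, a minimal sketch of the absmean ternary quantization described in the BitNet b1.58 paper, written in plain C (not llm.c code): the full-precision latent weights are kept for training, and only the values seen by the forward matmul are rounded to {-1, 0, +1} times a per-tensor scale (the backward pass would use a straight-through estimator). Function and variable names here are hypothetical.

```c
#include <math.h>
#include <stdio.h>

// Quantize n full-precision weights to ternary values scaled by gamma,
// where gamma = mean(|w|) (the "absmean" scale from the b1.58 paper).
// Each wq[i] ends up as one of {-gamma, 0, +gamma}; the scale is re-applied
// here so wq approximates w directly, purely for readability.
void bitnet_quantize_weights(const float* w, float* wq, int n) {
    float gamma = 0.0f;
    for (int i = 0; i < n; i++) gamma += fabsf(w[i]);
    gamma = gamma / n + 1e-6f;              // absmean scale, avoid div by zero
    for (int i = 0; i < n; i++) {
        float t = roundf(w[i] / gamma);     // normalize and round
        if (t >  1.0f) t =  1.0f;           // clip to {-1, 0, +1}
        if (t < -1.0f) t = -1.0f;
        wq[i] = t * gamma;                  // re-apply the scale
    }
}

int main(void) {
    float w[8]  = {0.9f, -0.02f, 0.4f, -1.3f, 0.05f, 0.7f, -0.6f, 0.01f};
    float wq[8];
    bitnet_quantize_weights(w, wq, 8);
    for (int i = 0; i < 8; i++) printf("%.3f -> %.3f\n", w[i], wq[i]);
    return 0;
}
```

Note that w stays full precision and only wq is ternary, which is why the training memory footprint doesn't shrink; the wins the papers report are mainly on inference cost.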