BitNet
The resources for QAT
The paper states that QAT must start from scratch. Does this mean that performing QAT on a 70B model requires as much time and as many resources as full-precision training from scratch?