lightseq
Can int8 be used in pre-training large models?
Hello! I would like to know whether you have experimented with int8 precision when pre-training your large models. Can int8 replace fp16 and fp32 to achieve faster training? Are there any relevant case studies or experiments?
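To make the question concrete, here is a rough sketch of the kind of int8 training I have in mind (plain PyTorch with fake quantization, not LightSeq's API; the function name is just illustrative): the forward pass simulates int8 weights/activations while gradients still flow in higher precision.

```python
# Minimal sketch of quantization-aware int8 training (illustrative, not LightSeq code).
import torch

def fake_quant_int8(x: torch.Tensor) -> torch.Tensor:
    # Symmetric per-tensor quantization to the int8 range [-127, 127].
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(x / scale), -127, 127)
    # Straight-through estimator: forward uses the quantized value,
    # backward passes the gradient through unchanged.
    return x + (q * scale - x).detach()

# Toy usage: an int8-simulated linear layer forward.
w = torch.randn(256, 256, requires_grad=True)
a = torch.randn(32, 256)
out = fake_quant_int8(a) @ fake_quant_int8(w).t()
out.sum().backward()  # gradients still computed in full precision
```

Is something along these lines (or true int8 GEMM kernels) feasible for pre-training, or does it only work for fine-tuning/inference in practice?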