vc384

Results 2 comments of vc384

The loss is normalized by the number of samples, so you should multiply the lr by 4 to get the same convergence speed when you set the batch size to 256. Refer to this paper for details...
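A minimal sketch of the scaling described above, assuming the learning rate scales linearly with batch size. The base batch size of 64 is only inferred from the 4x factor (256 / 4), and the base lr of 0.01 and the helper name `scale_lr` are hypothetical, not taken from any particular repo:

```python
def scale_lr(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Scale the learning rate in proportion to the batch size.

    This applies when the loss is averaged over the samples in a batch,
    so a larger batch does not by itself enlarge the gradient.
    """
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


# Hypothetical values: base batch size 64 (inferred from the 4x factor),
# base lr 0.01 (made up for illustration).
base_lr = 0.01
new_lr = scale_lr(base_lr, base_batch_size=64, new_batch_size=256)
print(f"lr for batch size 256: {new_lr:.4f}")
```

Plug in the lr and batch size from your own baseline config instead of the placeholder values.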

I also hit an NCCL timeout error when fine-tuning InternVL3.0.