vc384
2 comments by vc384
The loss is normalized by the number of samples, so you should multiply the lr by 4 to get the same convergence speed when you set the batch size to 256. Refer to this paper for details...
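A rough sketch of that linear scaling, in case it helps: because the loss is averaged over the batch, a larger batch takes fewer optimizer steps per epoch, and scaling the lr by the same factor as the batch size roughly compensates. The helper name `scale_learning_rate`, the base lr of 1e-4, and the base batch size of 64 are assumptions for illustration (64 is only inferred from the factor of 4), not values taken from the thread.

```python
def scale_learning_rate(base_lr: float, base_batch_size: int, new_batch_size: int) -> float:
    """Linear scaling rule: keep lr / batch_size constant (hypothetical helper)."""
    if base_batch_size <= 0 or new_batch_size <= 0:
        raise ValueError("batch sizes must be positive")
    return base_lr * new_batch_size / base_batch_size


# Assumed values for illustration only: lr 1e-4 tuned at batch size 64.
new_lr = scale_learning_rate(base_lr=1e-4, base_batch_size=64, new_batch_size=256)
print(f"scaled lr: {new_lr}")  # 4x the base lr, since 256 / 64 = 4
```

This is only a starting point. At large batch sizes a warmup period is usually still needed, and the best lr may differ from the linearly scaled one.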
I also hit an NCCL timeout error when fine-tuning InternVL3.0.