Junyang Lin
Wait, aren't you using `train_caption_stage1.sh` instead of `train_caption_stage1_base.sh`? I think the script is the cause. The arch in `train_caption_stage1.sh` is `ofa_large`, and thus you can't load a base...
Try gradient accumulation with `--update-freq`
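In case it helps to see why this works: in fairseq, `--update-freq N` accumulates gradients over N batches before each parameter update, so the effective batch size grows without extra memory. Below is a minimal pure-Python sketch of the idea on a toy one-parameter model. The model, data, and function names are made up for illustration and are not taken from fairseq or OFA.

```python
# Toy illustration of gradient accumulation (hypothetical model and data).
# Model: y_hat = w * x, loss: mean squared error over the batch.

def grad_sum(w, batch):
    """Sum (not mean) of d/dw (w*x - y)^2 over the examples in `batch`."""
    return sum(2.0 * (w * x - y) * x for x, y in batch)

def full_batch_grad(w, data):
    """Gradient of the mean loss computed on the whole batch at once."""
    return grad_sum(w, data) / len(data)

def accumulated_grad(w, data, micro_batch_size):
    """Same gradient, built by accumulating over small micro-batches."""
    total = 0.0
    for start in range(0, len(data), micro_batch_size):
        total += grad_sum(w, data[start:start + micro_batch_size])
    # Normalize once, after all micro-batches have been accumulated.
    return total / len(data)

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w = 0.5
g_full = full_batch_grad(w, data)
g_accum = accumulated_grad(w, data, micro_batch_size=2)
print(g_full, g_accum)  # the two values agree up to floating-point error
```

So if a batch of size B does not fit in memory, running with batch size B/2 and an update frequency of 2 should give roughly the same update.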
`device_map='auto'` automatically enables your model to run on multiple GPUs. If you would like to use only 1 GPU, you can set `device` or set the environment variable like...
Being frozen is quite necessary. I would prefer that people finish the setup first, and then run the whole task (now it seems that everything is still a single...
Sorry, we do not have the permission.
Create a json file for your label set dictionary, or use the one I just uploaded.
You can do this by modifying either the config file or the code.
This may be related to your Python version. How about commenting out that line and just setting some term width, for example 80?
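If hardcoding the width feels too blunt, here is a small sketch of a fallback using the standard library's `shutil.get_terminal_size`. The helper name `get_term_width` is hypothetical, and I am assuming the failing line only exists to read the terminal width.

```python
import shutil

def get_term_width(default=80):
    """Return the terminal width, or `default` if it cannot be determined."""
    # shutil.get_terminal_size uses the fallback when no terminal is attached.
    width = shutil.get_terminal_size(fallback=(default, 24)).columns
    # Guard against environments that report a non-positive width.
    return width if width > 0 else default

term_width = get_term_width()
```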
`schesamp` refers to scheduled sampling, and `schedule` refers to the schedule for learning rate decay.
They are all randomly initialized, with no pretraining.