Config Mismatch between the provided script and pretrained model
Hello, I had a look at the configs stored in the pretrained OFA-Large model and found a couple of differences from the provided script and the paper. For example:
- lr = 0.00005, instead of 0.0001 in the script or 0.0002 in the paper.
- max_update = 150K, instead of 500K in the paper or 50 epochs in the script.
- patch_image_size = 480 instead of 384 in the paper/script.
- sample_patch_num = -1 instead of 196 in the script.
How was the large model trained? Is the global batch size (batch_size * #GPUs * update_freq) equal to 2048 for all models?
Thanks!
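For context, here is a minimal sketch of how the saved config can be read out of such a checkpoint, assuming a fairseq-style file that stores the training config under `cfg` (or `args` in older checkpoints); the file path and key names are assumptions, not taken from the OFA repo:

```python
# Minimal sketch: inspect the hyperparameters saved inside a pretrained
# checkpoint. Assumes a fairseq-style checkpoint that keeps the training
# config under "cfg" (newer fairseq) or "args" (older); the path is
# hypothetical.
import torch

ckpt = torch.load("checkpoints/ofa_large.pt", map_location="cpu")
cfg = ckpt.get("cfg") or ckpt.get("args")

# The values quoted above (lr, max_update, patch_image_size, sample_patch_num)
# can then be read off the dumped config instead of guessed from the scripts.
print(cfg)
```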
@zhaoweicai Please refer to Appendix A.2 in our paper. For OFA-Large, we first train it with images at a resolution of 384 × 384 (sampling 196 patches), and then continue pretraining with images at a resolution of 480 × 480.
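Summarizing the two stages as they appear in this thread (a rough, illustrative sketch only; the stage-1 lr and max_update are exactly the open question below, so they are left out):

```python
# Illustrative summary of the two-stage OFA-Large pretraining described above.
# Only the resolution/patch settings come from this thread; the remaining
# stage-2 values are the ones reported from the released checkpoint config.
stage1 = {
    "patch_image_size": 384,   # 384 x 384 inputs
    "sample_patch_num": 196,   # sample 196 patches per image
}
stage2 = {
    "patch_image_size": 480,   # continue pretraining at 480 x 480
    "sample_patch_num": -1,    # value in the released checkpoint (presumably no patch sampling)
    "lr": 5e-5,                # from the released checkpoint config
    "max_update": 150_000,     # from the released checkpoint config
}
```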
Oh, I was reading the old version of the paper. But some details are still missing for the two-stage training of OFA-Large. I can find the configs for the 2nd stage in the model, but what are the parameters for the 1st stage, e.g. LR and max_update? And are the datasets exactly the same for the two stages?