Curya
The core generation logic (how to generate all the necessary files under the mscoco folder) is located below these code cells in the snapshot image. I have not saved the pre-processing code...
The "dataset_coco. json" file is the Karpathy split annotation file of MSCOCO Captioning, it is just the re-organization of MSCOCO raw JSON annotation. Maybe you need to refer to https://github.com/karpathy/neuraltalk...
1. pre-trained model: The latter. We adopted the Swin-L 1K model (which has a 384x384 input size and a window size of 12) from the "ImageNet-22K pre-trained models" listed in https://github.com/microsoft/Swin-Transformer/blob/main/get_started.md. Actually, we...
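If it helps, a comparable backbone can also be loaded through `timm`, assuming the model name below is available in your installed timm version; the checkpoint actually used in the paper is the one from the official Swin-Transformer repository, so treat this only as a sketch.

```python
import timm
import torch

# Swin-L with 384x384 input and window size 12 (ImageNet-22K pre-trained, 1K fine-tuned).
model = timm.create_model('swin_large_patch4_window12_384', pretrained=True)
model.eval()

dummy = torch.randn(1, 3, 384, 384)
with torch.no_grad():
    feats = model.forward_features(dummy)  # grid features before the classification head
print(feats.shape)  # exact layout depends on the timm version
```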
A difference must exist, but whether it is large needs experimental verification. It seems that Swin Transformer did not release a Swin-L model pre-trained on regular ImageNet-1K. I am running a simple experiment...
The result is bad, even worse than using Bottom-Up region features under XE loss, so I didn't continue to train it under SCST. That is abnormal! I guess maybe there were some...
It takes about 4000 MB+ of memory on a Tesla V100 32G GPU, if I remember correctly. You can store the image features of the training set to reduce the CUDA memory usage.
Sorry, I am not sure why. If you just train the model using pre-extracted Swin features, the CUDA memory usage should not be too high. And I remembered wrong: 4000 MB+ is...
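A rough sketch of caching Swin grid features to disk is shown below, so that training does not need to keep the backbone on the GPU. The `swin_backbone` argument, the image folder and the output folder are placeholders; the feature layout actually expected by this repo may differ.

```python
import os
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_and_save(swin_backbone, image_dir, out_dir, device='cuda'):
    """Run the Swin backbone once per image and save the grid features as .npy files."""
    os.makedirs(out_dir, exist_ok=True)
    swin_backbone.eval().to(device)
    for name in os.listdir(image_dir):
        if not name.endswith('.jpg'):
            continue
        img = Image.open(os.path.join(image_dir, name)).convert('RGB')
        x = preprocess(img).unsqueeze(0).to(device)
        feats = swin_backbone.forward_features(x)  # grid features, layout depends on the backbone
        np.save(os.path.join(out_dir, name.replace('.jpg', '.npy')),
                feats.squeeze(0).cpu().numpy())
```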
Hi, a 122.5 CIDEr score after the XE training? Actually, it is normal; you can continue with the SCST training for higher metrics. [log.txt](https://github.com/232525/PureT/files/9590845/log.txt)
```
[INFO: 2021-08-23 01:52:14,715] ######## Epoch (VAL)16 ########...
```
Yes, the original `Noam` lr scheduler can get about a 122 CIDEr score after XE training. Adopting the `Cosine` lr scheduler may get a higher CIDEr. Actually, I think the comparison of...
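For clarity, the `Noam` schedule is the one from "Attention Is All You Need": the learning rate warms up linearly and then decays with the inverse square root of the step. A minimal sketch follows; the `model_dim` and `warmup` values here are illustrative, not necessarily the exact config used in this repo.

```python
import torch

def noam_lr(step, model_dim=512, warmup=10000, factor=1.0):
    """lr = factor * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)"""
    step = max(step, 1)
    return factor * (model_dim ** -0.5) * min(step ** -0.5, step * warmup ** -1.5)

# Usage with an optimizer whose base lr is set to 1.0, so LambdaLR's multiplier is the lr itself:
# scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda s: noam_lr(s + 1))
```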
Thanks for your reply. I calculated the average result on the val set, and it seems normal. But something confused me: you mentioned the $l_\infty$ norm in your paper, but it seems...