Curya
The core generation logic (how to generate all the necessary files under the mscoco folder) is located below these code cells in the snapshot image. I have not saved the pre-processing code...
The "dataset_coco. json" file is the Karpathy split annotation file of MSCOCO Captioning, it is just the re-organization of MSCOCO raw JSON annotation. Maybe you need to refer to https://github.com/karpathy/neuraltalk...
1. pre-trained model: The latter. We adopted the Swin-L 1K model (which has a 384x384 input size and a window size of 12) from the "ImageNet-22K pre-trained models" listed in https://github.com/microsoft/Swin-Transformer/blob/main/get_started.md. Actually, we...
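If it helps, a comparable backbone can also be loaded through `timm`, assuming the model name below is available in your installed timm version; the checkpoint actually used in the paper is the one from the official Swin-Transformer repository, so treat this only as a sketch.

```python
import timm
import torch

# Swin-L with 384x384 input and window size 12 (ImageNet-22K pre-trained, 1K fine-tuned).
model = timm.create_model('swin_large_patch4_window12_384', pretrained=True)
model.eval()

dummy = torch.randn(1, 3, 384, 384)
with torch.no_grad():
    feats = model.forward_features(dummy)  # grid features before the classification head
print(feats.shape)  # exact layout depends on the timm version
```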
A difference must exist, but whether it is large needs experimental verification. It seems that Swin Transformer did not release a Swin-L model pre-trained on regular ImageNet-1K. I am running a simple experiment...
The result is bad, even worse than using Bottom-Up region features under XE loss, so I didn't continue to train it under SCST. That is abnormal! I guess maybe there were some...
It takes about 4000 MB+ of memory on a Tesla V100 32G GPU, if I remember correctly. You can store the image features of the training set to reduce the CUDA memory usage.
Sorry, I am not sure why. If you just train the model using pre-extracted Swin features, the CUDA memory usage should not be too high. And I remembered wrong: 4000 MB+ is...
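A rough sketch of caching Swin grid features to disk is shown below, so that training does not need to keep the backbone on the GPU. The `swin_backbone` argument, the image folder and the output folder are placeholders; the feature layout actually expected by this repo may differ.

```python
import os
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_and_save(swin_backbone, image_dir, out_dir, device='cuda'):
    """Run the Swin backbone once per image and save the grid features as .npy files."""
    os.makedirs(out_dir, exist_ok=True)
    swin_backbone.eval().to(device)
    for name in os.listdir(image_dir):
        if not name.endswith('.jpg'):
            continue
        img = Image.open(os.path.join(image_dir, name)).convert('RGB')
        x = preprocess(img).unsqueeze(0).to(device)
        feats = swin_backbone.forward_features(x)  # grid features, layout depends on the backbone
        np.save(os.path.join(out_dir, name.replace('.jpg', '.npy')),
                feats.squeeze(0).cpu().numpy())
```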
Hi, a 122.5 CIDEr score after the XE training? Actually, it is normal; you can continue with the SCST training for higher metrics. [log.txt](https://github.com/232525/PureT/files/9590845/log.txt)
```
[INFO: 2021-08-23 01:52:14,715] ######## Epoch (VAL)16 ########...
```
Yes, the original `Noam` lr scheduler can get about a 122 CIDEr score after XE training. Adopting the `Cosine` lr scheduler may get a higher CIDEr. Actually, I think the comparison of...
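For clarity, the `Noam` schedule is the one from "Attention Is All You Need": the learning rate warms up linearly and then decays with the inverse square root of the step. A minimal sketch follows; the `model_dim` and `warmup` values here are illustrative, not necessarily the exact config used in this repo.

```python
import torch

def noam_lr(step, model_dim=512, warmup=10000, factor=1.0):
    """lr = factor * d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)"""
    step = max(step, 1)
    return factor * (model_dim ** -0.5) * min(step ** -0.5, step * warmup ** -1.5)

# Usage with an optimizer whose base lr is set to 1.0, so LambdaLR's multiplier is the lr itself:
# scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda s: noam_lr(s + 1))
```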
Thanks for your reply. I calculated the average result on the val set, and it seems normal. But something confused me: you mentioned the $l_\infty$ norm in your paper, but it seems...