Tyson
Hi Danshi, when using RecAdam in TAPT, you can also try different values for "anneal_t0" and "anneal_k", because the RecAdam optimizer is very sensitive to these two parameters. In our experiments, as reported...
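Both parameters shape the annealing coefficient. Below is a minimal sketch that assumes the sigmoid schedule described in the RecAdam paper, lambda(t) = 1 / (1 + exp(-k * (t - t0))). Here lambda(t) weights the target-task loss and 1 - lambda(t) weights the penalty that pulls parameters back toward the pretrained weights. The function name sigmoid_anneal and the (k, t0) pairs are hypothetical and are not the values used in our experiments:

```python
import math

def sigmoid_anneal(step: int, k: float, t0: int) -> float:
    """Hypothetical helper: lambda(t) = 1 / (1 + exp(-k * (t - t0))).

    t0 is the step where lambda reaches 0.5; k controls how sharply lambda
    moves from ~0 (stay close to pretrained weights) to ~1 (target task only).
    """
    z = k * (step - t0)
    # Numerically stable sigmoid: avoid exp() overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Compare how quickly the schedule switches for a few illustrative settings.
for k, t0 in [(0.1, 250), (0.01, 250), (0.1, 1000)]:
    schedule = [round(sigmoid_anneal(s, k, t0), 3) for s in (0, 250, 500, 1000)]
    print(f"k={k}, t0={t0}: {schedule}")
```

Moving t0 shifts the step where the pull toward the pretrained weights fades. A small k spreads the transition over many steps. This can make results swing noticeably between settings.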
Hi, thank you for your interest in our paper. From the error message, it seems you didn't install the ROUGE metric correctly. The ROUGE implementation we use here is based on...
We ranked first in the competition, and the paper appears in the workshop proceedings.
Hi, for TAPT and DAPT, we pre-trained the model for 10 epochs. For SDPT, we pre-trained the model on the CNN dataset for 780,000 steps.
Yes, DAPT takes a long time to train, but not as long as you said. Did you try gradient accumulation? We do gradient accumulation for every 10...
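Gradient accumulation lets a small GPU emulate a larger batch. Gradients from several mini-batches are summed before a single optimizer step. Below is a minimal PyTorch sketch that assumes a toy linear model and random data. The function name train_with_accumulation and accum_steps=4 are illustrative only and do not reflect our actual training setup:

```python
import torch
from torch import nn

def train_with_accumulation(model, optimizer, batches, accum_steps=4):
    """Hypothetical helper: one optimizer step per `accum_steps` mini-batches."""
    loss_fn = nn.MSELoss()
    model.train()
    optimizer.zero_grad()
    updates = 0
    pending = 0  # mini-batches whose gradients have not been applied yet
    for x, y in batches:
        # Scale the loss so the summed gradients approximate the mean
        # over the larger effective batch.
        loss = loss_fn(model(x), y) / accum_steps
        loss.backward()  # .grad buffers accumulate across backward() calls
        pending += 1
        if pending == accum_steps:
            optimizer.step()
            optimizer.zero_grad()
            updates += 1
            pending = 0
    if pending:  # apply whatever is left from a final, incomplete group
        optimizer.step()
        optimizer.zero_grad()
        updates += 1
    return updates

torch.manual_seed(0)
model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
batches = [(torch.randn(2, 8), torch.randn(2, 1)) for _ in range(10)]
updates = train_with_accumulation(model, optimizer, batches, accum_steps=4)
print(f"optimizer steps: {updates}")
```

The effective batch size is the per-step batch size times accum_steps. Peak GPU memory stays at the per-step level.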
Yes, I am also using a GTX 1080 Ti, and I don't think there is a big gap between my training time and yours. It's not necessary to finish the 10 epochs...
Maybe you can search Google for "How to create a tensor with random values in PyTorch?"
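For reference, PyTorch has built-in factory functions for this. Below is a short sketch; the shapes and seed are arbitrary:

```python
import torch

torch.manual_seed(42)  # make the random draws reproducible

uniform = torch.rand(2, 3)            # uniform on [0, 1)
normal = torch.randn(2, 3)            # standard normal: mean 0, std 1
ints = torch.randint(0, 10, (2, 3))   # integers in [0, 10), dtype int64
like = torch.rand_like(normal)        # uniform on [0, 1), same shape/dtype as `normal`

print(uniform, normal, ints, like, sep="\n")
```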
image_len=None means the default value is None; you can pass a list of ints, one per example (length equal to the batch size), to this function.
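Below is a minimal sketch of how such an optional argument is typically handled. It assumes image features shaped (batch, num_regions, dim). The function resolve_image_len is hypothetical and is not part of our code:

```python
from typing import List, Optional

import torch

def resolve_image_len(image_feats: torch.Tensor,
                      image_len: Optional[List[int]] = None) -> torch.Tensor:
    """Hypothetical helper: return one valid-region count per example.

    image_feats: tensor of shape (batch, num_regions, dim)
    image_len:   optional list of ints with len == batch size; None means
                 every region of every example is treated as valid.
    """
    batch, num_regions, _ = image_feats.shape
    if image_len is None:
        # Default: no padding, so each example uses all regions.
        return torch.full((batch,), num_regions, dtype=torch.long)
    if len(image_len) != batch:
        raise ValueError(f"image_len has {len(image_len)} entries, expected {batch}")
    return torch.tensor(image_len, dtype=torch.long)

feats = torch.zeros(3, 5, 16)
print(resolve_image_len(feats))             # default: every region counted
print(resolve_image_len(feats, [5, 2, 4]))  # one length per example
```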
I see. image_len is not used in the multimodal fusion function. You could use it as a mask in the cross-attention, which may improve performance slightly.
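The fusion function itself isn't shown here, so the sketch below uses torch.nn.MultiheadAttention as a stand-in cross-attention layer. Text tokens serve as queries and image regions as keys/values. The helper lengths_to_padding_mask, the shapes, and the lengths are hypothetical, and a PyTorch version with batch_first support is assumed. The mask marks padded regions with True, so they receive zero attention weight:

```python
import torch
from torch import nn

def lengths_to_padding_mask(lengths: torch.Tensor, max_len: int) -> torch.Tensor:
    """Hypothetical helper: (batch, max_len) bool mask, True = padded position."""
    positions = torch.arange(max_len, device=lengths.device)
    return positions.unsqueeze(0) >= lengths.unsqueeze(1)

torch.manual_seed(0)
batch, text_len, num_regions, dim = 2, 4, 6, 16
text = torch.randn(batch, text_len, dim)      # queries
image = torch.randn(batch, num_regions, dim)  # keys and values
image_len = torch.tensor([6, 3])              # second example has only 3 real regions

cross_attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)
mask = lengths_to_padding_mask(image_len, num_regions)
# key_padding_mask: True entries are ignored by attention.
fused, weights = cross_attn(text, image, image, key_padding_mask=mask)
print(fused.shape, weights.shape)
```

Without the mask, text tokens can attend to zero-padded image regions. Masking them out keeps the attention mass on real regions only.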