Tyson comments

9 comments by Tyson

Gradient explosion in TAPT/DAPT pretraining

Hi Danshi, when using RecAdam in TAPT, you can also try different values of "anneal_t0" and "anneal_k", because the RecAdam optimizer is very sensitive to these two parameters. In our experiments, as reported...
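To get a feel for why these two parameters matter, here is a minimal sketch of a sigmoid annealing schedule, λ(t) = 1 / (1 + exp(−k·(t − t0))). We assume this is the form RecAdam uses to weight the target-task loss against the penalty that keeps the weights close to the pretrained ones. The helper name `sigmoid_anneal` and the example values are hypothetical; this is not the RecAdam code's API.

```python
import math


def sigmoid_anneal(step: int, t0: float, k: float) -> float:
    """Hypothetical helper: weight lambda(t) in [0, 1] given to the target-task loss.

    Assumes the sigmoid schedule lambda(t) = 1 / (1 + exp(-k * (t - t0))).
    Well before t0, lambda is close to 0, so the "recall" penalty toward the
    pretrained weights dominates. After t0, the target-task loss takes over.
    """
    x = -k * (step - t0)
    # math.exp overflows for arguments above ~709, where lambda is effectively 0
    if x > 700:
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


# Compare how t0 (midpoint) and k (steepness) reshape the schedule
for t0, k in [(250, 0.1), (250, 0.01), (500, 0.1)]:
    weights = [round(sigmoid_anneal(s, t0, k), 3) for s in (0, 250, 500, 1000)]
    print(f"t0={t0}, k={k}: {weights}")
```

A small k hands control over to the target-task loss gradually. A large k makes the switch almost a step at t0, which may help explain why the optimizer is sensitive to both values.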

preprocess_sent_label.py fails to run

Hi, thank you for your interest in our paper. From the error message, it seems you didn't install the rouge metric correctly. The rouge we use here is based on...

What is the final ranking of your model in the LaySumm competition?

We ranked first place in the competition, and our paper is included in the workshop proceedings.

Number of Finetuning Steps for TAPT/DAPT/SDPT

Hi, for TAPT and DAPT, we pre-trained the model for 10 epochs. For SDPT, we pre-trained the model on the CNN dataset for 780,000 steps.

Number of Finetuning Steps for TAPT/DAPT/SDPT

Yes, it takes a long time to train DAPT, but not as long as you said. Did you try gradient accumulation? We do gradient accumulation for every 10...
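As a rough illustration of gradient accumulation (not our training script), the sketch below fits a one-parameter linear model in plain Python. Gradients from several micro-batches are summed, averaged, and applied in a single update, so with equal-sized micro-batches the update matches one large-batch step while only one micro-batch is processed at a time. The names `grad_mse`, `train_with_accumulation`, `accum_steps`, and the toy data are hypothetical.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error for y ~ w * x over one micro-batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def train_with_accumulation(w, micro_batches, lr, accum_steps):
    """Update w once every `accum_steps` micro-batches using the averaged gradient.

    Gradients left over from an incomplete final group are dropped in this sketch.
    """
    accumulated = 0.0
    for i, batch in enumerate(micro_batches, start=1):
        accumulated += grad_mse(w, batch)  # like loss.backward() adding into .grad
        if i % accum_steps == 0:
            w -= lr * accumulated / accum_steps  # like optimizer.step()
            accumulated = 0.0  # like optimizer.zero_grad()
    return w


data = [(float(x), 3.0 * x) for x in range(1, 9)]  # 8 toy samples, true w = 3
micro_batches = [data[i:i + 2] for i in range(0, 8, 2)]  # 4 micro-batches of size 2

w_accum = train_with_accumulation(0.0, micro_batches, lr=0.01, accum_steps=4)
w_full = 0.0 - 0.01 * grad_mse(0.0, data)  # one full-batch step for comparison
print(w_accum, w_full)
```

In PyTorch the same pattern is usually written by dividing the loss by the number of accumulation steps before `loss.backward()` and calling `optimizer.step()` and `optimizer.zero_grad()` only every N steps.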

Number of Finetuning Steps for TAPT/DAPT/SDPT

Yes, I am also using a GTX 1080 Ti, and I don't think there is a big gap between my training time and yours. It's not necessary to finish the 10 epochs...

How to get the "noise image feature"?

Maybe you can search Google for "How to create random value tensor in PyTorch?"
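For reference, one common way to create such a tensor in PyTorch is `torch.randn` (standard normal noise), or `torch.randn_like`, which copies the shape, dtype and device of an existing tensor. The sketch below assumes the "noise image feature" simply replaces a real image-feature tensor of the same shape. The `(batch, regions, dim)` shape is hypothetical, and Gaussian noise is only one choice (`torch.rand` gives uniform noise instead).

```python
import torch

torch.manual_seed(0)  # make the noise reproducible

# Hypothetical real image features: batch of 4, 36 regions, 2048 dims each
image_feature = torch.zeros(4, 36, 2048)

# Gaussian noise with the same shape, dtype and device as the real features
noise_image_feature = torch.randn_like(image_feature)

# Equivalent explicit form, passing the shape directly
noise_explicit = torch.randn(4, 36, 2048)

print(noise_image_feature.shape, noise_explicit.dtype)
```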

`image_len` is not used?

`image_len=None` means the default value is None; you can pass a list of ints whose length equals the batch size to this function.

`image_len` is not used?

I see. `image_len` is not used in the multimodal fusion function. You could use it as a mask in the cross-attention, which may improve performance slightly.
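Here is a minimal sketch of that suggestion. It assumes `image_len` holds the number of valid (non-padded) image regions per sample and that the fusion is standard cross-attention, with text tokens as queries and image features as keys/values. It uses PyTorch's `nn.MultiheadAttention` and its `key_padding_mask` argument for illustration. The helper `lengths_to_padding_mask` and all shapes are hypothetical and do not come from the repository's fusion function.

```python
import torch
import torch.nn as nn


def lengths_to_padding_mask(image_len, max_len):
    """Turn per-sample image lengths into a key padding mask.

    Returns a bool tensor of shape (batch, max_len) where True marks padded
    image positions that attention should ignore.
    """
    lengths = torch.as_tensor(image_len)
    positions = torch.arange(max_len)
    return positions[None, :] >= lengths[:, None]


torch.manual_seed(0)
batch, text_len, max_img_len, dim = 2, 5, 6, 16
text = torch.randn(batch, text_len, dim)  # queries: text tokens
image = torch.randn(batch, max_img_len, dim)  # keys/values: padded image regions
image_len = [6, 3]  # one int per sample; list length == batch size

cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
mask = lengths_to_padding_mask(image_len, max_img_len)
out, weights = cross_attn(text, image, image, key_padding_mask=mask)

print(out.shape)  # expected shape: (batch, text_len, dim)
print(weights[1, :, 3:].sum())  # attention on sample 2's padded regions is zero
```

Each sample needs at least one valid image region; if every key in a row is masked, the softmax produces NaNs.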