llll

Results 5 comments of llll

Hello, I would like to know at which inference step the attention weights should be taken, and from which attention stage they should be taken when generating each word. Thanks!

I tried getting the attention weights from the last head of the last decoder layer's cross-attention; maybe you can try that. @not-hermione
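To make the suggestion above concrete, here is a minimal sketch in plain Python. It assumes the cross-attention weights have already been collected into a nested list indexed as `attn[layer][head][step][region]`; that layout, the function names, and the toy numbers are all hypothetical and not taken from any particular codebase. In autoregressive decoding, the word produced at a given step is usually predicted from the decoder state at that position, so the row for that step is the one to read.

```python
# Hypothetical layout (an assumption for illustration only):
# attn[layer][head][step][region] is the cross-attention weight that the
# word generated at `step` places on image `region`.

def last_head_weights(attn, step):
    """Weights from the last head of the last decoder layer for one step."""
    return attn[-1][-1][step]

def most_attended_region(weights):
    """Index of the image region with the largest weight."""
    return max(range(len(weights)), key=lambda i: weights[i])

# Toy data: 2 layers, 2 heads, 2 generated words, 3 image regions.
toy_attn = [
    [  # layer 0
        [[0.2, 0.3, 0.5], [0.6, 0.3, 0.1]],      # head 0
        [[0.1, 0.8, 0.1], [0.3, 0.3, 0.4]],      # head 1
    ],
    [  # layer 1 (last decoder layer)
        [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]],  # head 0
        [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7]],      # head 1 (last head)
    ],
]

for step in range(2):
    weights = last_head_weights(toy_attn, step)
    print(step, weights, most_attended_region(weights))
```

Whether the last head is the most informative one likely depends on the model, so comparing it against other heads or an average over heads may be worthwhile.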

Thank you very much! I use Ruotian Luo's code, [ImageCaptioning.pytorch](https://github.com/ruotianluo/ImageCaptioning.pytorch), with Swin Transformer features instead of bottom-up features for training, and SCST runs in about 9G of memory. But I...

Thanks a lot! I will try your advice for training. Thank you very much again for your patience!

Dear author, sorry to bother you! I tried your suggestion and used Swin Transformer to extract image features, but it scored 2-3 CIDEr points lower than use image just in...