ohwi comments

Results: 3 comments by ohwi

About the model explanation

I noticed a small difference in the backbone: the paper uses a ViT, whereas this work uses a CNN.

About the model explanation

Thank you for your reply. I think I understand the structure of your work. Thank you!!

About the model explanation

> PS: Recent research shows that doing "Object Detection" prior to "Image Captioning" doesn't bring any additional improvement, instead it will just increase complexity.

Hi. Would you let me know...