moonlitt

Results: 1 comment by moonlitt

Hi, there are many reasons: 1. Our model is pre-trained on data crawled from the web, which has weak semantic correlation, while ViLT is pre-trained on data with strong semantic correlation. Flickr30K is...