mPLUG
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022)
The [code](https://github.com/X-PLUG/mPLUG/blob/c666bfa1044bde5a6ce47fa1b4ae22d7bf9de633/caption_mplug_scst.py#L84-L86) in this repo shows that the baseline reward is calculated by averaging the rewards of the generated captions. However, the [original SCST implementation](https://github.com/ruotianluo/self-critical.pytorch), as well as some other SCST implementations...
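For reference, the two baseline choices this issue contrasts can be sketched as follows (a minimal illustration; the function names and list-based reward format are hypothetical, not taken from either repo):

```python
def advantages_mean_baseline(sample_rewards):
    """mPLUG-style SCST: baseline = mean reward over the k sampled captions.

    Each sampled caption's advantage is its reward minus the batch mean,
    so the advantages always sum to zero across the samples.
    """
    baseline = sum(sample_rewards) / len(sample_rewards)
    return [r - baseline for r in sample_rewards]


def advantages_greedy_baseline(sample_rewards, greedy_reward):
    """Original SCST (Rennie et al.): baseline = reward of the greedy caption.

    Sampled captions that beat the greedy decode get a positive advantage;
    those that fall short are pushed down.
    """
    return [r - greedy_reward for r in sample_rewards]
```

The practical difference: the mean-of-samples baseline needs no extra greedy decoding pass, while the greedy baseline directly rewards sampling above the model's own test-time (greedy) behavior.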
Hi. I don't understand how to use this pre-trained model for image captioning. Am I supposed to clone the GitHub repo and then somehow load the pre-trained model? It would...
And how to find it? Thanks.
Hi authors, thanks for your great work. Is there any chance you could release the finetuned COCO/Flickr30k checkpoints for the image-retrieval task? Thanks a lot.
Is the model published on ModelScope the SOTA model from the paper? Using the model from the model card and its caption-output method, I evaluated BLEU-4 and CIDEr on the COCO val 5k split, and both fall below the figures reported in the paper. Is this due to the model itself, an incorrect evaluation method, or a mismatch between the test set and evaluation toolkit?
Dear authors, I finetuned mPLUG Base on VQAv2 but only got around 75% accuracy instead of the roughly 80% reported in the README. Could you kindly upload the finetuned checkpoints...
Hi, when will the pretraining code be released? The README says "coming soon", and I'd like to try it on my own data. Thanks!
Hi, thanks for your work! I used your model with ModelScope, but I couldn't find which model size is the default setting in ModelScope; I only know it's named...