ViLT
Image captioning inference
Could someone make a demo of image captioning with ViLT? Please! I'm a newbie in the NLP field <3
Hi @trucvip123,
Although ViLT has not been fine-tuned for captioning, you can emulate captioning by passing a text query made up entirely of masks, [MASK] [MASK] [MASK] ... [MASK] [MASK] (that is, [MASK] repeated for your desired caption length), to the MLM demo.
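As an illustration of the idea, here is a minimal sketch of how an all-[MASK] query could be filled in. It assumes a greedy, one-token-per-step strategy: at each step the masked position with the most confident prediction is committed, and the process repeats until no [MASK] is left. This is one common strategy and not necessarily exactly what the MLM demo does. `greedy_fill`, `build_mask_query` and `toy_predict` are hypothetical names. The predictor is a stand-in: in practice it would wrap ViLT's MLM head applied to the image plus the current text (for example Hugging Face's `ViltForMaskedLM`), whereas here it returns fixed guesses so the example runs without a model.

```python
from typing import Callable, List, Tuple

MASK = "[MASK]"

# Hypothetical predictor type: given the current token list, return one
# (best_token, confidence) pair per position. A real version would run
# ViLT's MLM head on the image together with these tokens.
Predictor = Callable[[List[str]], List[Tuple[str, float]]]


def build_mask_query(length: int) -> str:
    """Return a query of `length` space-separated [MASK] tokens."""
    if length < 1:
        raise ValueError("length must be at least 1")
    return " ".join([MASK] * length)


def greedy_fill(length: int, predict: Predictor) -> List[str]:
    """Fill an all-[MASK] query one position per step.

    Each step asks the predictor for its best guess at every position,
    then commits only the most confident guess among positions that are
    still [MASK]. Stops when no [MASK] remains.
    """
    tokens = build_mask_query(length).split(" ")
    while MASK in tokens:
        guesses = predict(tokens)
        # Only positions that are still masked are candidates.
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        best = max(masked, key=lambda i: guesses[i][1])
        tokens[best] = guesses[best][0]
    return tokens


def toy_predict(tokens: List[str]) -> List[Tuple[str, float]]:
    # Stand-in predictor with fixed guesses and confidences (illustration only).
    words = ["a", "dog", "on", "grass"]
    conf = [0.4, 0.9, 0.3, 0.8]
    return [(words[i], conf[i]) for i in range(len(tokens))]


if __name__ == "__main__":
    print(build_mask_query(4))                    # [MASK] [MASK] [MASK] [MASK]
    print(" ".join(greedy_fill(4, toy_predict)))  # a dog on grass
```

With the toy predictor, positions are committed in the order "dog", "grass", "a", "on". With a real model, the guesses would change after each step because the model sees the tokens already filled in. That feedback is the reason for filling one position at a time rather than taking every prediction from a single pass.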
Thank you @dandelin