unilm
[kosmos-2] Is there a plan to release code for more tasks of kosmos-2?
Thanks for the amazing work [kosmos-2]. From the paper, kosmos-2 can do lots of tasks: (1) visual grounding, (2)-(3) grounded question answering, (4)-(6) multimodal referring via bounding boxes, and (7) grounded image captioning.
Is there any plan to release a demo for these tasks?
Thanks! All of those are on our release schedule. If you can't wait to try it, please refer to here.
@pengzhiliang thanks a lot !!!
We have released evaluation code for some tasks here.
Thanks for your hard work. Could you continue to release the remaining evaluation code? It would be a valuable contribution to the field.
@JierunChen Which tasks would you like to evaluate first, so that we can prioritize the release pipeline?
@pengzhiliang FYI.
@donglixp @pengzhiliang I do not have a strict preference, so please prioritize at your convenience. I am requesting evaluation code for the following tasks:
- Referring expression generation
- Image captioning
- Visual question answering
Grateful for your contributions.