More functions in demo
What an exciting work! However, the functions available in the online demo and a locally hosted demo are the same: only an image can be input, and the model returns boxes and a caption. But the paper mentions many more functions, such as inputting a bounding box to generate a caption for that region. When will these functions be released?
@ErrorMelody Thank you for the attention! We will release it in the future.
You can also unlock it with several changes (a rough sketch of the edits is below):
1. Change the [gr.Radio](https://github.com/microsoft/unilm/blob/874dfed8008ecf6bfc077e161b3fdced8c4fbf8c/kosmos-2/demo/gradio_app.py#L490) component into a gr.Text component.
2. Set `inputs = f"[image]{user_image_path}{text_input}"` in [here](https://github.com/microsoft/unilm/blob/874dfed8008ecf6bfc077e161b3fdced8c4fbf8c/kosmos-2/demo/gradio_app.py#L348).
3. Host it.
4. Enable sampling, and then enjoy it!
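For anyone who wants to try it before the official update, here is a minimal sketch of what those two changes could look like in demo/gradio_app.py. The label, default prompt, and helper name below are my own assumptions, not the repo's actual code.

```python
import gradio as gr

# 1. Swap the gr.Radio selector (around L490 of demo/gradio_app.py) for a
#    free-form text box so arbitrary prompts, including grounding tokens,
#    can be typed in. Label and default value are placeholders.
text_input = gr.Text(label="Text prompt", value="<grounding> Describe this image:")

# 2. In the inference callback (around L348), build the model input from the
#    uploaded image path plus the typed prompt, as in the reply above.
#    The helper name is hypothetical.
def build_inputs(user_image_path: str, text_input: str) -> str:
    return f"[image]{user_image_path}{text_input}"
```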
Can you talk specifically about how to put a bounding box, i.e., grounding tokens, into the text prompt in the demo/test script? Can't wait to see the object description capability. Thank you!
Hi, may I know what the argument to gr.Text() in step 1 should be? Could you please share it with us?
Hi @pengzhiliang, can you describe how to change the code for "Grounded question answering"?
I have implemented this function in my own fork: https://github.com/sheldonchiu/unilm
I have made a small tool to easily test this function (https://sheldonchiu.github.io/kosmos2-prompt-tool/).
Below are some quick demos:
- Create a bounding box using my tool
- Embed the output in your prompt (see the sketch of the token format below the examples)
Example 2:
The accuracy of a response is profoundly influenced by the phrasing of the question. With a well-crafted prompt, the results can be astounding. Thanks for releasing this great model!
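For context on what "embed the output in your prompt" produces: the tool emits Kosmos-2 location tokens for the drawn box. Below is a rough, unofficial sketch of that conversion, assuming the 32x32 patch grid (1024 `<patch_index_*>` tokens) described in the Kosmos-2 paper; the function names and the example prompt are mine, not the tool's actual output.

```python
def box_to_patch_indices(x1, y1, x2, y2, grid=32):
    """Map normalized [0, 1] box corners to top-left / bottom-right patch indices."""
    col1, row1 = int(x1 * grid), int(y1 * grid)
    col2, row2 = min(int(x2 * grid), grid - 1), min(int(y2 * grid), grid - 1)
    return row1 * grid + col1, row2 * grid + col2


def grounded_phrase(phrase, box):
    """Wrap a phrase with <object>...</object> location tokens for the given box."""
    tl, br = box_to_patch_indices(*box)
    return (f"<phrase>{phrase}</phrase>"
            f"<object><patch_index_{tl:04d}><patch_index_{br:04d}></object>")


# Example: ask the model about the region covering the right half of the image.
prompt = "<grounding> Describe " + grounded_phrase("this object", (0.5, 0.1, 0.95, 0.9)) + " in detail:"
print(prompt)
```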
Hello, may I ask if your demo can extract the aspect in a sentence?
@sheldonchiu The prompt tool is super useful!


