More functions in demo
What an exciting work! However, the functions available in the online demo and a locally hosted demo are the same: only an image can be input, and the model returns boxes and a caption. But the paper mentions many more functions, such as inputting a bounding box to generate a caption for that region. When will these functions be released?
@ErrorMelody Thank you for the attention! We will release it in the future.
You can also unlock it with several changes (a rough sketch of the edits is below):
1. Change the [gr.Radio](https://github.com/microsoft/unilm/blob/874dfed8008ecf6bfc077e161b3fdced8c4fbf8c/kosmos-2/demo/gradio_app.py#L490) component into a gr.Text component.
2. Set `inputs = f"[image]{user_image_path}{text_input}"` in [here](https://github.com/microsoft/unilm/blob/874dfed8008ecf6bfc077e161b3fdced8c4fbf8c/kosmos-2/demo/gradio_app.py#L348).
3. Host it.
4. Enable sampling, and then enjoy it!
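For anyone who wants to try it before the official update, here is a minimal sketch of what those two changes could look like in demo/gradio_app.py. The label, default prompt, and helper name below are my own assumptions, not the repo's actual code.

```python
import gradio as gr

# 1. Swap the gr.Radio selector (around L490 of demo/gradio_app.py) for a
#    free-form text box so arbitrary prompts, including grounding tokens,
#    can be typed in. Label and default value are placeholders.
text_input = gr.Text(label="Text prompt", value="<grounding> Describe this image:")

# 2. In the inference callback (around L348), build the model input from the
#    uploaded image path plus the typed prompt, as in the reply above.
#    The helper name is hypothetical.
def build_inputs(user_image_path: str, text_input: str) -> str:
    return f"[image]{user_image_path}{text_input}"
```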
Can you talk specifically about how to put a bounding box, i.e., grounding tokens, into the text prompt in the demo/test script? Can't wait to see the object description capability. Thank you!
Hi, may I know what the argument to gr.Text() in step 1 should be? Could you please share it with us?
Hi @pengzhiliang, can you describe how to change the code for "Grounded question answering"?
I have implemented this function in my own fork: https://github.com/sheldonchiu/unilm
I have made a small tool to easily test this function (https://sheldonchiu.github.io/kosmos2-prompt-tool/).
Below are some quick demos:
- Create a bounding box using my tool
- Embed the output in your prompt (see the sketch of the token format below the examples)
Example 2:
The accuracy of a response is profoundly influenced by the phrasing of the question. With a well-crafted prompt, the results can be astounding. Thanks for releasing this great model!
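For context on what "embed the output in your prompt" produces: the tool emits Kosmos-2 location tokens for the drawn box. Below is a rough, unofficial sketch of that conversion, assuming the 32x32 patch grid (1024 `<patch_index_*>` tokens) described in the Kosmos-2 paper; the function names and the example prompt are mine, not the tool's actual output.

```python
def box_to_patch_indices(x1, y1, x2, y2, grid=32):
    """Map normalized [0, 1] box corners to top-left / bottom-right patch indices."""
    col1, row1 = int(x1 * grid), int(y1 * grid)
    col2, row2 = min(int(x2 * grid), grid - 1), min(int(y2 * grid), grid - 1)
    return row1 * grid + col1, row2 * grid + col2


def grounded_phrase(phrase, box):
    """Wrap a phrase with <object>...</object> location tokens for the given box."""
    tl, br = box_to_patch_indices(*box)
    return (f"<phrase>{phrase}</phrase>"
            f"<object><patch_index_{tl:04d}><patch_index_{br:04d}></object>")


# Example: ask the model about the region covering the right half of the image.
prompt = "<grounding> Describe " + grounded_phrase("this object", (0.5, 0.1, 0.95, 0.9)) + " in detail:"
print(prompt)
```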
Hello, may I ask if your demo can extract the aspect in a sentence?
@sheldonchiu The prompt tool is super useful!


