
More functions in the demo

ErrorMelody opened this issue 2 years ago • 4 comments

What exciting work! However, the functions in the online demo and the locally hosted demo are the same: only images can be input, and the model returns boxes and a caption. But the paper mentions many more functions, such as inputting a bounding box to generate a caption for that region. When will these functions be released?

ErrorMelody · Jul 03 '23

@ErrorMelody Thank you for the attention! We will release it in the future.

You can also unlock it now with a few changes (a minimal sketch of these edits follows the list):

  1. Change the [gr.Radio](https://github.com/microsoft/unilm/blob/874dfed8008ecf6bfc077e161b3fdced8c4fbf8c/kosmos-2/demo/gradio_app.py#L490) component into a gr.Text component.
  2. Set `inputs = f"[image]{user_image_path}{text_input}"` [here](https://github.com/microsoft/unilm/blob/874dfed8008ecf6bfc077e161b3fdced8c4fbf8c/kosmos-2/demo/gradio_app.py#L348).
  3. Host it.
  4. Enable sampling, and then enjoy it!
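A minimal sketch of what steps 1, 2, and 4 might look like in Python; the component label, the `build_inputs` helper, and the sampling arguments below are illustrative assumptions, not the repository's exact code:

```python
import gradio as gr

# Step 1 (sketch): swap the fixed-choice gr.Radio for a free-form gr.Text
# so arbitrary prompts, including grounding tokens, can be typed in.
text_input = gr.Text(label="Text prompt")

# Step 2 (sketch): build the model input from the uploaded image path plus
# the user-typed text, instead of a hard-coded prompt template.
def build_inputs(user_image_path: str, text_input: str) -> str:
    return f"[image]{user_image_path}{text_input}"

# Step 4 (sketch): turn on sampling for generation; these are generic
# decoding knobs and the demo's actual flag names may differ.
generation_kwargs = {"sampling": True, "sampling_topp": 0.9, "temperature": 0.7}
```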

pengzhiliang · Jul 03 '23

Can you talk specifically about how to put the bounding box, i.e., the grounding tokens, into the text prompt in the demo/test script? I can't wait to see the object-description capability. Thank you!

mu-cai · Jul 06 '23

> (quoting @pengzhiliang's four-step reply above)

Hi, may I know what the argument to gr.Text() in step 1 should be? Could you please share it with us?

wanghao-cst · Jul 07 '23

> (quoting @pengzhiliang's four-step reply above)

Hi @pengzhiliang, can you describe how to change the code for grounded question answering?

BIGBALLON · Jul 10 '23

> (quoting @pengzhiliang's four-step reply above)

I have implemented this function in my own fork: https://github.com/sheldonchiu/unilm

> (quoting @mu-cai's question above about embedding bounding boxes, i.e., grounding tokens, in the text prompt)

I have made a small tool to easily test this function: https://sheldonchiu.github.io/kosmos2-prompt-tool/

Below are some quick demos:

  1. Create a bounding box using my tool (screenshot: demo2).
  2. Embed the output in your prompt (screenshot: demo1).

Example 2: (screenshot: demo3)

The accuracy of a response is profoundly influenced by the phrasing of the question. With a well-crafted prompt, the results can be astounding. Thanks for releasing this great model!
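For readers without the tool open: the tokens it emits follow the grounding scheme from the Kosmos-2 paper, where the image is quantized into a 32×32 grid of patches and each box corner becomes a `<patch_index_xxxx>` token. A rough sketch of that conversion in Python; the exact rounding and clamping are my assumptions, not verified against the tool:

```python
# Convert a normalized box (x0, y0, x1, y1) in [0, 1] into Kosmos-2 style
# grounding tokens, assuming the 32x32 patch grid described in the paper.
GRID = 32

def box_to_tokens(phrase: str, x0: float, y0: float, x1: float, y1: float) -> str:
    # Row-major patch index for each corner, clamped to the grid.
    ul = min(int(y0 * GRID), GRID - 1) * GRID + min(int(x0 * GRID), GRID - 1)
    lr = min(int(y1 * GRID), GRID - 1) * GRID + min(int(x1 * GRID), GRID - 1)
    return (f"<phrase>{phrase}</phrase>"
            f"<object><patch_index_{ul:04d}><patch_index_{lr:04d}></object>")

# e.g. a box covering the right half of the image:
print(box_to_tokens("the dog", 0.5, 0.0, 1.0, 1.0))
# -> <phrase>the dog</phrase><object><patch_index_0016><patch_index_1023></object>
```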

sheldonxxxx · Jul 15 '23

> (quoting @pengzhiliang's and @sheldonxxxx's replies above)

Hello, may I ask whether your demo can extract the aspects in a sentence?

APPLE-XMT · Aug 04 '23

> (quoting @pengzhiliang's and @sheldonxxxx's replies above)

@sheldonchiu The prompt tool is super useful!

donglixp · Aug 08 '23