ParlAI

Script to render images and captions

Status: Open. vedantpuri opened this issue 4 years ago (11 comments).

Is your feature request related to a problem? Please describe.
Related to a sub-task of #2021. It would be useful to have a script to render images and their captions. Something of this sort has already been done for chit-chat in #2035 (updates in #2059) and can be used as a reference.

Describe the solution you'd like
The high-level idea is the same as for conversation rendering:

  • Generate an HTML string for the image and caption
  • Use headless Chrome to get a PDF
  • Use headless Chrome to get a PNG
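A minimal sketch of the last two steps, assuming a Chrome or Chromium binary is installed (the binary name google-chrome is an assumption and differs across platforms). The --headless, --print-to-pdf, --screenshot and --window-size switches are standard Chrome command-line flags; the helper names headless_chrome_commands and render are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def headless_chrome_commands(
    html_path: str,
    pdf_path: str,
    png_path: str,
    chrome_bin: str = "google-chrome",  # assumption: binary name varies by platform
    window_size: str = "1280,720",
) -> List[List[str]]:
    """Build the headless Chrome invocations for the PDF and the PNG."""
    # Chrome expects a URL, so turn the local HTML file into a file:// URI.
    url = Path(html_path).resolve().as_uri()
    # Use absolute output paths so it is unambiguous where Chrome writes files.
    pdf_out = str(Path(pdf_path).resolve())
    png_out = str(Path(png_path).resolve())
    base = [chrome_bin, "--headless", "--disable-gpu"]
    pdf_cmd = base + [f"--print-to-pdf={pdf_out}", url]
    png_cmd = base + [f"--screenshot={png_out}", f"--window-size={window_size}", url]
    return [pdf_cmd, png_cmd]


def render(html: str, out_stem: str, chrome_bin: str = "google-chrome") -> None:
    """Write the HTML string to <out_stem>.html, then produce .pdf and .png next to it."""
    html_path = f"{out_stem}.html"
    Path(html_path).write_text(html, encoding="utf-8")
    for cmd in headless_chrome_commands(
        html_path, f"{out_stem}.pdf", f"{out_stem}.png", chrome_bin
    ):
        # check=True raises CalledProcessError if Chrome exits with an error.
        subprocess.run(cmd, check=True)
```

Calling render(html, "convo") would write convo.html, convo.pdf and convo.png side by side. Keeping command construction separate from execution makes it easy to inspect or adjust the flags without launching a browser.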

Additional context
@klshuster, could you provide a fixed format for the data to be processed, like there was in convo_render? A rough idea of how you would want it to look would also be great. One idea: a polaroid-like box in the center of the screen (grey/white background) with the caption in Messenger blue below the image.
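As a sketch of how the polaroid idea could look, here is a function that returns a standalone HTML page with a white card on a grey background and the caption underneath. The color #0084ff for "Messenger blue", the sizes, and the name polaroid_html are illustrative assumptions, not a format agreed on in this thread.

```python
import html

# Assumption: #0084ff stands in for "Messenger blue"; the exact shade is illustrative.
MESSENGER_BLUE = "#0084ff"


def polaroid_html(image_src: str, caption: str) -> str:
    """Return an HTML page with a polaroid-style card: image on top, caption below."""
    # Escape the caption so model output containing <, >, & or quotes cannot break the markup.
    safe_caption = html.escape(caption)
    safe_src = html.escape(image_src, quote=True)
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
  body {{ background: #e5e5e5; display: flex; justify-content: center;
          align-items: center; min-height: 100vh; margin: 0; }}
  .polaroid {{ background: #fff; padding: 16px 16px 48px; box-shadow: 0 2px 8px rgba(0,0,0,.3); }}
  .polaroid img {{ display: block; max-width: 480px; }}
  .caption {{ color: {MESSENGER_BLUE}; font-family: sans-serif; text-align: center; margin-top: 12px; }}
</style>
</head>
<body>
  <div class="polaroid">
    <img src="{safe_src}" alt="">
    <div class="caption">{safe_caption}</div>
  </div>
</body>
</html>"""
```

The doubled braces are needed because the CSS lives inside a Python f-string; only {MESSENGER_BLUE}, {safe_src} and {safe_caption} are substituted.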

vedantpuri, Oct 07 '19

Instead of a new script, I suggest you simply add it to the rendering you already wrote and ingest the image part of the Message if it's present. You can use <img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=="/> to include the image directly in the HTML, where the final gibberish is the base64 encoding of the bytes from the PNG file.
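A small sketch of producing such a data URI with Python's standard base64 module; the helper names png_data_uri and img_tag_from_file are hypothetical.

```python
import base64
from pathlib import Path


def png_data_uri(png_bytes: bytes) -> str:
    """Encode raw PNG bytes as a data: URI that can go straight into <img src=...>."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


def img_tag_from_file(path: str) -> str:
    """Read a PNG from disk and return an <img> tag with the image inlined."""
    return f'<img src="{png_data_uri(Path(path).read_bytes())}"/>'
```

Because the image travels inside the HTML string itself, the headless Chrome step does not need access to the original image files.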

I suspect I'm missing some complications that Kurt knows about.

stephenroller, Oct 08 '19

I see; I just saw the plan on the other thread to make it a new script. I still disagree, but I'll yield to Kurt if he stands by that choice. Apologies for the misunderstanding.

stephenroller, Oct 08 '19

I think there are two ways to view this issue, namely whether we are rendering images within or outside the context of chit-chat/conversation. As the most immediate use case involves rendering images within chit-chat, I'll agree with Stephen that we can extend the current script to render an image if it is present in the Message object. Something to keep in mind is that a unique image can span multiple Messages, and we'd only want to render it once.
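A sketch of the "render each image only once" rule, assuming each message is a plain dict with "id", "text", and an optional "image" key holding an image source such as a data URI. These keys mirror ParlAI Message fields, but render_chat_html is a hypothetical helper, not ParlAI's existing renderer.

```python
import html
from typing import Dict, List, Set


def render_chat_html(messages: List[Dict[str, str]]) -> str:
    """Render messages as HTML snippets, emitting each distinct image only once."""
    parts: List[str] = []
    seen_images: Set[str] = set()
    for msg in messages:
        image = msg.get("image")
        # The same image is often attached to several turns of one episode;
        # draw it only the first time it appears.
        if image is not None and image not in seen_images:
            parts.append(f'<img class="thumb" src="{html.escape(image, quote=True)}">')
            seen_images.add(image)
        speaker = html.escape(msg.get("id", ""))
        text = html.escape(msg.get("text", ""))
        parts.append(f'<div class="msg"><b>{speaker}:</b> {text}</div>')
    return "\n".join(parts)
```

Tracking images in a set means an image is drawn once at its first appearance even if later turns refer back to it; comparing only against the previous message would instead re-draw an image that reappears after a different one.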

klshuster, Oct 08 '19

So my initial view was that this was unrelated to chit-chat conversation. I was thinking that we have a bunch of images and our model is predicting the caption. In that case we just need the image and caption, and hence very different HTML, which is why I was considering a separate script. Could you provide an example of image captioning in the message format you are talking about? Even a rough hand-drawn sketch would do.

Also, what would be the format of the data being processed?

vedantpuri, Oct 08 '19

A good example would be the image_chat dataset, or any model trained on such a task. The message format would look something like this; more information about the dataset and models is available at https://parl.ai/projects/image_chat/.

For visual context, imagine a conversation on Facebook Messenger, for example, where someone sends an image (and a thumbnail shows up) and the other person responds.

klshuster, Oct 08 '19

It's worth noting that parley frames everything as chats anyway :D
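To make that concrete under some assumptions: a captioning example can be treated as a one-turn chat in which one speaker sends the image and the model replies with the caption, so a single conversation renderer could cover both cases. The dict keys follow ParlAI Message conventions (id, text, image, episode_done); the helper captions_as_episodes and the speaker names are hypothetical.

```python
from typing import Dict, List, Tuple


def captions_as_episodes(
    pairs: List[Tuple[str, str]], model_id: str = "model"
) -> List[List[Dict[str, object]]]:
    """Turn (image_src, caption) pairs into two-message, chat-style episodes."""
    episodes: List[List[Dict[str, object]]] = []
    for image_src, caption in pairs:
        episodes.append([
            # The "user" turn carries only the image, like sending a photo in a chat.
            {"id": "user", "text": "", "image": image_src, "episode_done": False},
            # The model's reply is the caption, which ends the single-turn episode.
            {"id": model_id, "text": caption, "episode_done": True},
        ])
    return episodes
```

With this framing, the captioning-only view and the chit-chat view differ only in how many turns follow the image.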

stephenroller, Oct 08 '19

Hi,

Is there any update on this? I know Facebook's visdom repo could be a starting point.

However, in my initial experiments with it, the images and text weren't aligned. See this issue.

I ended up using only Jupyter notebooks :D Any suggestions?

Thanks.

shubhamagarwal92, Feb 03 '20

We'd happily welcome a PR. We currently have other priorities and probably won't come back to this task for some time.

stephenroller, Feb 03 '20

Sure. If I end up implementing something, I would definitely raise a PR.

For now, I have this PR to make downloading the image_chat data easier.

shubhamagarwal92, Feb 04 '20

Hey @shubhamagarwal92, if you would like to work on this, it might be a good idea to look at #2035 (updates in #2059) as a starting point, since I implemented a similar thing there, but only for text. It might be useful to integrate this into the previous implementation to avoid redundant code. Feel free to improve on the previous implementation if necessary!

vedantpuri, Feb 04 '20

This issue has not had activity in 30 days. Marking as stale.

github-actions[bot], Jun 02 '20