ParlAI
Script to render images and captions
Is your feature request related to a problem? Please describe. This is related to a sub-task of #2021. It would be useful to have a script that renders images and their captions. Something similar has already been done for chit-chat in #2035 (updated in #2059) and can be used as a reference.
Describe the solution you'd like. The high-level idea is the same as for conversation rendering:
- Generate an HTML string for the image and its caption
- Use headless Chrome to get a PDF
- Use headless Chrome to get a PNG
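The last two steps can be sketched in Python by writing the HTML string to a temporary file and pointing headless Chrome at it. This is a minimal sketch, not the ParlAI implementation: the helper names chrome_command and render are hypothetical, and the binary name "google-chrome" is an assumption (it may be "chromium", "chromium-browser", or a full path on your system). The flags used (--headless, --disable-gpu, --print-to-pdf, --screenshot, --window-size, --hide-scrollbars) are standard Chrome command-line switches.

```python
import os
import subprocess
import tempfile
from pathlib import Path


def chrome_command(html_path, out_path, fmt="pdf", chrome_bin="google-chrome",
                   width=1200, height=1600):
    """Build (but do not run) a headless Chrome command for a local HTML file.

    Assumption: chrome_bin names a Chrome/Chromium binary on PATH.
    """
    # Chrome takes a URL, so turn the local path into file:///... form.
    url = Path(html_path).resolve().as_uri()
    args = [chrome_bin, "--headless", "--disable-gpu"]
    if fmt == "pdf":
        args.append(f"--print-to-pdf={out_path}")
    elif fmt == "png":
        # Screenshots capture only the viewport, so set its size explicitly.
        args += [f"--screenshot={out_path}",
                 f"--window-size={width},{height}",
                 "--hide-scrollbars"]
    else:
        raise ValueError(f"unsupported format: {fmt!r}")
    args.append(url)
    return args


def render(html, out_path, fmt="pdf", **kwargs):
    """Write an HTML string to a temp file, render it with Chrome, clean up."""
    with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False,
                                     encoding="utf-8") as f:
        f.write(html)
    try:
        subprocess.run(chrome_command(f.name, out_path, fmt, **kwargs),
                       check=True)
    finally:
        os.unlink(f.name)
```

Usage would be something like render(page_html, "convo.pdf") or render(page_html, "convo.png", fmt="png"). Keeping command construction separate from execution makes the flags easy to inspect or tweak without launching a browser.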
Additional context. @klshuster Could you provide a fixed format for the input data, like there was in convo_render? A rough idea of how you would want it to look would also be great. One idea: a polaroid-like box in the center of the screen (grey/white background) with the caption in Messenger blue below the image.
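The polaroid idea could look roughly like the page below. This is only a sketch of one possible layout: the function name polaroid_html is hypothetical, and "Messenger blue" is assumed to be #0084ff. img_src can be a file path or a base64 data URI.

```python
import html

# Assumption: "Messenger blue" is taken to be #0084ff.
MESSENGER_BLUE = "#0084ff"


def polaroid_html(img_src, caption):
    """Return a standalone HTML page with a centered polaroid-style card on a
    grey background, showing the caption in Messenger blue below the image."""
    # Escape user-provided strings so captions like "A & B" render correctly.
    src = html.escape(img_src, quote=True)
    text = html.escape(caption)
    return f"""<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<style>
  body {{ background: #e9ebee; margin: 0; min-height: 100vh;
         display: flex; justify-content: center; align-items: center; }}
  .polaroid {{ background: #fff; padding: 16px 16px 48px; max-width: 480px;
              box-shadow: 0 4px 12px rgba(0, 0, 0, 0.2); }}
  .polaroid img {{ display: block; width: 100%; }}
  .caption {{ color: {MESSENGER_BLUE}; font-family: Helvetica, Arial, sans-serif;
             font-size: 20px; text-align: center; margin-top: 16px; }}
</style>
</head>
<body>
  <div class="polaroid">
    <img src="{src}" alt="">
    <div class="caption">{text}</div>
  </div>
</body>
</html>"""
```

The extra bottom padding on the card gives the classic polaroid look, and flexbox centering keeps the card in the middle of the page when it is printed to PDF or screenshotted.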
Instead of a new script, I suggest you simply add it to the rendering you already wrote, and ingest the image part of the Message if it's present. You can use <img src="data:image/png;base64, iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg=="/> to include the image directly in the HTML, where the final string of gibberish is the base64 encoding of the PNG file's bytes.
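Producing that data URI takes only the standard library. The sketch below assumes the image is already available as raw PNG bytes (for example read from disk); if it arrives as a PIL image or a tensor, it would first need to be saved to PNG bytes. The function names are hypothetical.

```python
import base64
import html


def img_tag_from_png_bytes(png_bytes, alt=""):
    """Embed raw PNG bytes in an <img> tag via a base64 data URI,
    so the HTML page needs no separate image file."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f'<img src="data:image/png;base64,{encoded}" alt="{html.escape(alt, quote=True)}"/>'


def img_tag_from_file(path, alt=""):
    """Convenience wrapper: read a PNG file from disk and embed it."""
    with open(path, "rb") as f:
        return img_tag_from_png_bytes(f.read(), alt)
```

For instance, the 8-byte PNG signature b"\x89PNG\r\n\x1a\n" encodes to "iVBORw0KGgo=", which is why every PNG data URI, including the one above, starts with "iVBORw0KGgo".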
I suspect I'm missing some complications that Kurt knows about.
I see; I just saw the plan to make it a new script on the other thread. I still disagree, but I'll yield to Kurt if he stands by that choice. Apologies for the misunderstanding.
I think there are two ways to view this issue: whether we are rendering images within or outside the context of chit-chat/conversation. As the most immediate use case involves rendering images within chit-chat, I agree with Stephen that we can extend the current script to render an image if one is present in the Message object. One thing to keep in mind is that a single image can span multiple Messages, and we'd only want to render it once.
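One way to render a shared image only once is to remember a hash of each image's bytes and skip any image already seen. The sketch below assumes, hypothetically, that each message is a dict with "id" (speaker), "text", and an optional "image" holding raw PNG bytes; real ParlAI Messages may carry the image in a different form, and render_conversation is not an existing ParlAI function.

```python
import base64
import hashlib
import html


def render_conversation(messages):
    """Render message dicts as HTML chat bubbles, emitting each distinct
    image only the first time it appears.

    Assumed message shape: {"id": str, "text": str, "image": bytes (optional)}.
    """
    seen = set()  # SHA-1 digests of images already rendered
    parts = []
    for msg in messages:
        img = msg.get("image")
        if img is not None:
            digest = hashlib.sha1(img).hexdigest()
            if digest not in seen:
                seen.add(digest)
                encoded = base64.b64encode(img).decode("ascii")
                parts.append(
                    f'<img class="thumb" src="data:image/png;base64,{encoded}"/>')
        speaker = html.escape(str(msg.get("id", "")))
        text = html.escape(msg.get("text", ""))
        parts.append(f'<div class="bubble"><b>{speaker}</b>: {text}</div>')
    return "\n".join(parts)
```

Hashing the bytes, rather than comparing object identity, catches duplicates even if each turn carries its own copy of the image.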
My initial view was that this was unrelated to chit-chat conversation. I was imagining a set of images for which our model predicts the captions. In that case we just need the image and caption, and hence very different HTML, which is why I was considering a separate script. Could you provide an example of image captioning in the message format you are talking about? Even a rough hand-drawn sketch would do.
Also, what would the format of the data being processed be?
A good example would be the image_chat dataset, or any model trained on such a task. The message format would look something like this; more information about the dataset and models is here: https://parl.ai/projects/image_chat/.
For visual context, imagine a conversation on e.g. Facebook Messenger where someone sends an image via chat (and a thumbnail shows up), and the other person responds.
It's worth noting that parley frames everything as chats anyway :D
Hi,
Is there any update on this? I know FB's visdom repo could be a starting point.
However, in my initial experiments with it, I couldn't get the images and text aligned. See this issue.
I ended up using only Jupyter notebooks :D Any suggestions?
Thanks.
We'd happily welcome a PR. We currently have other priorities and probably won't come back to this task for some time.
Sure. If I end up implementing something, I would definitely raise a PR.
For now, I have this PR to make downloading the image_chat data easier.
Hey @shubhamagarwal92, if you would like to work on this, it might be a good idea to look at #2035 (updated in #2059) as a starting point, since I implemented something similar there, but just for text. It might be useful to integrate this into the previous implementation to avoid redundant code. Feel free to improve on the previous implementation if necessary!
This issue has not had activity in 30 days. Marking as stale.