
What do the inference results mean?

Open cc13qq opened this issue 1 year ago • 4 comments

Thanks for your code and checkpoints. I ran inference.py expecting it to generate a story as a series of image-text pairs, but I got this: [Screenshot 2024-09-03 172755]

I don't know what the results mean. Are they a series of images that compose a story? If so, where are the descriptions of these images?

cc13qq • Sep 03 '24 21:09

The input prompt and image are:

    prompt = "The white dog is singing"
    prev_p = ["The white dog"]
    ref_image = ["./data/image/00001.png"]

cc13qq • Sep 03 '24 21:09

This is a single inference. The 10 generated images are the results of 10 different random seeds (10 independent samples), so they do not form a coherent story. In addition, using real images directly as the condition may introduce a domain gap with our story-style checkpoint, which can degrade generation quality.

haoningwu3639 • Sep 04 '24 02:09

I see. So, how do you make a coherent story composed of several images and texts, just like what you did in the paper?

cc13qq • Sep 04 '24 13:09

You can start by generating the first frame of the story in single-frame mode (set stage = 'no'). Then use the generated frame as the ref_image to generate the next frame, and repeat this step so that the full story is produced autoregressively, one frame at a time.

haoningwu3639 • Sep 06 '24 05:09
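
For readers who want to script the procedure described in the last reply, here is a minimal sketch of the loop. It is not the repository's code: `generate_frame` is a hypothetical callable you would write yourself around inference.py, and its argument order, the `cond_stage` value for later frames, and the choice to pass only the previous prompt as `prev_p` are all assumptions modeled on the variables shown earlier in this thread. Only `stage = 'no'` for the first frame and the reuse of the generated frame as `ref_image` come from the reply itself.

```python
from typing import Callable, List

# Hypothetical wrapper signature (an assumption, not the StoryGen API):
# (prompt, prev_p, ref_image, stage) -> path of the generated image
FrameFn = Callable[[str, List[str], List[str], str], str]


def generate_story(prompts: List[str], generate_frame: FrameFn,
                   cond_stage: str) -> List[str]:
    """Generate one frame per prompt, feeding each result into the next call."""
    frames: List[str] = []
    for i, prompt in enumerate(prompts):
        if i == 0:
            # First frame: single-frame mode, no reference image.
            path = generate_frame(prompt, [], [], "no")
        else:
            # Later frames: condition on the previously generated frame.
            # Passing the previous prompt as prev_p is an assumption.
            path = generate_frame(prompt, [prompts[i - 1]], [frames[-1]],
                                  cond_stage)
        frames.append(path)
    return frames


# Stand-in generator so the sketch runs without a model; it only records
# the arguments it receives and returns a made-up file name.
calls = []


def fake_generate_frame(prompt, prev_p, ref_image, stage):
    calls.append({"prompt": prompt, "prev_p": prev_p,
                  "ref_image": ref_image, "stage": stage})
    return f"frame_{len(calls):02d}.png"


story_prompts = [
    "The white dog",
    "The white dog is singing",
    "The white dog is sleeping",
]
# "conditioned" is a placeholder; use whatever stage value the
# repository expects for image-conditioned generation.
frames = generate_story(story_prompts, fake_generate_frame, "conditioned")
```

Replacing `fake_generate_frame` with a real wrapper keeps the loop unchanged. Each call should return a single image path, so any selection among multiple seeds would have to happen inside the wrapper.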