Haoning Wu
We have provided a download-link summary of the dataset (meta-data.json) in the repo, along with the pipeline to obtain the corresponding text and mask. The dataset has already been uploaded...
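As a minimal sketch of working with that summary file, the snippet below loads meta-data.json and collects the video links it references. The field names (`url`) are assumptions for illustration; check the actual keys in the repo's file.

```python
import json

def load_metadata(path):
    """Load the dataset summary file (e.g. meta-data.json)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def list_video_urls(metadata):
    # Collect the URL field from each entry that has one.
    # "url" is a hypothetical key -- adjust to the real schema.
    return [entry["url"] for entry in metadata if "url" in entry]
```

From there, each URL can be fed to whatever downloader you use before running the text/mask pipeline.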
If you encounter problems when using environment.yaml, I suggest installing the key dependencies directly, including torch, accelerate, xformers, diffusers, and transformers. Since the diffusers library is updated and...
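A minimal fallback installation, assuming a pip-based environment; versions are intentionally left unpinned here, so match them to whatever the repo's requirements specify, since diffusers in particular changes quickly:

```shell
# Install the key dependencies directly if environment.yaml fails.
# Pin versions to the repo's requirements where given.
pip install torch accelerate xformers diffusers transformers
```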
Sorry for the late reply; I was on vacation last week. You don't necessarily need to download all the videos in metadata.json, because some may have been removed due to YouTube's...
Since the inpainting pipeline is borrowed entirely from the Stable Diffusion implementation, we did not include that code in our repository; you can follow our README.md to download...
1. The data preprocessing script we provide can correctly construct the dataset structure; please refer to the code we provide. 2. We use 95% of the data for...
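The 95%/5% split mentioned above can be sketched as below. This is only an illustration of the ratio, assuming a simple shuffled split; the repo's actual script may group or order samples differently.

```python
import random

def split_dataset(items, train_ratio=0.95, seed=0):
    """Deterministically shuffle and split items into train/val lists.

    A sketch of a 95%/5% split; the real preprocessing script may
    partition the data differently (e.g. by video rather than by frame).
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]
```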
1. Of course, that works. 2. Since YouTube videos may be taken down, some videos might no longer be available. However, if you collected them based on the metadata we...
Well, since different people input stories in various ways (typing them directly or calling GPT to generate them), we did not implement a function/interface for non-researchers. But you can divide the story...
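One naive way to divide a free-form story into per-frame prompts is a sentence split, sketched below. This is not an interface provided by the repo, just one way a user could prepare inputs before feeding each sentence to the model.

```python
import re

def story_to_prompts(story):
    """Split a free-form story into one candidate prompt per frame.

    A naive sketch: break on sentence-ending punctuation. A real
    pipeline might instead ask an LLM to rewrite the story into
    per-frame descriptions.
    """
    sentences = re.split(r"(?<=[.!?])\s+", story.strip())
    return [s for s in sentences if s]
```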
Sorry for the late reply; I was on vacation last week. You can consider using the simple mode first, that is, taking one frame as the context condition. First, use...
Sorry for the late response; it seems that you are trying to generate with multiple reference characters, which is a more challenging problem. In the supplementary material of our paper, we...
This is a single inference. The 10 generated images correspond to the results of 10 different random seeds (10 independent sampling runs), so they do not form a coherent story. Besides, directly...
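The seed-per-sample pattern described above can be sketched as follows. `sample_with_seed` is a hypothetical stand-in for one diffusion sampling call (the real pipeline would seed a `torch.Generator` instead); the point is that the n outputs are independent alternatives for the same prompt, not consecutive frames of a story.

```python
import random

def sample_with_seed(prompt, seed):
    """Stand-in for one model sampling call with a fixed seed.

    Hypothetical: draws a pseudo-random 'image id' from a seeded RNG
    to show that each seed yields a reproducible, independent result.
    """
    rng = random.Random(seed)
    return (prompt, rng.randrange(10**6))

def independent_samples(prompt, n=10):
    # One inference request -> n independent samples, one per seed.
    # These are alternatives for the same prompt, not a coherent story.
    return [sample_with_seed(prompt, seed) for seed in range(n)]
```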