Haoning Wu comments

Results: 22 comments by Haoning Wu

What the inference results mean?

You can start by generating the first frame of the story in single-frame mode (set stage = 'no'). Then, use the generated frame as the ref_image to generate the next...
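The frame-by-frame procedure described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `generate_frame` is a hypothetical callable standing in for whatever the repository's inference entry point actually is, and `continuation_stage` is a placeholder for the stage value used after the first frame, which the excerpt does not name. Only the value `'no'` for single-frame mode comes from the comment itself.

```python
from typing import Any, Callable, List, Sequence


def generate_story(
    prompts: Sequence[str],
    generate_frame: Callable[[str, List[Any], str], Any],
    continuation_stage: str,
) -> List[Any]:
    """Generate one frame per prompt, feeding each frame into the next.

    generate_frame(prompt, ref_images, stage) is a hypothetical wrapper
    around the model's inference call; it is not part of the original
    comment. continuation_stage is likewise an assumed parameter.
    """
    if not prompts:
        return []

    # First frame: single-frame mode, no reference images (stage = 'no').
    frames = [generate_frame(prompts[0], [], "no")]

    # Later frames: condition on the previously generated frame.
    for prompt in prompts[1:]:
        ref_images = [frames[-1]]
        frames.append(generate_frame(prompt, ref_images, continuation_stage))

    return frames
```

Passing the callable in keeps the sketch independent of the real inference signature, so it can be adapted once the actual arguments are checked against the repository.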

About checkpoints and stable-diffusion-v1-5

1. Since our model is built on SDM, all of the SDM pre-trained parameters are required; 2. Please refer to the official YOLOv7 implementation repository: https://github.com/WongKinYiu/yolov7; 3. We...

Selection of reference images

Thank you for your question. I will address it based on the following points: 1. **Input Transformation**: Please refer to the guidelines in previous issues (https://github.com/haoningwu3639/StoryGen/issues/10#issuecomment-2002906594 and https://github.com/haoningwu3639/StoryGen/issues/14#issuecomment-2021797561). You need...

Testset

The dataset we released contains narration and descriptions generated by TextBind, which can be used directly. We also tried MiniGPT-v2. To be honest, it generates better descriptions, but it does...

inference setting

Please refer to our paper for the distinction between **narrative text** and **descriptive text**. Considering that descriptive text is more suitable as prompts for text-to-image models, we transformed the stories...

environment issue

Sorry for the confusion. When exporting the environment, I included all the libraries I commonly use. However, for the StoryGen project, Detectron2 is not a necessary dependency and can be...

environment issue

Sorry for the late reply. For the questions you raised, I have the following suggestions: 1. First of all, adding more and higher-quality data will improve generation quality;...

Upgrade to FLUX.1 ?

Indeed, the performance of Flux.1-dev and Flux.1-schnell is far superior to Stable Diffusion 1.5 and SDXL. Recently, I have also been working on building a code framework for fine-tuning Flux...

Comparison to ARLDM

It has been a long time since I trained and tested these baselines, so I may have forgotten some of the details. However, I remember that we adopted the official AR-LDM code...

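Unable to get inference results.

I suggest checking GPU utilization, GPU memory usage, and CPU usage, and first confirming that torch is actually using the GPU. If the model was loaded onto the CPU, inference may appear to hang while running extremely slowly.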
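A quick way to run the torch check suggested above is sketched below. It assumes PyTorch is the framework in use and that the model is a `torch.nn.Module`; the helper names `describe_torch_device` and `model_device` are made up for this example and are not part of the project.

```python
import importlib.util


def describe_torch_device():
    """Report whether PyTorch is installed and whether it can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False, "devices": []}

    import torch

    available = torch.cuda.is_available()
    devices = []
    if available:
        # List the name of every visible CUDA device.
        devices = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return {"torch_installed": True, "cuda_available": available, "devices": devices}


def model_device(model):
    """Return the device of the model's first parameter as a string.

    Assumes the model exposes parameters() like torch.nn.Module and has
    at least one parameter. A result of 'cpu' means the weights were not
    moved to the GPU.
    """
    return str(next(model.parameters()).device)
```

If `cuda_available` is false, or `model_device(model)` returns `cpu`, the slow or stalled run is most likely CPU inference rather than a real hang. GPU utilization and memory can be watched separately with `nvidia-smi`.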