Pengxiang Li
Yes, you can find the corresponding weights on Hugging Face.
This is precisely the problem I am facing at the moment. If we want to do text2video, the existence of image_latents is quite peculiar. I've tried changing the `conv in`...
It looks like it's working well. May I ask how many steps this was trained for?
Hi @LTH14, since I'm new to this field, I have a beginner's question. Can I understand unconditional generation as the pipeline in the diagram below without the Rep. Dist.?...
Thank you very much for your response. I have another question, concerning whether the current unconditional image generation models are unable to perform an implicit denoising of a Rep. Dist....
Hi [ersanliqiao](https://github.com/ersanliqiao), can you provide some more detailed information?
Thank you very much for your appreciation. We will keep iterating on this version, aiming for a more accurate understanding of timing in the video. Of course,...
I'm sorry. When I first wrote this code, I was focused on supporting SVD training and didn't give much thought to memory usage. This has caused some inconvenience to...
Thanks for pointing this out, @xuehy, and thanks @potatoQi for echoing the concern. Yes, if different processes (especially on different GPUs) are getting the exact same data at each iteration...
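To make the concern concrete, here is a minimal sketch in plain Python of one common way to give each process its own slice of the data: every rank shuffles with the same seed, then takes a rank-strided slice. This is the same idea `torch.utils.data.DistributedSampler` uses, but the function name `shard_indices` and its parameters are hypothetical, made up for this illustration, and are not taken from this repository's code.

```python
import random


def shard_indices(dataset_len, rank, world_size, epoch=0, seed=0):
    """Return the dataset indices one process (rank) reads in one epoch.

    Hypothetical helper for illustration only. Assumes
    dataset_len >= world_size.
    """
    # Every rank builds the SAME shuffled order (same seed + epoch) ...
    rng = random.Random(seed + epoch)
    indices = list(range(dataset_len))
    rng.shuffle(indices)

    # ... pads it so the length divides evenly across ranks ...
    per_rank = -(-dataset_len // world_size)  # ceiling division
    total = per_rank * world_size
    indices += indices[: total - dataset_len]

    # ... and then keeps only its own strided slice.
    return indices[rank:total:world_size]


# Four processes, ten samples: each rank sees a different slice.
shards = [shard_indices(10, rank=r, world_size=4) for r in range(4)]
for r, shard in enumerate(shards):
    print(f"rank {r}: {shard}")
```

If each process instead shuffled and iterated the full index list with the same seed, all ranks would read identical batches at every step, which is the situation described above. Passing a different `epoch` each epoch changes the shuffle while keeping the ranks' slices consistent with each other.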