Open-Sora-Plan
This project aims to reproduce Sora (OpenAI's T2V model). We hope the open-source community will contribute to this project.
Usually we design causal models because we want to use autoregressive generation afterward, but since diffusion generates in parallel, why is the VAE designed to be causal? What's the intuition...
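To make the term concrete, here is a minimal sketch of what "causal" means for a convolution along the time axis. It is an illustration only, not the project's VAE code: `causal_conv1d` is a hypothetical helper, each frame is reduced to a single number, and front padding by repeating the first frame is an assumption (zero padding is another common choice). One motivation often given for causal video autoencoders is that the first frame is then encoded without looking at later frames, so a single image can be handled like a one-frame video. Whether that is the reasoning here is not stated in this thread.

```python
def causal_conv1d(frames, kernel):
    """Temporal convolution where output t depends only on frames[0..t].

    frames: list of floats, one value per frame (a stand-in for a feature map).
    kernel: list of floats; the last weight multiplies the current frame.
    """
    k = len(kernel)
    # Assumption: pad k-1 copies of the first frame at the front, none at the end,
    # so no output position ever reads a future frame.
    padded = [frames[0]] * (k - 1) + list(frames)
    return [
        sum(w * padded[t + i] for i, w in enumerate(kernel))
        for t in range(len(frames))
    ]


if __name__ == "__main__":
    kernel = [0.25, 0.25, 0.5]
    original = causal_conv1d([1.0, 2.0, 3.0, 4.0], kernel)
    edited = causal_conv1d([1.0, 2.0, 3.0, 100.0], kernel)
    # Changing the last frame leaves all earlier outputs untouched.
    print(original[:3] == edited[:3])
```

With symmetric (non-causal) padding, the same edit to the last frame would also change the output just before it.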
I get the following error when running inference. How can I fix it? RuntimeError: Failed to import diffusers.models.autoencoders.autoencoder_kl because of the following error (look up to see its traceback): /usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
When I choose DDIM as the sampler, the results are bad, while the results from the PNDM sampler are great. I would like to know the scheduler config to use with the DDIM sampler...
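For context on what a DDIM scheduler config influences, here is a sketch of one deterministic DDIM update (eta = 0) in plain Python. It is not the project's sampler: `ddim_step` is a hypothetical helper operating on a flat list of floats. The cumulative alpha values it takes are derived from the beta schedule and the choice of timesteps, so a mismatch between those settings and the ones used in training is one plausible, unconfirmed source of poor samples.

```python
import math


def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update (eta = 0).

    x_t:            noisy sample at the current timestep (list of floats)
    eps_pred:       the model's noise prediction for x_t (list of floats)
    alpha_bar_t:    cumulative alpha product at the current timestep
    alpha_bar_prev: cumulative alpha product at the previous (less noisy) timestep
    """
    sqrt_a_t = math.sqrt(alpha_bar_t)
    sqrt_one_minus_a_t = math.sqrt(1.0 - alpha_bar_t)
    # Estimate the clean sample implied by the noise prediction.
    x0_pred = [(x - sqrt_one_minus_a_t * e) / sqrt_a_t for x, e in zip(x_t, eps_pred)]
    sqrt_a_prev = math.sqrt(alpha_bar_prev)
    sqrt_one_minus_a_prev = math.sqrt(1.0 - alpha_bar_prev)
    # Re-noise the estimate to the previous timestep's noise level.
    return [
        sqrt_a_prev * x0 + sqrt_one_minus_a_prev * e
        for x0, e in zip(x0_pred, eps_pred)
    ]
```

Because both inputs `alpha_bar_t` and `alpha_bar_prev` come straight from the schedule, two samplers that share a model but differ in schedule or timestep selection will follow different trajectories.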
Looks like there are some native extensions, but the wheels are not provided which makes it very challenging to install them.
Hi~ Thank you for your great work & the recent update. After reading the report, I noticed that the newly added videos are mainly obtained from Pexels. However, I could...
First I needed to change the T5 dtype: `text_encoder = T5EncoderModel.from_pretrained(args.text_encoder_name, cache_dir=args.cache_dir).to(device, torch.float16)`. After that change, the generated video is invalid. What is wrong? [image]
Hello there! I just read the 1.1 report, and it is absolutely amazing. I noticed you are using a GAN loss for training the VAE. I have done similar work [here](https://openaccess.thecvf.com/content/CVPR2021/papers/Parmar_Dual_Contradistinctive_Generative_Autoencoder_CVPR_2021_paper.pdf). Maybe...
I fine-tuned the 93x480p model on my own collected video dataset and added pose guidance as a control signal. Here's a graph of the loss for my training, the training...
Key parameter settings:

- train_batch_size: 1
- num_frames: 29
- max_height: 360
- max_width: 640
- train_fps: 12
- ae: CausalVAEModel_D4_4x8x8
- model: OpenSoraT2V-ROPE-L/122
- text_encoder_name: google/mt5-xxl

I only swapped in my own dataset and did not use LengthGroupedSampler. No other code was modified. Running `bash scripts/text_condition/gpu/train_t2v.sh` raises an error at the following code in [dataset_utils.py](https://github.com/PKU-YuanGroup/Open-Sora-Plan/blob/main/opensora/utils/dataset_utils.py):

```python
if self.batch_size == 1 or self.group_frame or...
```