Open-Sora-Plan issues

Why is VAE designed to be causal, what's the advantage of it?

1

Usually we design causal models because we want to use autoregressive generation afterward, but as diffusion is generating in parallel, why is VAE designed to be causal? What's the intuition...

awei-6
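
For readers unfamiliar with the term in the question above: a "causal" layer is one whose output at frame t depends only on frames up to t. The sketch below is a hypothetical plain-Python illustration, not code from Open-Sora-Plan; the function name `causal_conv1d` and the choice to pad by repeating the first frame are assumptions made for this example. It shows two properties that are often given as motivation for causal video autoencoders: editing a later frame does not change earlier outputs, and a single frame (an image) encodes to the same value as the first frame of a video.

```python
# Hypothetical illustration of causal temporal convolution.
# Not taken from the Open-Sora-Plan code base.

def causal_conv1d(frames, kernel):
    """Convolve along time so that output[t] uses only frames[0..t].

    The sequence is padded at the start by repeating the first frame
    len(kernel) - 1 times, so no future frame contributes to an
    earlier output.
    """
    pad = len(kernel) - 1
    padded = [frames[0]] * pad + list(frames)
    return [
        sum(k * padded[t + i] for i, k in enumerate(kernel))
        for t in range(len(frames))
    ]


video = [1.0, 2.0, 3.0, 4.0]   # one scalar per frame, for simplicity
kernel = [0.25, 0.25, 0.5]     # the last weight multiplies the current frame

out = causal_conv1d(video, kernel)

# Changing only the last frame leaves all earlier outputs untouched.
edited = causal_conv1d([1.0, 2.0, 3.0, 99.0], kernel)
assert out[:3] == edited[:3]

# A single frame (an image) gives the same result as the video's first frame.
image_out = causal_conv1d([1.0], kernel)
assert image_out[0] == out[0]
```

Whether this is the reason the project chose a causal design is for the maintainers to confirm; the sketch only shows what the property means.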

Inference fails to run: cannot import diffusers

I get the following error when running inference. How can I resolve it? RuntimeError: Failed to import diffusers.models.autoencoders.autoencoder_kl because of the following error (look up to see its traceback): /usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

521smilestar

Great work! A question about sampler_method.

6

When I choose DDIM as the sampler method, the results are bad. The results from the PNDM sampler method are great. I want to know the scheduler configs when using the DDIM sampler...

jiaxiangc

Provide wheels for cuda compiled extensions like Rope2D

1

Looks like there are some native extensions, but the wheels are not provided which makes it very challenging to install them.

isidentical

Any plan to release the Pexels videos or links?

4

Hi~ Thank you for your great work & the recent update. After reading the report, I noticed that the newly added videos are mainly obtained from Pexels. However, I could...

suilin0432

v1.1 inference: invalid video?

First I need to modify the T5 dtype: "text_encoder = T5EncoderModel.from_pretrained(args.text_encoder_name, cache_dir=args.cache_dir).to(device, torch.float16)". Then the generated video is invalid. What is wrong? ![image](https://github.com/PKU-YuanGroup/Open-Sora-Plan/assets/7641281/b302bbbb-ab79-407e-852c-32340ba4878f)

lbwang2006

Potential reference for VAE-GAN

1

Hello there! I just read the 1.1 report, and it is absolutely amazing. I see you are using a GAN loss for training the VAE? I have done similar work [here](https://openaccess.thecvf.com/content/CVPR2021/papers/Parmar_Dual_Contradistinctive_Generative_Autoencoder_CVPR_2021_paper.pdf). Maybe...

DachengLi1

After fine-tuning the model, inference still produces noise.

25

I fine-tuned the 93x480p model on my own collected video dataset and added pose guidance as a control signal. Here's a graph of the loss for my training, the training...

wtjiang98

Attention mask

Key parameter settings: train_batch_size: 1, num_frames: 29, max_height: 360, max_width: 640, train_fps: 12, ae: CausalVAEModel_D4_4x8x8, model: OpenSoraT2V-ROPE-L/122, text_encoder_name: google/mt5-xxl. I only swapped in my own dataset and did not use LengthGroupedSampler; no other code was modified. Running bash scripts/text_condition/gpu/train_t2v.sh raises an error at the following code in [dataset_utils.py](https://github.com/PKU-YuanGroup/Open-Sora-Plan/blob/main/opensora/utils/dataset_utils.py): ```python if self.batch_size == 1 or self.group_frame or...

ZXMMD
