Open-Sora-Plan issues

A million-scale text-to-video prompt-gallery dataset

5

Hi, we contribute the first dataset featuring 1.67 million unique text-to-video prompts and 6.69 million videos generated by 4 different state-of-the-art diffusion models. We hope it can help your Open-Sora...

WangWenhao0716

[feat] Dataset embeddings/latents caching for more flexible experiments

1

Running VAEs and CLIP/T5 embedders is time-consuming, and this cost scales up fast when multiple training runs are repeated. As we keep these parts frozen and train only the diffusion...

kabachuha
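The caching idea in this request can be illustrated with a small disk cache placed in front of a frozen encoder. This is only a minimal sketch under stated assumptions: `EmbeddingCache` and `toy_encoder` are hypothetical names, not part of Open-Sora-Plan, and the toy encoder stands in for an expensive VAE, CLIP, or T5 forward pass.

```python
import hashlib
import pickle
import tempfile
from pathlib import Path


class EmbeddingCache:
    """Disk cache for outputs of a frozen encoder (hypothetical helper)."""

    def __init__(self, cache_dir, encode_fn):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        # encode_fn stands in for a frozen VAE / CLIP / T5 forward pass.
        self.encode_fn = encode_fn
        self.misses = 0

    def _path(self, key):
        # Hash the input so any string maps to a safe file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.pkl"

    def get(self, key):
        path = self._path(key)
        if path.exists():
            # Cache hit: load the stored embedding instead of re-encoding.
            with path.open("rb") as f:
                return pickle.load(f)
        # Cache miss: run the encoder once and persist the result.
        self.misses += 1
        value = self.encode_fn(key)
        with path.open("wb") as f:
            pickle.dump(value, f)
        return value


def toy_encoder(prompt):
    # Placeholder for an expensive frozen model; returns a fake embedding.
    return [float(len(word)) for word in prompt.split()]


cache = EmbeddingCache(tempfile.mkdtemp(), toy_encoder)
first = cache.get("a cat on a skateboard")
second = cache.get("a cat on a skateboard")  # served from disk, encoder not called again
```

Because the encoders stay frozen, a cached value remains valid across training runs as long as the encoder weights and preprocessing do not change; a real implementation would likely include those in the cache key.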

[🐞Bug] The absence of a quantization loss makes the VQVAE behave like a VAE.

1

When I try to train the VQVAE on my own data, I find that the training loss is only the reconstruction loss (https://github.com/PKU-YuanGroup/Open-Sora-Plan/blob/fdc786bc8e52d6386fb32c833eba0b4db286ca7b/opensora/models/ae/videobase/vqvae/trainer_vqvae.py#L11-L19), without the codebook loss used in VideoGPT (https://github.com/PKU-YuanGroup/Open-Sora-Plan/blob/fdc786bc8e52d6386fb32c833eba0b4db286ca7b/opensora/models/ae/videobase/vqvae/videogpt/videogpt/vqvae.py#L64-L69). Are...

cool-xuan
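For context on the loss terms this report refers to, the sketch below computes the three terms of the standard VQ-VAE objective (reconstruction, codebook, commitment) for a single latent vector. It is a plain-Python illustration, not the code at either linked location; the function names, the toy values, and `beta=0.25` are assumptions chosen for the example.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mse(a, b):
    return squared_distance(a, b) / len(a)


def quantize(z_e, codebook):
    """Return the index and vector of the codebook entry nearest to z_e."""
    index = min(range(len(codebook)), key=lambda i: squared_distance(z_e, codebook[i]))
    return index, codebook[index]


def vq_loss_terms(x, x_recon, z_e, codebook, beta=0.25):
    # beta is an assumed commitment weight for this sketch.
    _, e = quantize(z_e, codebook)
    recon = mse(x_recon, x)
    # Codebook term ||sg[z_e] - e||^2: in an autograd framework the encoder
    # output would be detached so only the codebook receives gradients.
    codebook_term = mse(z_e, e)
    # Commitment term beta * ||z_e - sg[e]||^2: here the codebook entry would
    # be detached so only the encoder receives gradients.
    commitment = beta * mse(z_e, e)
    return {
        "recon": recon,
        "codebook": codebook_term,
        "commitment": commitment,
        "total": recon + codebook_term + commitment,
    }


codebook = [[0.0, 0.0], [1.0, 0.0]]
index, _ = quantize([0.9, 0.1], codebook)
terms = vq_loss_terms(x=[1.0, 2.0], x_recon=[1.0, 1.0], z_e=[0.9, 0.1], codebook=codebook)
```

The codebook and commitment terms have the same forward value and differ only in where the stop-gradient sits, which plain Python cannot express. Implementations that update the codebook with an exponential moving average typically drop the codebook term and keep only the commitment term.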

[refactor] update docker scripts with ci image support

Updates: group files by path; add CI Docker image support; update requirements.txt to the latest versions; add GitHub workflow support. Note: you need to set DOCKER_USERNAME and DOCKER_ACCESS_TOKEN in...

SimonLeeGit

Proposal - implement MaskDiT technique for fast training

3

This repo, [Fast Training of Diffusion Models with Masked Transformers](https://github.com/Anima-Lab/MaskDiT), suggests using a masked transformer architecture for faster DiT training. They claim that:
> Experiments on ImageNet-256x256 and ImageNet-512x512 show that...

RuslanPeresy
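The general idea behind masked training, processing only a random subset of patch tokens per step, can be sketched as follows. This is a simplified illustration and not the MaskDiT implementation: `random_patch_mask`, the 16-patch input, and the 0.5 mask ratio are assumptions made for the example.

```python
import random


def random_patch_mask(num_patches, mask_ratio, rng):
    """Split patch indices into a visible set and a masked set (hypothetical helper)."""
    num_keep = num_patches - int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)
    keep = sorted(order[:num_keep])
    masked = sorted(order[num_keep:])
    return keep, masked


rng = random.Random(0)  # fixed seed so the split is reproducible
patch_tokens = [f"patch_{i}" for i in range(16)]
keep, masked = random_patch_mask(len(patch_tokens), mask_ratio=0.5, rng=rng)

# Only the visible tokens would be passed through the expensive transformer blocks.
visible_tokens = [patch_tokens[i] for i in keep]
```

Since self-attention cost grows with the number of tokens, feeding only the visible half to the main transformer is where the training speed-up would come from in a scheme like this.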

225x90x90

1

Hello, I noticed you mentioned that the latest code could support training with a latent size of 225x90x90, which seems quite large. However, I couldn't find the corresponding training script...

yhy-2000

taozhiyuai

[refactor]: add a scripts/train_vqvae.sh file

Add CLIP support and example

Refactor the dataset setup of videoae

Could the files that need to be downloaded be shared on Quark Netdisk?
