Open-Sora
[HELP] Why do my videos look so awful?
When I create videos, they look like this. I don't think the prompt is the problem. This is the configuration I use (right now it's at 360p, but the same thing happens at 720p; it's not a resolution problem, I'm just generating at 360p for speed).
I’ve tried more than 100 prompts: longer, shorter, with more details, with fewer details, simpler, more complex, and basically all of them look very similar to this video. Am I doing something wrong?
Do you have any advice? Thank you very much.
{'aes': 7.0,
 'align': 5,
 'aspect_ratio': '16:9',
 'batch_size': 1,
 'condition_frame_length': 5,
 'config': 'configs/opensora-v1-2/inference/sample.py',
 'dtype': 'bf16',
 'flow': 5.0,
 'fps': 24,
 'frame_interval': 1,
 'model': {'enable_flash_attn': True,
           'enable_layernorm_kernel': True,
           'force_huggingface': True,
           'from_pretrained': 'hpcai-tech/OpenSora-STDiT-v3',
           'qk_norm': True,
           'type': 'STDiT3-XL/2'},
 'multi_resolution': 'STDiT2',
 'num_frames': '120',
 'prompt': ['A cyclist racing down a forested mountain trail. The cyclist '
            'weaves between trees, dodging roots and rocks, with incredible '
            'speed and agility. The trail is narrow and treacherous, with '
            'dense foliage on either side. The scene is a blur of motion, '
            'capturing the adrenaline and challenge of mountain biking.'],
 'prompt_as_path': False,
 'resolution': '360p',
 'save_dir': './samples/samples/',
 'save_fps': 24,
 'scheduler': {'cfg_scale': 7.0,
               'num_sampling_steps': 80,
               'type': 'rflow',
               'use_timestep_transform': True},
 'seed': 44,
 'text_encoder': {'from_pretrained': 'DeepFloyd/t5-v1_1-xxl',
                  'model_max_length': 300,
                  'type': 't5'},
 'vae': {'force_huggingface': True,
         'from_pretrained': 'hpcai-tech/OpenSora-VAE-v1.2',
         'micro_batch_size': 4,
         'micro_frame_size': 17,
         'type': 'OpenSoraVAE_V1_2'},
 'watermark': False}
https://github.com/hpcaitech/Open-Sora/assets/64336798/cc21192b-e65a-470e-a65c-786f79820dd4
EDIT: Other examples
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora-v1-2/inference/sample.py \
  --num-frames 120 --resolution 360p --aspect-ratio 16:9 --watermark False --aes 7 --flow 5 --num-sampling-steps 80 --cfg-scale 7 \
  --prompt "A cozy coffe interior on a rainy day. Large windows show the rain falling outside, creating a soothing backdrop. Inside, the café is warm and inviting, with wooden tables, cushioned chairs, and soft lighting. A barista is seen making coffee behind the counter, and patrons are chatting or reading. The atmosphere is relaxed and comforting."
https://github.com/hpcaitech/Open-Sora/assets/64336798/8302946c-2ef9-431b-8622-0fe50396b2d6
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora-v1-2/inference/sample.py \
  --num-frames 120 --resolution 360p --aspect-ratio 16:9 --watermark False --aes 7 --flow 5 --num-sampling-steps 80 --cfg-scale 7 --seed 44 \
  --prompt "A peaceful garden with a koi pond. The pond is surrounded by stones and lush greenery, with koi fish swimming gracefully in the clear water. A small wooden bridge arches over the pond, and a stone lantern adds to the tranquil setting. The garden is quiet, with the sound of water gently flowing and birds singing. The atmosphere is serene and meditative."
https://github.com/hpcaitech/Open-Sora/assets/64336798/ee0b2662-87cd-43a1-b3e0-2c999021ff24
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora-v1-2/inference/sample.py \
  --num-frames 120 --resolution 360p --aspect-ratio 16:9 --watermark False --aes 7 --flow 5 --num-sampling-steps 80 --cfg-scale 7 \
  --prompt "A tranquil mountain lake surrounded by pine trees. The water is crystal clear, reflecting the surrounding landscape like a mirror. A small wooden pier extends into the lake, with a lone rowboat tied to it. The mountains in the background are majestic, their peaks dusted with snow. The air appears crisp and the scene is calm and serene."
https://github.com/hpcaitech/Open-Sora/assets/64336798/a20bd0a9-a2a0-42ed-89bc-da89b3a97539
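When comparing prompts, it may help to hold every other setting fixed, including the seed (two of the commands above do not pass --seed). Here is a minimal Python sketch that only assembles the command lines from the flags already used in this thread and prints them; nothing is executed. `COMMON_FLAGS`, `build_command`, and `PROMPTS` are hypothetical names chosen for illustration, and the prompts are shortened placeholders.

```python
import shlex

# Flags copied from the commands in this thread (the poster's settings).
COMMON_FLAGS = [
    "--num-frames", "120",
    "--resolution", "360p",
    "--aspect-ratio", "16:9",
    "--watermark", "False",
    "--aes", "7",
    "--flow", "5",
    "--num-sampling-steps", "80",
    "--cfg-scale", "7",
]


def build_command(prompt, seed=44):
    """Return the torchrun argument list for one prompt (hypothetical helper)."""
    return [
        "torchrun", "--standalone", "--nproc_per_node", "1",
        "scripts/inference.py",
        "configs/opensora-v1-2/inference/sample.py",
        *COMMON_FLAGS,
        "--seed", str(seed),  # same seed for every prompt
        "--prompt", prompt,
    ]


# Shortened placeholder prompts; substitute the full prompts being compared.
PROMPTS = [
    "A peaceful garden with a koi pond.",
    "A tranquil mountain lake surrounded by pine trees.",
]

for prompt in PROMPTS:
    # shlex.join quotes the prompt so the line can be pasted into a shell.
    print(shlex.join(build_command(prompt)))
```

With the seed pinned, differences between the resulting videos are more likely to come from the prompt than from the random initialization.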
=========================================================
TIP: If you first generate an image (text-to-image) and then generate the video using that image as a reference, the quality of the videos improves significantly.
For example:
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora-v1-2/inference/sample.py \
  --num-frames 1 --resolution 1080p --aspect-ratio 16:9 --watermark False --aes 7 --seed 44 --cfg-scale 7 --sample-name image-cond \
  --prompt "An underwater city inhabited by bioluminescent sea creatures, glowing in the depths of the ocean."
torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora-v1-2/inference/sample.py \
  --num-frames 4s --resolution 360p --aspect-ratio 16:9 --watermark False --aes 7 --flow 5 --num-sampling-steps 90 --seed 44 --cfg-scale 7 \
  --prompt 'Create a video capturing the breathtaking beauty of the sunset over the serene lake, with the mountains silhouetted against the colorful sky.{"reference_path": "samples/samples/image-cond_0000.png","mask_strategy": "0"}'
https://github.com/hpcaitech/Open-Sora/assets/64336798/bf278b48-0bb3-4b4c-a2b2-1ebb9537836a
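The reference image is attached by appending a small JSON object to the end of the prompt, as in the second command above. Here is a minimal Python sketch of building that prompt string without hand-escaping quotes. `conditioned_prompt` is a hypothetical helper, not part of Open-Sora, and it assumes the suffix is parsed as ordinary JSON, so the extra space that `json.dumps` puts after the comma should not matter.

```python
import json


def conditioned_prompt(text, reference_path, mask_strategy="0"):
    """Append the reference-image options to a prompt (hypothetical helper).

    The two keys are the ones used in the command above; the suffix is
    assumed to be read as plain JSON by the inference script.
    """
    options = {"reference_path": reference_path, "mask_strategy": mask_strategy}
    return text + json.dumps(options)


prompt = conditioned_prompt(
    "Create a video capturing the breathtaking beauty of the sunset over "
    "the serene lake, with the mountains silhouetted against the colorful sky.",
    "samples/samples/image-cond_0000.png",  # image saved by the first command
)
print(prompt)
```

The printed string can then be passed to --prompt inside single quotes, as in the command above.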
They look fairly reasonable compared with the demonstrations given by the team, but anyone with suggestions is welcome to share them here.
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 7 days since being marked as stale.