Open-Sora
Can I run inference with 2 NVIDIA 4090 GPUs?
Hi,
Thanks for Open-Sora! I am trying to run inference on my workstation (with 2 NVIDIA 4090 GPUs) as follows:
(my CLI command): torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path ./checkpoint/OpenSora-v1-HQ-16x256x256.pth
(my testing):
- If I keep the current config (in 16x256x256.py: samples = 16), I get a GPU out-of-memory error.
- If I change the config to sample = 4, I can generate the output sample.mp4 successfully, but the image quality is not good. I am using the same prompt, "A serene night scene in a forested area...", and I can see some buildings and cars in my output sample, but the video quality is much lower than in your sample.
My questions are:
- Can I run inference on an NVIDIA 4090? If the memory of one 4090 (24 GB) is not enough, can I run inference on 2 of them?
- How can I get the same high output quality as your samples? Can I get high-quality output video with your model weights (OpenSora-v1-HQ-16x256x256.pth), or do I need to continue training them?
Thanks!
(my output sample with OpenSora-v1-HQ-16x256x256.pth and samples = 4):
https://github.com/hpcaitech/Open-Sora/assets/163908077/43b368b9-328f-4518-9b78-585ebdca16d5
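Note that `--nproc_per_node 1` makes torchrun start a single worker process, so the command above would normally touch only one of the two cards. Before changing anything, it can help to see how much memory each GPU actually has free. The following is a minimal sketch, assuming PyTorch is installed; `gpu_memory_report` is a hypothetical helper name and not part of Open-Sora.

```python
def gpu_memory_report():
    """Return one (index, name, free_GiB, total_GiB) tuple per visible CUDA device.

    Returns an empty list if PyTorch is missing or no CUDA device is usable.
    """
    try:
        import torch
    except ImportError:
        return []  # PyTorch is not installed in this environment
    if not torch.cuda.is_available():
        return []  # no usable CUDA device
    report = []
    for idx in range(torch.cuda.device_count()):
        # mem_get_info returns (free_bytes, total_bytes) for the given device
        free_bytes, total_bytes = torch.cuda.mem_get_info(idx)
        name = torch.cuda.get_device_name(idx)
        report.append((idx, name, free_bytes / 2**30, total_bytes / 2**30))
    return report


if __name__ == "__main__":
    for idx, name, free_gib, total_gib in gpu_memory_report():
        print(f"GPU {idx} ({name}): {free_gib:.1f} GiB free of {total_gib:.1f} GiB")
```

Whether a second GPU lowers the per-card memory requirement depends on whether the inference script splits the model or the batch across processes, which this thread does not confirm.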
Hey @minounou, this is unrelated to your question, but would you mind sharing your torch, torchvision, and NVIDIA driver versions? I also have a 4090 but am having trouble getting my install to work.
thanks
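For anyone comparing environments, the installed package versions can be collected in one go. This is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper, and the default package list is simply the set of packages named in this thread. The NVIDIA driver version is not a Python package; `nvidia-smi` reports it.

```python
from importlib import metadata


def installed_versions(packages=("torch", "torchvision", "torchaudio",
                                 "xformers", "flash-attn", "apex")):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```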
@jstmn
OK, sure, here is my setup (my CUDA is 12.2; I have 2 NVIDIA 4090s, but it seems I can only use 1 of them for now):
PyTorch:
    conda create -n opensora20240317 python=3.10
    conda activate opensora20240317
    conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
Install flash attention:
    pip install packaging ninja
    pip install flash-attn --no-build-isolation
Install apex (for this I had to comment out line 39 of setup.py so that CUDA 12.2 and my PyTorch could work together):
    git clone https://github.com/NVIDIA/apex.git
    cd apex
    (update setup.py line 39)
    pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
Install xformers:
    pip install -U xformers --index-url https://download.pytorch.org/whl/cu121
Install this project:
    git clone https://github.com/hpcaitech/Open-Sora
    cd Open-Sora
    pip install -v .
Model weights: download the T5 and OpenSora model weights (linked from the GitHub page).
Inference (change sample from 16 to 4, otherwise I get a GPU out-of-memory error):
    torchrun --standalone --nproc_per_node 1 scripts/inference.py configs/opensora/inference/16x256x256.py --ckpt-path ./checkpoint/OpenSora-v1-HQ-16x256x256.pth
Same question for a single NVIDIA RTX 4090, as this is probably the configuration of most casual users and some universities.
With a single 3090, I can run both 16x256x256 and 16x512x512 with the batch size set to 1.
Is that inference only? I guess it would not be enough for finetuning?
Yes, that's right.
@tanghaom After setting the batch size to 1, I can now generate 16x256x256. Thanks!
https://github.com/hpcaitech/Open-Sora/assets/163908077/d64c1465-362d-4c9e-9c55-1bfb22dd39a4
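The fix reported above comes down to one setting. As a rough illustration of why it helps, here is a minimal sketch, assuming the inference script processes prompts in groups of `batch_size` (an assumption; the thread does not show the script). `iter_batches` and the second prompt are hypothetical, and the config values are the ones mentioned in this thread rather than a copy of the real config file.

```python
# Values discussed in this thread (not a verbatim copy of 16x256x256.py).
num_frames = 16
batch_size = 1  # reported to fit on a single 24 GB card


def iter_batches(prompts, batch_size):
    """Yield consecutive groups of at most batch_size prompts."""
    for start in range(0, len(prompts), batch_size):
        yield prompts[start:start + batch_size]


if __name__ == "__main__":
    prompts = [
        "A basketball bouncing on a grey sidewalk",
        "A second, hypothetical prompt",
    ]
    for batch in iter_batches(prompts, batch_size):
        # Under the stated assumption, each group is generated together,
        # so a smaller group means fewer videos held in GPU memory at once.
        print(f"generating {len(batch)} video(s) of {num_frames} frames")
```

With `batch_size = 1`, only one video is generated at a time, which trades throughput for lower peak memory.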
https://github.com/hpcaitech/Open-Sora/assets/13972782/9c48b0f0-1f25-4f89-afdb-5a28aefca982
Prompt: A soaring drone footage captures the majestic beauty of a coastal cliff, its red and yellow stratified rock faces rich in color and against the vibrant turquoise of the sea. Seabirds can be seen taking flight around the cliff's precipices. As the drone slowly moves from different angles, the changing sunlight casts shifting shadows that highlight the rugged textures of the cliff and the surrounding calm sea. The water gently laps at the rock base and the greenery that clings to the top of the cliff, and the scene gives a sense of peaceful isolation at the fringes of the ocean. The video captures the essence of pristine natural beauty untouched by human structures.
https://github.com/hpcaitech/Open-Sora/assets/13972782/a8ac6e18-301c-4e97-ac07-e1e3e53a7cae
Prompt: A basketball bouncing on a grey sidewalk
It seems like short prompts give much less accurate and realistic results.
This is with num_frames=8, 16x256x256, batch_size=1, which is the maximum I can do on my 4090.