
Batch Mode + Maintain Aspect Ratio + Multi-GPU Random Seed + Fixed Multi-GPU CuSolver Error + Fixed 20-Min Load Time + Video Input & Multiple Prompts

Open pftq opened this issue 8 months ago • 5 comments

I made a number of small quality-of-life changes in this PR: https://github.com/SkyworkAI/SkyReels-V2/pull/31

In case it doesn't get accepted, the fork can also be pulled here: https://github.com/pftq/SkyReels-V2_Improvements/

Changelist:

  • Added seed synchronization code to allow random seed with multi-GPU (https://github.com/SkyworkAI/SkyReels-V2/issues/24).
  • Reduced the 20-min+ load time on multi-GPU to ~8 min by fixing contention (all GPUs loading the model at once). This also indirectly fixes the CPU RAM spike during multi-GPU runs (>200 GB on 4 GPUs) (https://github.com/SkyworkAI/SkyReels-V2/issues/28).
  • Fixed the CuSolver error that occasionally comes up on multi-GPU by presetting the linear algebra library (https://github.com/SkyworkAI/SkyReels-V2/issues/37).
  • Added a batch_size parameter so multiple videos can be generated without reloading the model. Reloading takes about 20 min on multi-GPU, so this saves a lot of time.
  • Added a preserve_image_aspect_ratio parameter to keep the original image's aspect ratio.
  • Fixed the DF script not resizing/cropping the input image (the I2V script does this, but the DF script was missing the code).
  • Exposed negative_prompt so it can be changed/overridden.
  • Friendlier filenames with the date, seed, cfg, steps, and other details at the front.
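For illustration, here is a minimal sketch of the batch idea: the pipeline is loaded once and only the seed changes per video, with the run details placed at the front of each filename. The `generate` callable, `run_batch`, `make_filename`, and the exact filename layout are hypothetical and not the fork's actual code.

```python
import random
from datetime import datetime
from typing import Callable, List, Optional


def make_filename(when: datetime, seed: int, guidance_scale: float,
                  steps: int, prompt: str) -> str:
    """Hypothetical naming scheme: date, seed, cfg and steps come first."""
    # Keep a short, filesystem-safe version of the prompt at the end.
    slug = "".join(c if c.isalnum() else "_" for c in prompt[:30]).strip("_")
    return (f"{when:%Y-%m-%d_%H-%M-%S}_seed{seed}_cfg{guidance_scale}"
            f"_steps{steps}_{slug}.mp4")


def run_batch(generate: Callable[[int], object], batch_size: int,
              guidance_scale: float, steps: int, prompt: str,
              base_seed: Optional[int] = None) -> List[str]:
    """Render batch_size videos with one already-loaded pipeline.

    `generate` is a hypothetical stand-in for the loaded model: only the
    seed changes between iterations, so the expensive load happens once.
    """
    rng = random.Random(base_seed)  # base_seed=None gives a random batch
    names = []
    for _ in range(batch_size):
        seed = rng.randrange(2**32)  # fresh seed for every video
        generate(seed)               # the real script would also save the video
        names.append(make_filename(datetime.now(), seed, guidance_scale,
                                   steps, prompt))
    return names
```

Sorting such filenames alphabetically also sorts them by date, which makes a batch of 10 easy to scan.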

I also integrated chaojie's fork, which has extremely useful new functionalities:

  • Prompt travel / multiple prompts: the --prompt parameter accepts multiple text strings, each guiding the video differently for its own chunk of base_num_frames.
  • Video input via the --video parameter, allowing you to continue/extend an existing video.
  • Partially complete videos are output as each chunk of base_num_frames completes. Combined with the --video parameter, this lets you effectively resume from a previous render, or abort mid-render if the video takes a turn you don't like. Extremely useful for saving time and "watching" the renders as they complete rather than committing to the full render time.
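A rough sketch of how prompt travel can map prompts to chunks, assuming each chunk after the first keeps `overlap_history` frames of context and adds `base_num_frames - overlap_history` new frames. This is a simplified frame-level model (the pipeline itself works on latent frames, so its count may differ), and reusing the last prompt when there are more chunks than prompts is an assumption, not necessarily what chaojie's code does. `chunk_count` and `prompts_per_chunk` are hypothetical names.

```python
import math
from typing import List


def chunk_count(num_frames: int, base_num_frames: int, overlap_history: int) -> int:
    """Simplified, frame-level number of generation chunks (assumption)."""
    if num_frames <= base_num_frames:
        return 1  # everything fits in the first chunk
    stride = base_num_frames - overlap_history  # new frames added per extra chunk
    return 1 + math.ceil((num_frames - base_num_frames) / stride)


def prompts_per_chunk(prompts: List[str], n_chunks: int) -> List[str]:
    """Chunk i uses prompts[i]; if prompts run out, keep the last one (assumption)."""
    return [prompts[min(i, len(prompts) - 1)] for i in range(n_chunks)]
```

Under this simplified model, `--num_frames 257 --base_num_frames 97 --overlap_history 17` gives three chunks, so three prompts map one-to-one.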

Multi-GPU with video input and prompt travel, batch of 10, preserving aspect ratio. Change --video "video.mp4" to --image "image.jpg" if you want to load a starting image instead.

model_id=Skywork/SkyReels-V2-DF-14B-540P
gpu_count=2
torchrun --nproc_per_node=${gpu_count} generate_video_df.py \
  --model_id ${model_id} \
  --resolution 540P \
  --ar_step 0 \
  --base_num_frames 97 \
  --num_frames 289 \
  --overlap_history 17 \
  --inference_steps 50 \
  --guidance_scale 6 \
  --batch_size 10 \
  --preserve_image_aspect_ratio \
  --video "video.mp4" \
  --prompt "The first thing he does." \
  "The second thing he does." \
  "The third thing he does." \
  --negative_prompt "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards" \
  --addnoise_condition 20 \
  --use_ret_steps \
  --teacache_thresh 0.0 \
  --use_usp \
  --offload

Single GPU with video input and prompt travel, batch of 10, preserving aspect ratio. Change --video "video.mp4" to --image "image.jpg" if you want to load a starting image instead.

model_id=Skywork/SkyReels-V2-DF-14B-540P
python3 generate_video_df.py \
  --model_id ${model_id} \
  --resolution 540P \
  --ar_step 0 \
  --base_num_frames 97 \
  --num_frames 289 \
  --overlap_history 17 \
  --inference_steps 50 \
  --guidance_scale 6 \
  --batch_size 10 \
  --preserve_image_aspect_ratio \
  --video "video.mp4" \
  --prompt "The first thing he does." \
  "The second thing he does." \
  "The third thing he does." \
  --negative_prompt "Bright tones, overexposed, static, blurred details, subtitles, style, works, paintings, images, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, extra fingers, poorly drawn hands, poorly drawn faces, deformed, disfigured, misshapen limbs, fused fingers, still picture, messy background, three legs, many people in the background, walking backwards" \
  --addnoise_condition 20 \
  --use_ret_steps \
  --teacache_thresh 0.0 \
  --offload

pftq, Apr 23 '25

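Thanks for sharing. The official DF script doesn't support portrait (vertical) video, and on multi-GPU it can't even finish loading. Running DF on a single GPU in landscape took four hours for 257 frames.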

QingQingS, Apr 23 '25

I tried running it and got an error:

[Screenshot of the error]

QingQingS, Apr 23 '25

Thanks for letting me know. It looks like an edge case that requires the dimensions to be divisible by 16. I updated the fork to add slight padding to handle that. Please see if that works (you can just replace the code in generate_video_df.py).
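To illustrate the padding fix, here is a minimal sketch that pads each dimension up to the next multiple of 16 and splits the extra pixels as evenly as possible between both sides. The function name and the even split are assumptions; the fork may pad differently.

```python
from typing import Tuple


def pad_to_multiple(width: int, height: int,
                    multiple: int = 16) -> Tuple[int, int, Tuple[int, int, int, int]]:
    """Return padded (width, height) plus (left, right, top, bottom) padding.

    The padding tuple order matches torch.nn.functional.pad for a tensor whose
    last two dimensions are (H, W).
    """
    def split(size: int) -> Tuple[int, int]:
        extra = (-size) % multiple  # 0 when size is already divisible
        return extra // 2, extra - extra // 2

    left, right = split(width)
    top, bottom = split(height)
    return width + left + right, height + top + bottom, (left, right, top, bottom)
```

Already-aligned sizes such as 832x480 come back unchanged, while something like 830x477 is padded to 832x480.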

Here are two videos I just generated, so it should work for both horizontal and vertical images (ignore the low quality; I used 10 steps to reduce render time). For reference, I am using 832x480 and 480x832 images. I also tested 480x480, so square images work too.

https://github.com/user-attachments/assets/229a322b-6001-4768-9be5-5679d41ae4ab

https://github.com/user-attachments/assets/2d81bf8f-6ad3-4e43-9a62-248223ed86c1
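For the aspect-ratio option, one plausible approach (a hypothetical sketch, not the fork's code) is to scale the image so its shorter side matches the 540P preset, keep the aspect ratio, and snap both sides to a multiple of 16. The 544-pixel short side and the function name `fit_540p` are assumptions.

```python
from typing import Tuple


def fit_540p(width: int, height: int, short_side: int = 544,
             multiple: int = 16) -> Tuple[int, int]:
    """Scale so the shorter side equals short_side, then snap to `multiple`.

    short_side=544 is assumed to be the 540P preset's short side.
    """
    scale = short_side / min(width, height)  # same factor for both sides

    def snap(size: int) -> int:
        # Round to the nearest multiple, never below one block.
        return max(multiple, round(size * scale / multiple) * multiple)

    return snap(width), snap(height)
```

With these assumptions, the 832x480, 480x832, and 480x480 images mentioned above would map to 944x544, 544x944, and 544x544.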

If you still have issues, please let me know the image dimensions and the prompt you are using. For reference, this is the command I used for the example videos:

model_id=Skywork/SkyReels-V2-DF-14B-540P
gpu_count=4
torchrun --nproc_per_node=${gpu_count} generate_video_df.py \
  --model_id ${model_id} \
  --resolution 540P \
  --ar_step 0 \
  --base_num_frames 97 \
  --num_frames 97 \
  --overlap_history 17 \
  --inference_steps 10 \
  --guidance_scale 6 \
  --batch_size 10 \
  --preserve_image_aspect_ratio \
  --image "2025-02-27_10-30-52_fixed_vertical.jpg" \
  --prompt "Woman on a ship with long black hair and dress. It is windy and rainy. The lighting has a cinematic green color grading." \
  --addnoise_condition 20 \
  --use_usp \
  --offload

pftq, Apr 23 '25

The PR now also includes a fix to the 20-min load time on multi-GPU setups. https://github.com/SkyworkAI/SkyReels-V2/issues/28#issuecomment-2826199716
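The idea behind the load-time fix, having the GPUs load one after another instead of all at once, can be sketched with threads standing in for ranks. This is a simulation under that assumption: `threading.Barrier` plays the role a collective barrier such as `torch.distributed.barrier()` would play, and `staggered_load` is a hypothetical name. See the linked comment for the actual change.

```python
import threading
from typing import Callable


def staggered_load(world_size: int, load: Callable[[int], None]) -> None:
    """Simulate ranks that take turns loading the model.

    Each thread stands in for one GPU process. In every turn only one rank
    loads, and all ranks meet at the barrier before the next turn starts, so
    the weights are never read by every rank at the same time.
    """
    barrier = threading.Barrier(world_size)

    def worker(rank: int) -> None:
        for turn in range(world_size):
            if rank == turn:
                load(rank)   # only this rank loads during its turn
            barrier.wait()   # everyone waits until the current turn finishes

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Serializing the loads trades some parallelism for much less disk and CPU RAM contention, which matches the contention explanation in the changelist.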

pftq, Apr 24 '25

I love the features in this version, but I really need the --end_image argument in the diffusion forcing script. Could it be included?

Haligator1, Jun 09 '25