
Does it support multiple GPUs?

Open BASSEM45325 opened this issue 11 months ago • 8 comments

System Info / 系統信息

I used I2V and added .to('cuda') after removing offloading, but it is still not using all GPUs. I am using 4× A10 GPUs with 24 GB of VRAM each.

Information / 问题信息

  • [ ] The official example scripts / 官方的示例脚本
  • [x] My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

import torch
from transformers import T5EncoderModel
from diffusers import AutoencoderKLCogVideoX, CogVideoXImageToVideoPipeline, CogVideoXTransformer3DModel
from diffusers.utils import export_to_video
from torchao.quantization import quantize_

text_encoder = T5EncoderModel.from_pretrained("THUDM/CogVideoX-5b-I2V", subfolder="text_encoder", torch_dtype=torch.bfloat16)
quantize_(text_encoder, quantization())  # quantization() is my torchao config helper (definition not shown)

transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX-5b-I2V", subfolder="transformer", torch_dtype=torch.bfloat16)
quantize_(transformer, quantization())

vae = AutoencoderKLCogVideoX.from_pretrained("THUDM/CogVideoX-5b-I2V", subfolder="vae", torch_dtype=torch.bfloat16)
quantize_(vae, quantization())

# Create pipeline and run inference
pipe = CogVideoXImageToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-I2V", text_encoder=text_encoder, transformer=transformer, vae=vae, torch_dtype=torch.bfloat16).to('cuda')

# Manually assign components to GPUs
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

print(pipe.text_encoder.device)
print(pipe.transformer.device)
print(pipe.vae.device)

video = pipe(prompt='test', image=image, num_videos_per_prompt=1, num_inference_steps=50, num_frames=49, guidance_scale=6).frames[0]  # `image` is the input image, loaded earlier (not shown)
out = 'temp.mp4'
export_to_video(video, f'{out}', fps=8)

Expected behavior / 期待表现

Only one GPU gets utilized.

BASSEM45325 avatar Jan 28 '25 09:01 BASSEM45325

Please check inference/cli_demo.py to see how to distribute the model across multiple GPUs; note that this does not support quantization.

zRzRzRzRzRzRzR avatar Jan 28 '25 12:01 zRzRzRzRzRzRzR

I have tried that and got an error:

pipe = CogVideoXImageToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16, device_map="balanced")

pipe.scheduler = CogVideoXDPMScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

video = pipe(prompt=prompt, image=image, num_videos_per_prompt=1, num_inference_steps=50, num_frames=49, guidance_scale=6, use_dynamic_cfg=True, generator=torch.Generator().manual_seed(112)).frames[0]

export_to_video(video, "output.mp4", fps=8)

Loading checkpoint shards: 100%|█| 2/2 [00:01<00:00, 1.24it
Loading pipeline components...: 100%|█| 5/5 [00:07<00:00, 1
Traceback (most recent call last):
  File "/home/ec2-user/test/t.py", line 21, in <module>
    video = pipe(
  File "/home/ec2-user/test/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ec2-user/test/.venv/lib/python3.10/site-packages/diffusers/pipelines/cogvideo/pipeline_cogvideox_image2video.py", line 782, in __call__
    latents, image_latents = self.prepare_latents(
  File "/home/ec2-user/test/.venv/lib/python3.10/site-packages/diffusers/pipelines/cogvideo/pipeline_cogvideox_image2video.py", line 407, in prepare_latents
    image_latents = torch.cat([image_latents, latent_padding], dim=1)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:1! (when checking argument for argument tensors in method wrapper_CUDA_cat)
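
One way to see how the components were actually placed (a sketch; it assumes pipe.hf_device_map is populated when the pipeline is loaded with a device_map, and the mapping shown in the comment is illustrative, not from this run):

# The RuntimeError above means tensors produced by components on different
# GPUs met inside torch.cat. Printing the placement shows which component
# accelerate put on which device.
print(pipe.hf_device_map)
# e.g. {'text_encoder': 0, 'transformer': 1, 'vae': 2}  (illustrative)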

BASSEM45325 avatar Jan 28 '25 12:01 BASSEM45325

It was solved when I used this. I don't know why it isn't working on 4 GPUs; it only works on 2 GPUs:

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
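
For what it's worth, a minimal sketch of that workaround. The detail that matters is that CUDA reads CUDA_VISIBLE_DEVICES only once, at initialization, so the variable has to be set before the first CUDA call, safest before importing torch:

import os

# Restrict this process to two GPUs; must run before torch touches CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import torch  # imported after the env var on purpose

print(torch.cuda.device_count())  # -> 2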

BASSEM45325 avatar Jan 28 '25 18:01 BASSEM45325

And when I put .to('cuda') I got an error as well.

BASSEM45325 avatar Jan 29 '25 12:01 BASSEM45325

Have you tried CUDA_VISIBLE_DEVICES=0,1,2,3 python cli_demo.py?

zhipuch avatar Feb 20 '25 07:02 zhipuch

It was solved when I used this. I don't know why it isn't working on 4 GPUs; it only works on 2 GPUs:

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

Same error. @zRzRzRzRzRzRzR @BASSEM45325 Have you solved it? Thanks

LIUQI-creat avatar Mar 13 '25 03:03 LIUQI-creat

And when I put .to('cuda') I got an error as well.

Same error here. I got:

ValueError: It seems like you have activated sequential model offloading by calling enable_sequential_cpu_offload, but are now attempting to move the pipeline to GPU. This is not compatible with offloading. Please, move your pipeline .to('cpu') or consider removing the move altogether if you use sequential offloading.

breakices avatar Mar 15 '25 04:03 breakices

enable_sequential_cpu_offload should be removed.
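
In other words, pick one of the two; they are mutually exclusive (a minimal sketch, assuming a diffusers pipeline already loaded as pipe):

# Option A: sequential CPU offload -- weights are streamed to the GPU on
# demand to save VRAM. Do NOT call .to('cuda') on the pipeline afterwards.
pipe.enable_sequential_cpu_offload()

# Option B: keep the whole pipeline resident on one GPU (needs enough VRAM),
# with no offloading enabled.
# pipe.to('cuda')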

zRzRzRzRzRzRzR avatar Mar 24 '25 03:03 zRzRzRzRzRzRzR