
Does it support multiple GPUs?

Open BASSEM45325 opened this issue 11 months ago • 8 comments

System Info / 系統信息

I used I2V and added .to('cuda') after removing offloading, but it is still not using all GPUs. I am using 4× A10 GPUs with 24 GB of VRAM each.

Information / 问题信息

  • [ ] The official example scripts / 官方的示例脚本
  • [x] My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

import torch
from transformers import T5EncoderModel
from diffusers import AutoencoderKLCogVideoX, CogVideoXImageToVideoPipeline, CogVideoXTransformer3DModel
from diffusers.utils import export_to_video
from torchao.quantization import quantize_

text_encoder = T5EncoderModel.from_pretrained("THUDM/CogVideoX-5b-I2V", subfolder="text_encoder", torch_dtype=torch.bfloat16)
quantize_(text_encoder, quantization())  # quantization() is my torchao config helper (definition not shown)

transformer = CogVideoXTransformer3DModel.from_pretrained("THUDM/CogVideoX-5b-I2V", subfolder="transformer", torch_dtype=torch.bfloat16)
quantize_(transformer, quantization())

vae = AutoencoderKLCogVideoX.from_pretrained("THUDM/CogVideoX-5b-I2V", subfolder="vae", torch_dtype=torch.bfloat16)
quantize_(vae, quantization())

# Create pipeline and run inference
pipe = CogVideoXImageToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-I2V", text_encoder=text_encoder, transformer=transformer, vae=vae, torch_dtype=torch.bfloat16).to('cuda')

# Manually assign components to GPUs
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

print(pipe.text_encoder.device)
print(pipe.transformer.device)
print(pipe.vae.device)

video = pipe(prompt='test', image=image, num_videos_per_prompt=1, num_inference_steps=50, num_frames=49, guidance_scale=6).frames[0]  # `image` is the input image, loaded earlier (not shown)
out = 'temp.mp4'
export_to_video(video, f'{out}', fps=8)

Expected behavior / 期待表现

Only one GPU gets utilized.

BASSEM45325 avatar Jan 28 '25 09:01 BASSEM45325

Please check inference/cli_demo.py to see how to distribute the model across multiple GPUs; note that this does not support quantization.

zRzRzRzRzRzRzR avatar Jan 28 '25 12:01 zRzRzRzRzRzRzR

I have tried that and got an error:

pipe = CogVideoXImageToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16, device_map="balanced")

pipe.scheduler = CogVideoXDPMScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
pipe.vae.enable_tiling()
pipe.vae.enable_slicing()

video = pipe(prompt=prompt, image=image, num_videos_per_prompt=1, num_inference_steps=50, num_frames=49, guidance_scale=6, use_dynamic_cfg=True, generator=torch.Generator().manual_seed(112)).frames[0]

export_to_video(video, "output.mp4", fps=8)

Loading checkpoint shards: 100%|█| 2/2 [00:01<00:00, 1.24it
Loading pipeline components...: 100%|█| 5/5 [00:07<00:00, 1
Traceback (most recent call last):
  File "/home/ec2-user/test/t.py", line 21, in <module>
    video = pipe(
  File "/home/ec2-user/test/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ec2-user/test/.venv/lib/python3.10/site-packages/diffusers/pipelines/cogvideo/pipeline_cogvideox_image2video.py", line 782, in __call__
    latents, image_latents = self.prepare_latents(
  File "/home/ec2-user/test/.venv/lib/python3.10/site-packages/diffusers/pipelines/cogvideo/pipeline_cogvideox_image2video.py", line 407, in prepare_latents
    image_latents = torch.cat([image_latents, latent_padding], dim=1)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cuda:1! (when checking argument for argument tensors in method wrapper_CUDA_cat)
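
One way to see how the components were actually placed (a sketch; it assumes pipe.hf_device_map is populated when the pipeline is loaded with a device_map, and the mapping shown in the comment is illustrative, not from this run):

# The RuntimeError above means tensors produced by components on different
# GPUs met inside torch.cat. Printing the placement shows which component
# accelerate put on which device.
print(pipe.hf_device_map)
# e.g. {'text_encoder': 0, 'transformer': 1, 'vae': 2}  (illustrative)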

BASSEM45325 avatar Jan 28 '25 12:01 BASSEM45325

It was solved when I used this. I don't know why it isn't working on 4 GPUs; it only works on 2 GPUs:

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
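
For what it's worth, a minimal sketch of that workaround. The detail that matters is that CUDA reads CUDA_VISIBLE_DEVICES only once, at initialization, so the variable has to be set before the first CUDA call, safest before importing torch:

import os

# Restrict this process to two GPUs; must run before torch touches CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import torch  # imported after the env var on purpose

print(torch.cuda.device_count())  # -> 2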

BASSEM45325 avatar Jan 28 '25 18:01 BASSEM45325

And when I put .to('cuda') I got an error as well.

BASSEM45325 avatar Jan 29 '25 12:01 BASSEM45325

Have you tried CUDA_VISIBLE_DEVICES=0,1,2,3 python cli_demo.py?

zhipuch avatar Feb 20 '25 07:02 zhipuch

It was solved when I used this. I don't know why it isn't working on 4 GPUs; it only works on 2 GPUs:

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

Same error. @zRzRzRzRzRzRzR @BASSEM45325 Have you solved it? Thanks

LIUQI-creat avatar Mar 13 '25 03:03 LIUQI-creat

And when I put .to('cuda') I got an error as well.

Same error here. I got:

ValueError: It seems like you have activated sequential model offloading by calling enable_sequential_cpu_offload, but are now attempting to move the pipeline to GPU. This is not compatible with offloading. Please, move your pipeline .to('cpu') or consider removing the move altogether if you use sequential offloading.

breakices avatar Mar 15 '25 04:03 breakices

enable_sequential_cpu_offload should be removed.
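
In other words, pick one of the two; they are mutually exclusive (a minimal sketch, assuming a diffusers pipeline already loaded as pipe):

# Option A: sequential CPU offload -- weights are streamed to the GPU on
# demand to save VRAM. Do NOT call .to('cuda') on the pipeline afterwards.
pipe.enable_sequential_cpu_offload()

# Option B: keep the whole pipeline resident on one GPU (needs enough VRAM),
# with no offloading enabled.
# pipe.to('cuda')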

zRzRzRzRzRzRzR avatar Mar 24 '25 03:03 zRzRzRzRzRzRzR