
[Exception] Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0!

Open · weiweiwang opened this issue 1 year ago · 0 comments

Describe the bug

Scenario: developing a face-editing server with FastAPI + Celery + diffusers; each request spawns a Celery task.

Concurrency: 2 FastAPI workers, 1 Celery worker.

Diffusers usage:

  • 1 StableDiffusionControlNetPipeline + 3 StableDiffusionControlNetInpaintPipeline
  • the pipelines share the components of the two checkpoints
  • enable_model_cpu_offload is turned on for every pipeline

Exception: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument weight in method wrapper_CUDA___slow_conv2d_forward)
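
For context, this is PyTorch's generic device-mismatch error, raised whenever an op receives tensors on different devices. A minimal, unrelated sketch that triggers the same message (assuming a CUDA device is available):

    import torch
    import torch.nn.functional as F

    conv = torch.nn.Conv2d(3, 8, 3)        # weights stay on the CPU
    x = torch.randn(1, 3, 64, 64).cuda()   # input lives on cuda:0
    F.conv2d(x, conv.weight, conv.bias)    # RuntimeError: Expected all tensors to be on the same device ...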

Reproduction

Pseudo-code (the real code runs as async Celery tasks):

    import os

    import torch
    from diffusers import (
        StableDiffusionControlNetInpaintPipeline,
        StableDiffusionControlNetPipeline,
        StableDiffusionInpaintPipeline,
        StableDiffusionPipeline,
    )

    # pretrained_model_dir, sd_config_file and the three ControlNetModel instances
    # (controlnet_inpaint, controlnet_openpose, controlnet_lineart) are set up elsewhere.

    inpainting_pipeline = StableDiffusionInpaintPipeline.from_single_file(
        os.path.join(pretrained_model_dir, "realisticVisionV60B1_v51VAE-inpainting.safetensors"),
        original_config_file=sd_config_file,
        local_files_only=True,
        use_safetensors=True,
        torch_dtype=torch.float16,
    )

    text2img_pipeline = StableDiffusionPipeline.from_single_file(
        os.path.join(pretrained_model_dir, "realisticVisionV60B1_v51VAE.safetensors"),
        original_config_file=sd_config_file,
        local_files_only=True,
        use_safetensors=True,
        torch_dtype=torch.float16,
    )

    # Four ControlNet pipelines built on top of the two checkpoints' shared components.
    pipeline_1 = StableDiffusionControlNetInpaintPipeline(
        **inpainting_pipeline.components, controlnet=controlnet_inpaint
    )
    pipeline_2 = StableDiffusionControlNetPipeline(
        **text2img_pipeline.components, controlnet=controlnet_openpose
    )
    pipeline_3 = StableDiffusionControlNetInpaintPipeline(
        **inpainting_pipeline.components, controlnet=controlnet_lineart
    )
    pipeline_4 = StableDiffusionControlNetInpaintPipeline(
        **text2img_pipeline.components, controlnet=controlnet_lineart  # reuses the same controlnet_lineart
    )

pipeline_1 through pipeline_4 all have enable_model_cpu_offload turned on; .to("cuda") is never called on any of them (see the sketch below).
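
A minimal sketch of the offload step, assuming the four pipelines above are already constructed:

    # Each pipeline is offloaded via accelerate hooks rather than being moved
    # to the GPU with .to("cuda").
    for pipe in (pipeline_1, pipeline_2, pipeline_3, pipeline_4):
        pipe.enable_model_cpu_offload()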

Logs

Traceback (most recent call last):
  File "/root/projects/octaface/tasks.py", line 274, in change_hair_color_task
    output_image = face_editing.change_hair_color(input_image_file=cached_photo_file_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/projects/octaface/edit/face.py", line 1493, in change_hair_color
    output_image = self.color_inpaint_pipe(
                   ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/octaface/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/octaface/lib/python3.11/site-packages/diffusers/pipelines/controlnet/pipeline_controlnet_inpaint.py", line 1437, in __call__
    latents_outputs = self.prepare_latents(
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/octaface/lib/python3.11/site-packages/diffusers/pipelines/controlnet/pipeline_controlnet_inpaint.py", line 994, in prepare_latents
    image_latents = self._encode_vae_image(image=image, generator=generator)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/octaface/lib/python3.11/site-packages/diffusers/pipelines/controlnet/pipeline_controlnet_inpaint.py", line 1072, in _encode_vae_image
    image_latents = retrieve_latents(self.vae.encode(image), generator=generator)
                                     ^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/octaface/lib/python3.11/site-packages/diffusers/utils/accelerate_utils.py", line 46, in wrapper
    return method(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/octaface/lib/python3.11/site-packages/diffusers/models/autoencoders/autoencoder_kl.py", line 260, in encode
    h = self.encoder(x)
        ^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/octaface/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/octaface/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/octaface/lib/python3.11/site-packages/diffusers/models/autoencoders/vae.py", line 143, in forward
    sample = self.conv_in(sample)
             ^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/octaface/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/octaface/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/octaface/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 460, in forward
    return self._conv_forward(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/octaface/lib/python3.11/site-packages/torch/nn/modules/conv.py", line 456, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument weight in method wrapper_CUDA___slow_conv2d_forward)

System Info

  • diffusers version: 0.27.2
  • Platform: Linux-5.4.0-153-generic-x86_64-with-glibc2.35
  • Python version: 3.11.9
  • PyTorch version (GPU?): 2.2.1+cu121 (True)
  • Huggingface_hub version: 0.22.1
  • Transformers version: 4.39.1
  • Accelerate version: 0.28.0
  • xFormers version: 0.0.25
  • Using GPU in script?: yes, inside Python Celery tasks
  • Using distributed or parallel set-up in script?: no, only 1 Celery worker

Who can help?

No response

weiweiwang · May 08 '24 13:05