Bug in load_lora_weights when adding an align_device_hook to the model
Describe the bug
I noticed that when I manually add an AlignDevicesHook to modules in the pipeline, the load_lora_weights function ends up enabling sequential CPU offload. Digging deeper, I found that load_lora_weights uses the _optionally_disable_offloading function to decide whether to sequentially CPU-offload. The _optionally_disable_offloading function is:
def _optionally_disable_offloading(cls, _pipeline):
    """
    Optionally removes offloading in case the pipeline has been already sequentially offloaded to CPU.

    Args:
        _pipeline (`DiffusionPipeline`):
            The pipeline to disable offloading for.

    Returns:
        tuple:
            A tuple indicating if `is_model_cpu_offload` or `is_sequential_cpu_offload` is True.
    """
    is_model_cpu_offload = False
    is_sequential_cpu_offload = False

    if _pipeline is not None:
        for _, component in _pipeline.components.items():
            if isinstance(component, nn.Module) and hasattr(component, "_hf_hook"):
                if not is_model_cpu_offload:
                    is_model_cpu_offload = isinstance(component._hf_hook, CpuOffload)
                if not is_sequential_cpu_offload:
                    is_sequential_cpu_offload = isinstance(component._hf_hook, AlignDevicesHook)

                logger.info(
                    "Accelerate hooks detected. Since you have called `load_lora_weights()`, the previous hooks will be first removed. Then the LoRA parameters will be loaded and the hooks will be applied again."
                )
                remove_hook_from_module(component, recurse=is_sequential_cpu_offload)

    return (is_model_cpu_offload, is_sequential_cpu_offload)
So I was curious: why is is_sequential_cpu_offload = True whenever a component has an AlignDevicesHook? Shouldn't it be True only when the component's device is CPU?
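A minimal, self-contained sketch of what I mean (a toy module rather than the actual pipeline, so the names here are only illustrative): the isinstance check matches an AlignDevicesHook whose execution device is a GPU just as well as one placed by CPU offloading.

import torch
from torch import nn
from accelerate.hooks import AlignDevicesHook, attach_align_device_hook_on_blocks

toy = nn.Linear(4, 4).to("cuda:0")
attach_align_device_hook_on_blocks(toy, execution_device=torch.device("cuda:0"))

# the hook keeps everything on cuda:0 and never touches the CPU,
# yet this is exactly the check that _optionally_disable_offloading relies on
print(isinstance(toy._hf_hook, AlignDevicesHook))  # True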
Reproduction
import torch

from diffusers import StableDiffusionControlNetImg2ImgPipeline, ControlNetModel
from accelerate.hooks import attach_align_device_hook_on_blocks

# controlnet1 and controlnet2 are ControlNetModel instances loaded beforehand
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "/media/74nvme/checkpoints/diffusers_models/stable-diffusion-v1-5/",
    controlnet=[controlnet1, controlnet2],
    torch_dtype=torch.float16,
).to("cuda:0")

# collect the nn.Module components of the pipeline
module_names, _ = pipe._get_signature_keys(pipe)
modules = [getattr(pipe, n, None) for n in module_names]
module_names = [name for m, name in zip(modules, module_names) if isinstance(m, torch.nn.Module)]
modules = [m for m in modules if isinstance(m, torch.nn.Module)]
print(module_names)

# place unet/controlnet on cuda:0 and everything else on cuda:1,
# attaching an AlignDevicesHook to each module manually
for module, name in zip(modules, module_names):
    if name == "unet" or name == "controlnet":
        module.to("cuda:0")
        attach_align_device_hook_on_blocks(
            module,
            execution_device=module.device,
        )
    else:
        module.to("cuda:1")
        attach_align_device_hook_on_blocks(
            module,
            execution_device=module.device,
        )
and then calling
pipe.load_lora_weights(lora_weights_path)
changes every component's device.
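A rough way to observe this (a sketch assuming the setup above; lora_weights_path is a path to a local LoRA checkpoint):

# devices before: unet/controlnet on cuda:0, everything else on cuda:1
print({name: next(m.parameters()).device for name, m in zip(module_names, modules)})

pipe.load_lora_weights(lora_weights_path)

# devices after: the components are no longer where they were manually placed,
# because sequential CPU offloading was re-applied by the LoRA loader
print({name: next(m.parameters()).device for name, m in zip(module_names, modules)})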
Logs
No response
System Info
diffusers: 0.25.1, torch: 2.2.0+cu118
Who can help?
No response
Hi:
So I was curious: why is is_sequential_cpu_offload = True whenever a component has an AlignDevicesHook? Shouldn't it be True only when the component's device is CPU?
by "sequentiaal_cpu_offload" we are referring to the enable_sequential_cpu_offload method that you can call on our pipelines
https://github.com/huggingface/diffusers/blob/19ab04ff56cb4389f6a988b985b76da48a75ada9/src/diffusers/pipelines/pipeline_utils.py#L1037
I don't think this is a bug, no?
I think PR #6857 will add an AlignDevicesHook to every model in the pipeline. So if I use that new feature and load LoRA weights at the same time, enable_sequential_cpu_offload will be called inside the load_lora_weights method. I want to know if my understanding is correct. Thank you for the explanation.
I think PR #6857 will add an AlignDevicesHook to every model in the pipeline. So if I use that new feature and load LoRA weights at the same time, enable_sequential_cpu_offload will be called inside the load_lora_weights method. I want to know if my understanding is correct. Thank you for the explanation.
The PR addresses a different problem, not sure how it's related.
The PR addresses a different problem
Yes, but that PR needs an AlignDevicesHook to prepare the model inputs before the model's forward method is called. And if I then call the load_lora_weights method, the enable_sequential_cpu_offload() method ends up being called because of the AlignDevicesHook. The chain looks like this:
load_lora_weights -> load_lora_into_unet -> _pipeline.enable_sequential_cpu_offload()
The relevant part of the load_lora_into_unet code is below:
# In case the pipeline has been already offloaded to CPU - temporarily remove the hooks
# otherwise loading LoRA weights will lead to an error
is_model_cpu_offload, is_sequential_cpu_offload = cls._optionally_disable_offloading(_pipeline)

inject_adapter_in_model(lora_config, unet, adapter_name=adapter_name)
incompatible_keys = set_peft_model_state_dict(unet, state_dict, adapter_name)

if incompatible_keys is not None:
    # check only for unexpected keys
    unexpected_keys = getattr(incompatible_keys, "unexpected_keys", None)
    if unexpected_keys:
        logger.warning(
            f"Loading adapter weights from state_dict led to unexpected keys not found in the model: "
            f" {unexpected_keys}. "
        )

# Offload back.
if is_model_cpu_offload:
    _pipeline.enable_model_cpu_offload()
elif is_sequential_cpu_offload:
    _pipeline.enable_sequential_cpu_offload()
# Unsafe code />
The load_lora_into_unet method calls cls._optionally_disable_offloading(_pipeline) to decide whether to call the enable_sequential_cpu_offload() method.
In the _optionally_disable_offloading(_pipeline) method, is_sequential_cpu_offload is set to True because of the AlignDevicesHook on the module. Part of the _optionally_disable_offloading method code is below:
for _, component in _pipeline.components.items():
    if isinstance(component, nn.Module) and hasattr(component, "_hf_hook"):
        if not is_model_cpu_offload:
            is_model_cpu_offload = isinstance(component._hf_hook, CpuOffload)
        if not is_sequential_cpu_offload:
            is_sequential_cpu_offload = isinstance(component._hf_hook, AlignDevicesHook)

        logger.info(
            "Accelerate hooks detected. Since you have called `load_lora_weights()`, the previous hooks will be first removed. Then the LoRA parameters will be loaded and the hooks will be applied again."
        )
        remove_hook_from_module(component, recurse=is_sequential_cpu_offload)
I'm not sure whether I'm right. I just want to figure out why calling load_lora_weights after adding an AlignDevicesHook to the modules sets every component's device to meta. Thank you for your patient explanation.
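A quick way to see where the meta devices come from (a sketch assuming the multi-GPU setup from the reproduction; it is the re-enabled offloading, not the hook removal itself, that moves the weights):

# before: the unet was placed manually on cuda:0
print(next(pipe.unet.parameters()).device)  # cuda:0

# this is what load_lora_weights ends up calling because is_sequential_cpu_offload is True
pipe.enable_sequential_cpu_offload()

# after: the weights are offloaded and only streamed to the execution device at forward time
print(next(pipe.unet.parameters()).device)  # typically meta, no longer cuda:0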
I found that in the pipeline-device-map-auto branch, the PR disables enable_sequential_cpu_offload() when device_map=balanced is used. But I'm still a little confused about why we need to call enable_sequential_cpu_offload() in the load_lora_weights method when an AlignDevicesHook has been added to the models in the pipeline.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Is this still a problem?
Is this still a problem?
Given that device_map=auto can only place the models on different GPUs according to model size, I add an AlignDevicesHook to every model manually. But the load_lora_weights method removes all AlignDevicesHooks and never adds the hooks back, which really confused me. I changed the hook class name so that the load_lora_weights method does not remove the hook, and so far my code runs as expected.
but the load_lora_weights method removes all AlignDevicesHooks
That is only temporary. We add them back. See here:
https://github.com/huggingface/diffusers/blob/7bfc1ee1b2cb0961ff111f50a9d096816e4dd921/src/diffusers/loaders/lora.py#L517
if is_model_cpu_offload:
    _pipeline.enable_model_cpu_offload()
elif is_sequential_cpu_offload:
    _pipeline.enable_sequential_cpu_offload()
It just calls enable_model_cpu_offload() or enable_sequential_cpu_offload(). Actually, I never call enable_model_cpu_offload() or enable_sequential_cpu_offload(); my AlignDevicesHooks are added to the models manually to place them on different GPUs. And I think we can't assume that enable_model_cpu_offload() or enable_sequential_cpu_offload() was called just because some model in the pipeline has an AlignDevicesHook.
It just calls enable_model_cpu_offload() or enable_sequential_cpu_offload().
Those methods are responsible for placing the hooks.
For example: https://github.com/huggingface/diffusers/blob/7bfc1ee1b2cb0961ff111f50a9d096816e4dd921/src/diffusers/pipelines/pipeline_utils.py#L1077
The thing is, we are not supposed to call any offloading-related utilities manually when any component underlying a pipeline was initialized with the "balanced" device_map. This should be sufficiently clear from the errors:
https://github.com/huggingface/diffusers/blob/7bfc1ee1b2cb0961ff111f50a9d096816e4dd921/src/diffusers/pipelines/pipeline_utils.py#L1030
The thing is, we are not supposed to call any offloading-related utilities manually when any component underlying a pipeline was initialized with the "balanced" device_map.
I agree with that. But in the previous version, the _optionally_disable_offloading() method would return is_sequential_cpu_offload=True because of the AlignDevicesHook when using a device_map, which then offloads the model to CPU.
But I need a more flexible device_map, so I'm still adding the hooks manually. Thank you for your patience! I still think we are not supposed to set is_sequential_cpu_offload=True just because the model has an AlignDevicesHook. Maybe something like this?
is_sequential_cpu_offload = component.device.type == "cpu" and (
    isinstance(component._hf_hook, AlignDevicesHook)
    or hasattr(component._hf_hook, "hooks")
    and isinstance(component._hf_hook.hooks[0], AlignDevicesHook)
)
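Slotted into the loop quoted earlier, the change would look roughly like this (a sketch against the 0.25.x code above, not necessarily the exact shape of the fix):

for _, component in _pipeline.components.items():
    if isinstance(component, nn.Module) and hasattr(component, "_hf_hook"):
        if not is_model_cpu_offload:
            is_model_cpu_offload = isinstance(component._hf_hook, CpuOffload)
        if not is_sequential_cpu_offload:
            # the extra device check distinguishes a manually attached AlignDevicesHook
            # (the component stays on a GPU) from one placed by enable_sequential_cpu_offload()
            is_sequential_cpu_offload = component.device.type == "cpu" and (
                isinstance(component._hf_hook, AlignDevicesHook)
                or hasattr(component._hf_hook, "hooks")
                and isinstance(component._hf_hook.hooks[0], AlignDevicesHook)
            )

        logger.info(
            "Accelerate hooks detected. Since you have called `load_lora_weights()`, the previous hooks will be first removed. Then the LoRA parameters will be loaded and the hooks will be applied again."
        )
        remove_hook_from_module(component, recurse=is_sequential_cpu_offload)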
Feel free to open a PR and we can take it from there :-)
I've just created PR #8750 for this.
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.