fsdp with wan2.2

Open mountain-lee1 opened this issue 5 months ago • 2 comments

[rank1]:   File "/home/lcq/lijiarui/DiffSynth-Studio/examples/wanvideo/model_training/train.py", line 165, in <module>
[rank1]:     launch_training_task(
[rank1]:   File "/home/lcq/lijiarui/DiffSynth-Studio/diffsynth/trainers/utils.py", line 491, in launch_training_task
[rank1]:     loss = model(data=None, inputs=data)
[rank1]:   File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/home/lcq/lijiarui/DiffSynth-Studio/examples/wanvideo/model_training/train.py", line 111, in forward
[rank1]:     loss = self.pipe.training_loss(**models, **inputs)
[rank1]:   File "/home/lcq/lijiarui/DiffSynth-Studio/diffsynth/pipelines/wan_video_new.py", line 86, in training_loss
[rank1]:     noise_pred = self.model_fn(**inputs, timestep=timestep)
[rank1]:   File "/home/lcq/lijiarui/DiffSynth-Studio/diffsynth/pipelines/wan_video_new.py", line 1230, in model_fn_wan_video
[rank1]:     x = torch.utils.checkpoint.checkpoint(
[rank1]:   File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/_compile.py", line 32, in inner
[rank1]:     return disable_fn(*args, **kwargs)
[rank1]:   File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 632, in _fn
[rank1]:     return fn(*args, **kwargs)
[rank1]:   File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 496, in checkpoint
[rank1]:     ret = function(*args, **kwargs)
[rank1]:   File "/home/lcq/lijiarui/DiffSynth-Studio/diffsynth/pipelines/wan_video_new.py", line 1218, in custom_forward
[rank1]:     return module(*inputs)
[rank1]:   File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/root/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/home/lcq/lijiarui/DiffSynth-Studio/diffsynth/models/wan_video_dit.py", line 220, in forward
[rank1]:     self.modulation.to(dtype=t_mod.dtype, device=t_mod.device) + t_mod).chunk(6, dim=chunk_dim)
[rank1]: RuntimeError: Output 0 of ViewBackward0 is a view and its base or another view of its base has been modified inplace. This view is the output of a function that returns multiple views. Such functions do not allow the output views to be modified inplace. You should replace the inplace operation by an out-of-place one.
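For context, the general class of error named in the last line can be triggered outside FSDP. The sketch below is only an illustration under assumptions: the tensor shapes and the function names `inplace_on_chunk_view` and `out_of_place_on_chunk_view` are made up, and it raises a closely related variant of the message (in-place modification of a `chunk` output), not necessarily the exact `ViewBackward0` case from the traceback. It is not a diagnosis of the FSDP failure itself.

```python
import torch


def inplace_on_chunk_view():
    """Modify one output of chunk() in place and return the error text, if any."""
    w = torch.randn(6, 4, requires_grad=True)
    base = w * 1.0                      # non-leaf tensor tracked by autograd
    first, *_ = base.chunk(6, dim=0)    # chunk() returns several views of `base`
    try:
        first.add_(1.0)                 # in-place op on a multi-output view
    except RuntimeError as err:
        return str(err)
    return None


def out_of_place_on_chunk_view():
    """Same computation, but with an out-of-place add, as the message suggests."""
    w = torch.randn(6, 4, requires_grad=True)
    base = w * 1.0
    first, *_ = base.chunk(6, dim=0)
    first = first + 1.0                 # allocates a new tensor instead of mutating the view
    first.sum().backward()
    return w.grad


if __name__ == "__main__":
    print(inplace_on_chunk_view())
    print(out_of_place_on_chunk_view())
```

The second function follows the advice in the error text: the in-place operation is replaced by an out-of-place one, so autograd can backpropagate through the chunked view.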

My FSDP config is:

compute_environment: LOCAL_MACHINE
debug: false
fsdp_config:
  fsdp_activation_checkpointing: false
  fsdp_auto_wrap_policy: SIZE_BASED_WRAP
  fsdp_backward_prefetch: BACKWARD_PRE
  fsdp_cpu_ram_efficient_loading: false
  fsdp_forward_prefetch: false
  fsdp_min_num_params: 100000
  fsdp_offload_params: false
  fsdp_sharding_strategy: FULL_SHARD
  fsdp_state_dict_type: SHARDED_STATE_DICT
  fsdp_sync_module_states: false
  fsdp_use_orig_params: false
distributed_type: FSDP
downcast_bf16: 'no'
enable_cpu_affinity: false
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 4
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

mountain-lee1 • Sep 02 '25 01:09

Have you solved the problem? And how do you load the FSDP config? Thanks~

Vickeyhw • Sep 22 '25 11:09

vaskers5 • Nov 13 '25 23:11