ColossalAI
[BUG]: assert len(grad_args) > 0 in training the animate anyone
🐛 Describe the bug
I'm using ColossalAI to train Animate Anyone, but an error occurs in the UNet forward pass.
Traceback (most recent call last):
File "core/train_stage2_colo.py", line 294, in <module>
main(conf)
File "core/train_stage2_colo.py", line 236, in main
pred = unet(noisy_latents, timestep, encoder_hidden_states).sample
File "/media/74nvme/software/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/74nvme/software/miniconda3/envs/animate/lib/python3.8/site-packages/colossalai/zero/gemini/gemini_ddp.py", line 258, in forward
outputs = self.module(*args, **kwargs)
File "/media/74nvme/software/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/74nvme/animate_zl/source/core/model/unet.py", line 378, in forward
emb = self.time_embedding(t_emb)
File "/media/74nvme/software/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/74nvme/software/miniconda3/envs/animate/lib/python3.8/site-packages/diffusers/models/embeddings.py", line 192, in forward
sample = self.linear_1(sample)
File "/media/74nvme/software/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/media/74nvme/software/miniconda3/envs/animate/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
File "/media/74nvme/software/miniconda3/envs/animate/lib/python3.8/site-packages/colossalai/tensor/colo_parameter.py", line 61, in __torch_function__
new_args = ColoParamOpHookManager.pre_op(params, *args, *kwargs.values())
File "/media/74nvme/software/miniconda3/envs/animate/lib/python3.8/site-packages/colossalai/tensor/param_op_hook.py", line 88, in pre_op
grad_args, other_args, grad_flags, spec = _flatten_grad_args(args)
File "/media/74nvme/software/miniconda3/envs/animate/lib/python3.8/site-packages/colossalai/tensor/param_op_hook.py", line 145, in _flatten_grad_args
assert len(grad_args) > 0
AssertionError
Part of the UNet's parameters are set to requires_grad_(False); I don't know whether that is related to this error.
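The frozen parameters do look like the likely cause. Below is a minimal, hypothetical sketch (plain Python, not the real ColossalAI code) of what `_flatten_grad_args` appears to do based on the traceback: it partitions an op's tensor arguments by `requires_grad` and asserts that at least one argument still requires a gradient. A layer whose weight and bias are both frozen would contribute zero such arguments, which would fire exactly this assertion:

```python
# Hypothetical, simplified model of ColossalAI's _flatten_grad_args
# (param_op_hook.py); names and behavior are assumptions inferred
# from the traceback, not the actual implementation.

class FakeParam:
    """Stand-in for a parameter tensor with a requires_grad flag."""
    def __init__(self, name, requires_grad):
        self.name = name
        self.requires_grad = requires_grad

def flatten_grad_args(args):
    # Split the op's arguments into those that require grad and the rest.
    grad_args = [a for a in args if getattr(a, "requires_grad", False)]
    other_args = [a for a in args if not getattr(a, "requires_grad", False)]
    # Mirrors the failing check: if no argument requires grad
    # (e.g. a fully frozen nn.Linear), this assertion fires.
    assert len(grad_args) > 0
    return grad_args, other_args

# Trainable layer: the weight requires grad, so the check passes.
weight = FakeParam("weight", requires_grad=True)
bias = FakeParam("bias", requires_grad=False)
grad_args, other_args = flatten_grad_args([weight, bias])
print(len(grad_args))  # 1

# Fully frozen layer: requires_grad_(False) on every parameter
# leaves grad_args empty, reproducing the AssertionError.
frozen = [FakeParam("weight", False), FakeParam("bias", False)]
try:
    flatten_grad_args(frozen)
except AssertionError:
    print("AssertionError: no parameter of this op requires grad")
```

If this reading is right, any submodule whose parameters are all frozen (such as a frozen time_embedding linear layer) would trip the hook as soon as its first op runs under the Gemini plugin.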
Environment
No response
I got the same error: https://github.com/hpcaitech/ColossalAI/issues/5290. Have you solved it? @zhangvia
No. I think the plugin cannot support training with part of the layers frozen.