ColossalAI

[BUG]: Some unsupported torch function is operated upon this parameter


🐛 Describe the bug

When I train Stable Diffusion with `cond_stage_trainable` set to `True`, I get the error `Some unsupported torch function is operated upon this parameter`. Is updating only the UNet weights supported?
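
For reference, this is roughly how I enable it (a minimal sketch following the CompVis stable-diffusion config conventions used by the diffusion example; the config path below is only an example, not my exact file):

```python
# Sketch only: enable training of the cond stage (CLIP text encoder) in a
# CompVis/stable-diffusion style config before instantiating the model.
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

config = OmegaConf.load("configs/train_colossalai.yaml")  # example path
# Train the text encoder in addition to the UNet.
config.model.params.cond_stage_trainable = True
model = instantiate_from_config(config.model)
```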

Environment

CUDA 11.2, Python 3.7.10, PyTorch 1.13.1

EricZgw · Feb 27, 2023

Please update lightning and colossalai to the latest versions.

MichelleMa8 · Mar 7, 2023

Updating to the latest versions does not work for me.

clovermini · Mar 15, 2023

The same error occurs when I change the code to run a second forward pass through the UNet. Have you figured out how to fix it?

aaab8b · Mar 26, 2023

We have made many updates since then; please check the latest code. This issue was closed due to inactivity. Thanks.

binmakeswell · Apr 26, 2023

I also encountered the same error. Did you manage to resolve it later on?

gaylong9 · Jul 17, 2023

> I also encountered the same error. Did you manage to resolve it later on?

I'm using colossalai 0.3.0 and hit `RuntimeError: Parameter "tor_bond_conv.batch_norm.bias" failed at the gradient reduction. Some unsupported torch function is operated upon this parameter.` In `gemini_plugin.py` I found a comment saying that ColossalAI's zero support is currently not optimal, along with the commented-out line `model = nn.SyncBatchNorm.convert_sync_batchnorm(model, None)`. I suspected the Batch Normalization layers in my model were the cause, but even after uncommenting that line the error persists unchanged.
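
In case it helps, this is roughly what I tried: converting the BatchNorm layers myself before boosting the model (a minimal sketch against the colossalai 0.3.x Booster/GeminiPlugin API; `build_model` is a placeholder for my own model constructor, not a ColossalAI function):

```python
# Sketch only, assuming the colossalai 0.3.x Booster/GeminiPlugin API;
# build_model() stands in for your own model constructor.
import torch.nn as nn
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam

colossalai.launch_from_torch(config={})

model = build_model()  # placeholder: a model containing BatchNorm layers

# Replace every nn.BatchNorm* module with nn.SyncBatchNorm so that the
# running statistics are synchronized across ranks (None = default group).
model = nn.SyncBatchNorm.convert_sync_batchnorm(model, None)

optimizer = HybridAdam(model.parameters(), lr=1e-3)
booster = Booster(plugin=GeminiPlugin())
model, optimizer, *_ = booster.boost(model, optimizer)
```

Even with this conversion applied explicitly, the gradient-reduction error on the BatchNorm bias parameter still appears.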

gaylong9 · Jul 17, 2023