ColossalAI

[BUG]: Some unsupported torch function is operated upon this parameter


🐛 Describe the bug

When I train Stable Diffusion with `cond_stage_trainable` set to `True`, I get the error `Some unsupported torch function is operated upon this parameter`. Is updating only the UNet weights supported?
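
For reference, this is roughly how I enable it (a minimal sketch following the CompVis stable-diffusion config conventions used by the diffusion example; the config path below is only an example, not my exact file):

```python
# Sketch only: enable training of the cond stage (CLIP text encoder) in a
# CompVis/stable-diffusion style config before instantiating the model.
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

config = OmegaConf.load("configs/train_colossalai.yaml")  # example path
# Train the text encoder in addition to the UNet.
config.model.params.cond_stage_trainable = True
model = instantiate_from_config(config.model)
```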

Environment

CUDA 11.2, Python 3.7.10, PyTorch 1.13.1

EricZgw · Feb 27, 2023

Please update lightning and colossalai to the latest versions.

MichelleMa8 · Mar 7, 2023

Updating to the latest versions does not work for me.

clovermini · Mar 15, 2023

The same error occurs when I change the code to run a second forward pass through the UNet. Have you figured out how to fix it?

aaab8b · Mar 26, 2023

We have made many updates since then; please check the latest code. This issue was closed due to inactivity. Thanks.

binmakeswell · Apr 26, 2023

I also encountered the same error. Did you manage to resolve it later on?

gaylong9 · Jul 17, 2023

> I also encountered the same error. Did you manage to resolve it later on?

I'm using colossalai 0.3.0 and hit `RuntimeError: Parameter "tor_bond_conv.batch_norm.bias" failed at the gradient reduction. Some unsupported torch function is operated upon this parameter.` In `gemini_plugin.py` I found a comment saying that ColossalAI's zero support is currently not optimal, along with the commented-out line `model = nn.SyncBatchNorm.convert_sync_batchnorm(model, None)`. I suspected the Batch Normalization layers in my model were the cause, but even after uncommenting that line the error persists unchanged.
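
In case it helps, this is roughly what I tried: converting the BatchNorm layers myself before boosting the model (a minimal sketch against the colossalai 0.3.x Booster/GeminiPlugin API; `build_model` is a placeholder for my own model constructor, not a ColossalAI function):

```python
# Sketch only, assuming the colossalai 0.3.x Booster/GeminiPlugin API;
# build_model() stands in for your own model constructor.
import torch.nn as nn
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from colossalai.nn.optimizer import HybridAdam

colossalai.launch_from_torch(config={})

model = build_model()  # placeholder: a model containing BatchNorm layers

# Replace every nn.BatchNorm* module with nn.SyncBatchNorm so that the
# running statistics are synchronized across ranks (None = default group).
model = nn.SyncBatchNorm.convert_sync_batchnorm(model, None)

optimizer = HybridAdam(model.parameters(), lr=1e-3)
booster = Booster(plugin=GeminiPlugin())
model, optimizer, *_ = booster.boost(model, optimizer)
```

Even with this conversion applied explicitly, the gradient-reduction error on the BatchNorm bias parameter still appears.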

gaylong9 · Jul 17, 2023