taming-transformers
Custom training in Colab version, stuck on error --> 'LightningDistributedDataParallel' object has no attribute '_sync_params'
Hi all, first of all thanks for your work and support. I'm trying to deploy a Colab version for custom training, but I got stuck here:
AttributeError: 'LightningDistributedDataParallel' object has no attribute '_sync_params' (full error below)
Any hint??
Thanks again
Lightning config:
  trainer:
    distributed_backend: ddp
    gpus: 0,
| Name | Type | Params
0 | encoder | Encoder | 29.3 M
1 | decoder | Decoder | 42.4 M
2 | loss | VQLPIPSWithDiscriminator | 17.5 M
3 | quantize | VectorQuantizer2 | 262 K
4 | quant_conv | Conv2d | 65.8 K
5 | post_quant_conv | Conv2d | 65.8 K
Validation sanity check: 0it [00:00, ?it/s]/content/taming-transformers/taming/data/utils.py:137: UserWarning: An output with one or more elements was resized since it had shape [983040], which does not match the required output shape [5, 256, 256, 3].This behavior is deprecated, and in a future PyTorch release outputs will not be resized unless they have zero elements. You can explicitly reuse an out tensor t by resizing it, inplace, to zero elements with t.resize_(0). (Triggered internally at ../aten/src/ATen/native/Resize.cpp:24.)
return torch.stack(batch, 0, out=out)
Summoning checkpoint.
Traceback (most recent call last):
File "main.py", line 565, in
Commenting out the line with trainer_config["distributed_backend"] = "ddp" in main.py worked for me.
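For reference, the change is just commenting out that one line; a sketch (the exact location and surrounding code in main.py may differ between revisions of the repo):

```python
# main.py -- workaround sketch; find the line that forces DDP and comment it out.

# before:
trainer_config["distributed_backend"] = "ddp"

# after -- the model is no longer wrapped in LightningDistributedDataParallel,
# which is where the '_sync_params' lookup fails with newer torch versions:
# trainer_config["distributed_backend"] = "ddp"
```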
I had the same problem before; it was solved by strictly following the given environment file. My suggestion is to create a new conda environment from environment.yaml.
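If it helps, a minimal sketch of that setup as a Colab cell (assumes conda is available in the runtime, e.g. via condacolab; "taming" is the env name declared in the repo's environment.yaml, and the config path is only a placeholder for your own training config):

```python
# Colab cell -- the "!" lines are IPython shell escapes.
!git clone https://github.com/CompVis/taming-transformers
%cd taming-transformers
# Recreate the pinned environment from the repo's environment.yaml (env name: "taming").
!conda env create -f environment.yaml
# "conda activate" does not persist across "!" calls in a notebook, so launch
# training through "conda run" inside the new environment instead
# (replace configs/custom_vqgan.yaml with your own training config):
!conda run -n taming python main.py --base configs/custom_vqgan.yaml -t True --gpus 0,
```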
hi smithee77, have you solved this problem?