🐛[BUG]: Running `corrdiff/generate.py` raises a shape exception
Version
latest
On which installation method(s) does this occur?
No response
Describe the issue
See log output below
Minimum reproducible example
No response
Relevant log output
Error executing job with overrides: ['dataset.data_path=/data/gideond/corrdiff_inference_package/dataset/2023-01-24-cwb-4years_5times.zarr', 'res_ckpt_filename=/data/gideond/corrdiff_inference_package/checkpoints/diffusion.mdlus', 'reg_ckpt_filename=/data/gideond/corrdiff_inference_package/checkpoints/regression.mdlus', 'seed_batch_size=5', 'use_torch_compile=false']
Traceback (most recent call last):
File "/net/nfs.cirrascale/climate/gideond/home/projects/modulus/examples/generative/corrdiff/generate.py", line 310, in main
generate_and_save(
File "/net/nfs.cirrascale/climate/gideond/home/projects/modulus/examples/generative/corrdiff/generate.py", line 396, in generate_and_save
image_out = generate_fn(image_lr)
^^^^^^^^^^^^^^^^^^^^^
File "/net/nfs.cirrascale/climate/gideond/home/projects/modulus/examples/generative/corrdiff/generate.py", line 232, in generate_fn
image_reg = generate(
^^^^^^^^^
File "/net/nfs.cirrascale/climate/gideond/home/projects/modulus/examples/generative/corrdiff/generate.py", line 541, in generate
images = sampler_fn(
^^^^^^^^^^^
File "/net/nfs.cirrascale/climate/gideond/home/projects/modulus/examples/generative/corrdiff/generate.py", line 609, in unet_regression
x_next = net(x_hat[0:1], x_lr, t_hat, class_labels).to(torch.float64)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/modulus/models/diffusion/unet.py", line 152, in forward
F_x = self.model(
^^^^^^^^^^^
File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/nvtx/nvtx.py", line 116, in inner
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/modulus/models/diffusion/song_unet.py", line 347, in forward
x = block(x, emb) if isinstance(block, UNetBlock) else block(x)
^^^^^^^^
File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/modulus/models/diffusion/layers.py", line 224, in forward
x = torch.nn.functional.conv2d(x, w, padding=w_pad)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Given groups=1, weight of size [128, 20, 3, 3], expected input[1, 16, 448, 448] to have 20 channels, but got 16 channels instead
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
Environment details
No response
I also get the same error when using the CorrDiff Inference Package.
This is caused by incorrect arguments in `config_generate.yaml`: the input channels `[0, 1, 2, 3, 4, 9, 10, 11, 12, 17, 18, 19]` do not match the 20 channels used by the pre-trained models. They also overlap with the output channels `[0, 17, 18, 19]`.
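A quick way to see both symptoms side by side is sketched below. It only uses the error text from the log and the two channel lists quoted above; the helper name `channels_from_conv_error` is made up for illustration and is not part of modulus. It does not try to explain how the 16 channels are assembled inside the model.

```python
import re

# Error text copied from the log in this issue.
ERROR = (
    "Given groups=1, weight of size [128, 20, 3, 3], expected input[1, 16, 448, 448] "
    "to have 20 channels, but got 16 channels instead"
)


def channels_from_conv_error(message):
    """Return (expected, got) channel counts from a conv2d channel-mismatch message.

    Hypothetical helper for illustration only.
    """
    match = re.search(r"to have (\d+) channels, but got (\d+) channels", message)
    if match is None:
        raise ValueError("not a conv2d channel-mismatch message")
    return int(match.group(1)), int(match.group(2))


# Channel index lists as quoted from config_generate.yaml in the comment above.
in_channels = [0, 1, 2, 3, 4, 9, 10, 11, 12, 17, 18, 19]
out_channels = [0, 17, 18, 19]

expected, got = channels_from_conv_error(ERROR)
overlap = sorted(set(in_channels) & set(out_channels))

print(f"first conv expects {expected} channels, received {got}")
print(f"{len(in_channels)} input indices configured")
print(f"indices used as both input and output: {overlap}")
```

In a conv2d weight of size `[128, 20, 3, 3]` the second entry is the number of input channels the checkpoint was trained with, which is why the message reports 20 as the expected count.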
This bug may come from the `N_grid_channels` parameter around line 127 of `train.py`, which changes the input shape of the whole model. After I removed that parameter and commented out the reference to it at line 144 of `train.py`, the bug disappeared. The output channels should also be changed to `cfg.dataset.out_channels = [0,1,2,3]`.
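One reading that is consistent with the numbers in the log, though not confirmed anywhere in this thread, is that the first convolution sees the configured input channels, the output channels, and any grid channels concatenated together. The sketch below is plain arithmetic under that assumption; the function name and the grid-channel count of 4 are hypothetical and not taken from `train.py`.

```python
def first_conv_in_channels(n_input, n_output, n_grid):
    # ASSUMPTION (not confirmed by the thread): the model input is the
    # concatenation of input, output-shaped, and grid channels, so the
    # channel counts simply add up.
    return n_input + n_output + n_grid


# 12 input indices and 4 output indices are the list lengths quoted earlier
# in this thread; the grid-channel count of 4 is a hypothetical value.
with_grid = first_conv_in_channels(n_input=12, n_output=4, n_grid=4)
without_grid = first_conv_in_channels(n_input=12, n_output=4, n_grid=0)

print(f"with grid channels: {with_grid}")
print(f"without grid channels: {without_grid}")
```

If that assumption holds, a checkpoint built with grid channels and a script that feeds none would differ by exactly the grid-channel count, which would match the 20 versus 16 in the error.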
@gideonite @windsoryin The checkpoints distributed through the inference package are not expected to be compatible with the latest physicsnemo training and generation scripts. The right way to use those checkpoints is to load them directly in earth2studio, for instance as in this example. We will update the README to clarify this.