
🐛[BUG]: Running `corrdiff/generate.py` raises a shape exception

Open gideonite opened this issue 1 year ago • 2 comments

Version

latest

On which installation method(s) does this occur?

No response

Describe the issue

See log output below

Minimum reproducible example

No response

Relevant log output

Error executing job with overrides: ['dataset.data_path=/data/gideond/corrdiff_inference_package/dataset/2023-01-24-cwb-4years_5times.zarr', 'res_ckpt_filename=/data/gideond/corrdiff_inference_package/checkpoints/diffusion.mdlus', 'reg_ckpt_filename=/data/gideond/corrdiff_inference_package/checkpoints/regression.mdlus', 'seed_batch_size=5', 'use_torch_compile=false']
Traceback (most recent call last):
  File "/net/nfs.cirrascale/climate/gideond/home/projects/modulus/examples/generative/corrdiff/generate.py", line 310, in main
    generate_and_save(
  File "/net/nfs.cirrascale/climate/gideond/home/projects/modulus/examples/generative/corrdiff/generate.py", line 396, in generate_and_save
    image_out = generate_fn(image_lr)
                ^^^^^^^^^^^^^^^^^^^^^
  File "/net/nfs.cirrascale/climate/gideond/home/projects/modulus/examples/generative/corrdiff/generate.py", line 232, in generate_fn
    image_reg = generate(
                ^^^^^^^^^
  File "/net/nfs.cirrascale/climate/gideond/home/projects/modulus/examples/generative/corrdiff/generate.py", line 541, in generate
    images = sampler_fn(
             ^^^^^^^^^^^
  File "/net/nfs.cirrascale/climate/gideond/home/projects/modulus/examples/generative/corrdiff/generate.py", line 609, in unet_regression
    x_next = net(x_hat[0:1], x_lr, t_hat, class_labels).to(torch.float64)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/modulus/models/diffusion/unet.py", line 152, in forward
    F_x = self.model(
          ^^^^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/nvtx/nvtx.py", line 116, in inner
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/modulus/models/diffusion/song_unet.py", line 347, in forward
    x = block(x, emb) if isinstance(block, UNetBlock) else block(x)
                                                           ^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/gideond/.conda/envs/corrdiff/lib/python3.12/site-packages/modulus/models/diffusion/layers.py", line 224, in forward
    x = torch.nn.functional.conv2d(x, w, padding=w_pad)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Given groups=1, weight of size [128, 20, 3, 3], expected input[1, 16, 448, 448] to have 20 channels, but got 16 channels instead

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.

Environment details

No response

gideonite avatar May 31 '24 20:05 gideonite

I also get the same error when using the CorrDiff Inference Package.

windsoryin avatar Jun 11 '24 14:06 windsoryin

This is caused by incorrect arguments in config_generate.yaml: the input channels, [0, 1, 2, 3, 4, 9, 10, 11, 12, 17, 18, 19], do not match the 20 channels used by the pre-trained models. They also overlap with the output channels, [0, 17, 18, 19].
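
A minimal sketch of how this kind of mismatch could be caught before the model is called. It is not part of the CorrDiff scripts: the helper names `check_conv_input` and `channel_overlap` are hypothetical. The shapes are taken from the error message above, and the channel lists are the ones quoted from config_generate.yaml.

```python
def check_conv_input(weight_shape, input_shape, groups=1):
    """Raise ValueError if a conv2d input's channel count does not match the weight.

    Hypothetical helper for illustration only.
    """
    # conv2d weight layout: (out_channels, in_channels // groups, kH, kW)
    expected = weight_shape[1] * groups
    # conv2d input layout: (N, C, H, W)
    actual = input_shape[1]
    if actual != expected:
        raise ValueError(f"expected {expected} input channels, got {actual}")


def channel_overlap(in_channels, out_channels):
    """Return the sorted channel indices that appear in both lists."""
    return sorted(set(in_channels) & set(out_channels))


# Channel lists as quoted from config_generate.yaml in this thread.
in_channels = [0, 1, 2, 3, 4, 9, 10, 11, 12, 17, 18, 19]
out_channels = [0, 17, 18, 19]
overlap = channel_overlap(in_channels, out_channels)
print(len(in_channels), overlap)

# Shapes copied from the RuntimeError in the log output.
try:
    check_conv_input((128, 20, 3, 3), (1, 16, 448, 448))
    mismatch = None
except ValueError as err:
    mismatch = str(err)
print(mismatch)
```

Running a check like this on the first convolution's weight shape would surface the 20 vs. 16 channel discrepancy with a shorter message than the full traceback.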

windsoryin avatar Jun 13 '24 05:06 windsoryin

This bug may come from the `N_grid_channels` parameter around line 127 of train.py, which changes the input shape of the whole model. After I removed that parameter and commented out the reference to it at line 144 of train.py, the bug disappeared. The output channels should also be changed to cfg.dataset.out_channels = [0,1,2,3].

zomosky avatar Nov 01 '24 06:11 zomosky

@gideonite @windsoryin The checkpoints distributed through the inference package are not expected to be compatible with the latest physicsnemo training and generation scripts. The right way to use the checkpoints from the inference package is to use them directly in earth2studio, as in this example. We will update the README to clarify this point.

CharlelieLrt avatar Apr 07 '25 20:04 CharlelieLrt