
🐛[BUG]: @StaticCaptureEvaluateNoGrad decorator can cause NaN values to show up during inference

Open jerrylin96 opened this issue 7 months ago • 1 comments

Version

24.01

On which installation method(s) does this occur?

Source

Describe the issue

Running inference on my Squeezeformer model through a function decorated with @StaticCaptureEvaluateNoGrad can produce NaN values in the output. Running the same model on the same input with plain PyTorch syntax (under torch.no_grad()) does not produce any NaNs.

@StaticCaptureEvaluateNoGrad(model=model, use_graphs=False)
def eval_step_forward(my_model, invar):
    return my_model(invar)
...
In [19]: output_modulus = eval_step_forward(model, data_input)

In [20]: torch.isnan(output_modulus).any()
Out[20]: tensor(True, device='cuda:0')

In [21]: with torch.no_grad():
    ...:     output_pytorch = model(data_input)
    ...: 

In [22]: torch.isnan(output_pytorch).any()
Out[22]: tensor(False, device='cuda:0')
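
One way to narrow down where the NaNs first appear is to attach forward hooks and record which submodules emit non-finite outputs. The following is a minimal sketch in plain PyTorch, not part of the original report: `track_nonfinite`, `SqrtLayer`, and the toy model are hypothetical names used only for illustration. The output dtype is recorded too, since reduced-precision activations are a common general source of overflow; whether that applies here is an assumption to be checked, not a finding.

```python
from contextlib import contextmanager

import torch
import torch.nn as nn


@contextmanager
def track_nonfinite(model: nn.Module):
    """Record (module name, output dtype) for leaf modules that emit NaN/Inf.

    Hypothetical helper for debugging; entries are appended in execution order.
    """
    offenders = []
    handles = []

    def make_hook(name):
        def hook(module, args, output):
            # This sketch only inspects plain tensor outputs; tuples/dicts are skipped.
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                offenders.append((name, str(output.dtype)))
        return hook

    for name, module in model.named_modules():
        if len(list(module.children())) == 0:  # leaf modules only
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        yield offenders
    finally:
        # Always remove the hooks so the model is left unchanged.
        for handle in handles:
            handle.remove()


class SqrtLayer(nn.Module):
    """Toy layer that yields NaN for negative inputs (illustration only)."""

    def forward(self, x):
        return torch.sqrt(x)


# Toy stand-in for the real model: the middle layer introduces the NaN.
toy = nn.Sequential(nn.Identity(), SqrtLayer(), nn.Identity())
x = torch.tensor([[-1.0, 4.0]])

with track_nonfinite(toy) as bad:
    with torch.no_grad():
        toy(x)

print(bad)  # the first entry is the layer where the NaN originates
```

With the names from the report, the same context manager could wrap both calls, `eval_step_forward(model, data_input)` and the plain `model(data_input)` under `torch.no_grad()`, so the two lists of offending layers and dtypes can be compared side by side.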

Minimum reproducible example


Relevant log output


Environment details

Using a container on NERSC perlmutter:

#SBATCH --image=nvcr.io/nvidia/modulus/modulus:24.01

jerrylin96, May 05 '25 22:05

Could you try running the same code with the most recent PhysicsNeMo Docker image (nvcr.io/nvidia/physicsnemo/physicsnemo:25.03)?

Alexey-Kamenev, May 09 '25 21:05

@jerrylin96, can you try @Alexey-Kamenev's suggestion?

Otherwise, please provide a minimal reproducible example with environment details.

dallasfoster, Jul 18 '25 17:07

Closing as no response was received. Please reopen if this is still seen with the latest version.

prem-krishnan, Sep 26 '25 18:09