physicsnemo
🐛[BUG]: @StaticCaptureEvaluateNoGrad decorator can cause NaN values to show up during inference
Version
24.01
On which installation method(s) does this occur?
Source
Describe the issue
Running inference on my Squeezeformer model through a function wrapped with the @StaticCaptureEvaluateNoGrad decorator can produce NaN values in the output. When I run the same model on the same input with plain PyTorch under torch.no_grad(), I do not encounter this problem.
@StaticCaptureEvaluateNoGrad(model=model, use_graphs=False)
def eval_step_forward(my_model, invar):
    return my_model(invar)
...
In [19]: output_modulus = eval_step_forward(model, data_input)
In [20]: torch.isnan(output_modulus).any()
Out[20]: tensor(True, device='cuda:0')
In [21]: with torch.no_grad():
    ...:     output_pytorch = model(data_input)
    ...:
In [22]: torch.isnan(output_pytorch).any()
Out[22]: tensor(False, device='cuda:0')
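To narrow down where the NaN values first appear, one option is to attach forward hooks to every submodule and record the first one whose output contains NaN. The following is a minimal sketch in plain PyTorch, not part of the original report: `first_nan_module` and the toy `SqrtOfNegative` module are hypothetical names introduced for illustration, and the sketch assumes hooks still fire inside the decorated function when `use_graphs=False`.

```python
import torch
import torch.nn as nn


def first_nan_module(model: nn.Module, run):
    """Call run() and return the name of the first submodule whose
    output contains NaN, or None if no tensor output contains NaN."""
    hits = []
    handles = []

    def make_hook(name):
        def hook(module, args, output):
            # This sketch only inspects plain floating-point tensor outputs;
            # tuple or dict outputs are skipped.
            if (
                torch.is_tensor(output)
                and output.is_floating_point()
                and torch.isnan(output).any()
            ):
                hits.append(name)
        return hook

    for name, module in model.named_modules():
        if name:  # the root module has an empty name; skip it
            handles.append(module.register_forward_hook(make_hook(name)))
    try:
        run()
    finally:
        # Always remove the hooks so the model is left unchanged.
        for handle in handles:
            handle.remove()
    return hits[0] if hits else None


class SqrtOfNegative(nn.Module):
    """Toy module (hypothetical) that deliberately produces NaN."""

    def forward(self, x):
        return torch.sqrt(x - 1e6)  # square root of a negative number


toy = nn.Sequential(nn.Linear(4, 4), SqrtOfNegative(), nn.Linear(4, 2))
with torch.no_grad():
    culprit = first_nan_module(toy, lambda: toy(torch.ones(1, 4)))
print(culprit)
```

For the case in this issue, `run` could be `lambda: eval_step_forward(model, data_input)` for the decorated path and a `torch.no_grad()` call to `model(data_input)` for the plain path; comparing the two results would show whether the same layer is involved in both.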
Minimum reproducible example
Relevant log output
Environment details
Using a container on NERSC perlmutter:
#SBATCH --image=nvcr.io/nvidia/modulus/modulus:24.01
Is it possible to try running the same code with the most recent PhysicsNeMo Docker image (nvcr.io/nvidia/physicsnemo/physicsnemo:25.03)?
@jerrylin96, can you try @Alexey-Kamenev's suggestion?
Otherwise, please provide a minimal reproducible example together with environment details.
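As a rough illustration of the environment details that are usually helpful in a report like this, the sketch below gathers Python, PyTorch, and CUDA versions. The function name `environment_report` is hypothetical, and the distribution names `nvidia-modulus` and `nvidia-physicsnemo` are assumptions about how the packages are installed; a source install may register a different name.

```python
import platform
import sys
from importlib import metadata

import torch


def environment_report():
    """Collect version details commonly requested in bug reports."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(0)
    # Assumed distribution names; adjust them to match the actual install.
    for dist in ("nvidia-modulus", "nvidia-physicsnemo"):
        try:
            report[dist] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            report[dist] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

PyTorch also ships a more complete collector that can be run with `python -m torch.utils.collect_env`.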
Closing as no response was received. Please reopen if this is still seen with the latest version.