modulus-sym
🐛[BUG]: Modulus hangs on FNO training
Version
1.4.0
On which installation method(s) does this occur?
Pip
Describe the issue
I adapted the FNO Darcy example to train an FNO on a shockTube case. Modulus hangs after the .solve() method is called.
The only output I see is:
python3.9/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
ret = run_job(
[14:13:14] - JitManager: {'_enabled': False, '_arch_mode': <JitArchMode.ONLY_ACTIVATION: 1>, '_use_nvfuser': True, '_autograd_nodes': False}
[14:13:14] - GraphManager: {'_func_arch': False, '_debug': False, '_func_arch_allow_partial_hessian': True}
[14:13:17] - attempting to restore from: outputs/shockTube_FNO_lazy
[14:13:17] - optimizer checkpoint not found
[14:13:17] - model fno.0.pth not found
After that there is no further output and no error message. The case is attached: shockTube_FNO.zip
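Since the process prints nothing once it stalls, a stack dump from the hung run would show where it is waiting. Below is a minimal sketch using Python's standard `faulthandler` module. The helper name `run_with_hang_dump` and the 60 second timeout are illustrative choices, not part of Modulus, and I have not confirmed that this pinpoints the cause here.

```python
import faulthandler
import sys


def run_with_hang_dump(fn, timeout_s=60.0, stream=None):
    """Call fn() and, while it is still running, dump the stacks of all
    threads to `stream` every `timeout_s` seconds.

    `stream` must be a real file object with a file descriptor;
    it defaults to sys.stderr.
    """
    if stream is None:
        stream = sys.stderr
    # Schedule a repeating dump; it fires only if fn() outlives the timeout.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=stream)
    try:
        return fn()
    finally:
        # Stop the watchdog once fn() returns or raises.
        faulthandler.cancel_dump_traceback_later()
```

In the training script this would wrap the solve call, for example `run_with_hang_dump(slv.solve)`, assuming the solver object is named `slv` as in the FNO Darcy example. If the dump shows threads blocked inside data loading, that would point at the dataset or dataloader setup and not at the model itself.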
Minimum reproducible example
The case is attached to this issue (shockTube_FNO.zip).
Relevant log output
Identical to the output shown under "Describe the issue" above.
Environment details
No response
Other/Misc.
No response