🐛[BUG]: LBFGS optimizer doesn't work for PINN training

Open hasethinvd opened this issue 9 months ago • 1 comment

Version

24.01

On which installation method(s) does this occur?

Docker, Pip, Source

Describe the issue

After setting the optimizer to bfgs in the config file, the trainer overrides max_steps to 0, and training then fails with KeyError: 'max_iter' (see the log output below).

Minimum reproducible example

# config
defaults:
  - modulus_default
  - arch:
      - fourier
      - modified_fourier
      - fully_connected
      - multiscale_fourier
  - scheduler: tf_exponential_lr
  - optimizer: bfgs
  - loss: sum

training:
  rec_results_freq: 1000
  max_steps: 150000

Relevant log output

[23:53:04] - lbfgs optimizer selected. Setting max_steps to 0
[23:53:05] - [step:     100000] lbfgs optimization in running
Error executing job with overrides: []
Traceback (most recent call last):
  File "/mount/data/test/eikonal/eikonal.py", line 313, in run
    slv.solve()
  File "/usr/local/lib/python3.10/dist-packages/modulus/sym/solver/solver.py", line 173, in solve
    self._train_loop(sigterm_handler)
  File "/usr/local/lib/python3.10/dist-packages/modulus/sym/trainer.py", line 543, in _train_loop
    loss, losses = self._cuda_graph_training_step(step)
  File "/usr/local/lib/python3.10/dist-packages/modulus/sym/trainer.py", line 730, in _cuda_graph_training_step
    self.apply_gradients()
  File "/usr/local/lib/python3.10/dist-packages/modulus/sym/trainer.py", line 185, in bfgs_apply_gradients
    self.optimizer.step(self.bfgs_closure_func)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/lr_scheduler.py", line 68, in wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/optimizer.py", line 379, in wrapper
    out = func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/optim/lbfgs.py", line 298, in step
    max_iter = group['max_iter']
KeyError: 'max_iter'
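
Possible cause (a guess, not confirmed for Modulus): the log line reporting step 100000 suggests the run resumed from an existing checkpoint. If that checkpoint's optimizer state was written by a different optimizer such as Adam, loading it into torch.optim.LBFGS replaces the LBFGS parameter group with the saved one, which has no max_iter entry. The minimal sketch below reproduces the same KeyError in plain PyTorch under that assumption. The model, learning rates, and closure are made up for illustration and do not come from the Modulus code base.

```python
import torch

# Tiny stand-in model (hypothetical; not the PINN from this issue).
model = torch.nn.Linear(2, 1)

# Pretend an earlier run used Adam and saved its optimizer state.
adam = torch.optim.Adam(model.parameters(), lr=1e-3)
saved_state = adam.state_dict()

# A fresh LBFGS optimizer has 'max_iter' in its single parameter group.
lbfgs = torch.optim.LBFGS(model.parameters(), lr=1.0, max_iter=20)
had_max_iter_before_load = "max_iter" in lbfgs.param_groups[0]

# Loading the Adam state swaps in Adam's parameter group options,
# so the LBFGS-specific keys are no longer present.
lbfgs.load_state_dict(saved_state)
has_max_iter_after_load = "max_iter" in lbfgs.param_groups[0]


def closure():
    """Re-evaluate the loss, as LBFGS.step requires."""
    lbfgs.zero_grad()
    loss = model(torch.ones(1, 2)).pow(2).sum()
    loss.backward()
    return loss


# step() reads group['max_iter'] before calling the closure and fails.
error = None
try:
    lbfgs.step(closure)
except KeyError as exc:
    error = exc

print("max_iter before load:", had_max_iter_before_load)
print("max_iter after load:", has_max_iter_after_load)
print("error:", repr(error))
```

If this is what happens here, starting the LBFGS stage without restoring the old optimizer state (while still restoring the network weights) should avoid the error. Whether Modulus offers a switch for that is not something this report establishes.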

Environment details

No response

hasethinvd, May 09 '24 23:05