pytorch-minimize

LSMR fails with NaN output for trivial problems

Open · tvercaut opened this issue 3 years ago · 2 comments

Thanks for the nice library. I wanted to try LSMR, but my first attempt to use it on a trivial problem failed with NaN output.

Steps to reproduce:

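```python
import torch
import torchmin

A = torch.eye(10)             # identity, so A x = b is solved by x = b
xtrue = torch.zeros((10, 1))
b = A @ xtrue                 # b = 0

# index [0] selects the solution estimate from lsmr's return value
x = torchmin.lstsq.lsmr.lsmr(A, b)[0]
print(x)
```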

which resulted in

```
tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan])
```

instead of all zeros (with A = I and b = 0, the exact solution is x = 0).
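A minimal sketch of one plausible mechanism for this first case, assuming the solver normalizes b by its norm without checking for zero (not verified against the torchmin source): with b = 0 the first normalization is 0/0, and the resulting NaN then flows into every later vector. The guarded line follows the `if beta > 0` pattern used in SciPy's lsmr; the variable names are illustrative.

```python
import torch

# Assumption: the solver's first bidiagonalization step normalizes b by its
# norm without checking for zero (not verified against the torchmin source).
b = torch.zeros(10)              # trivial right-hand side from the first case

beta = torch.linalg.norm(b)      # ||b|| = 0
u_naive = b / beta               # 0 / 0 gives NaN in every entry

# Guarded variant in the spirit of SciPy's "if beta > 0" test:
# normalize only when the norm is nonzero, otherwise keep u = 0.
u_guarded = b / beta if beta > 0 else torch.zeros_like(b)

print(u_naive)    # every entry is nan
print(u_guarded)  # every entry is 0
```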

tvercaut, Dec 02 '22 23:12

Note that the same failure occurs with a slightly less trivial, but still trivial, case:

```python
import torch
import torchmin

A = torch.eye(10)
xtrue = torch.ones((10, 1))
b = A @ xtrue                 # b = 1, so the expected solution is all ones

x = torchmin.lstsq.lsmr.lsmr(A, b)[0]
print(x)
```
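For this second case, a single step of Golub-Kahan bidiagonalization (the process LSMR is built on) already breaks down when A is the identity: the next beta is zero, so an unguarded normalization divides 0 by 0. The sketch below uses n = 4 instead of 10 only so that every quantity is exact in floating point (with n = 10 the cancellation may leave a rounding-level value rather than an exact zero); the variable names are illustrative.

```python
import torch

# n = 4 so that ||b|| = 2 and every step below is exact in floating point.
n = 4
A = torch.eye(n, dtype=torch.float64)
b = torch.ones(n, dtype=torch.float64)

# First step: beta1 * u1 = b, alpha1 * v1 = A^T u1.
beta1 = torch.linalg.norm(b)          # 2
u1 = b / beta1                        # entries 0.5
v1 = A.T @ u1
alpha1 = torch.linalg.norm(v1)        # 1
v1 = v1 / alpha1

# Next step: beta2 * u2 = A v1 - alpha1 * u1. For A = I this is exactly zero:
# the Krylov subspace is exhausted and x = b = 2 * v1 already lies in span{v1}.
w = A @ v1 - alpha1 * u1
beta2 = torch.linalg.norm(w)          # 0
u2_naive = w / beta2                  # 0 / 0 gives NaN unless guarded
print(beta2, u2_naive)
```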

Comparing the source code with SciPy's points to the following issues.

normr and normar are initialized differently, and the early "convergence" tests that SciPy performs at the very beginning are missing: https://github.com/rfeinman/pytorch-minimize/blob/1017e9732db83fc9ffa2cb6e8316ed2bb0682d6a/torchmin/lstsq/lsmr.py#L148-L149 versus https://github.com/scipy/scipy/blob/2347d9309fbbabb1d3f89d35b7a42d0d53f002b2/scipy/sparse/linalg/_isolve/lsmr.py#L290-L307
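A sketch of an initialization that mirrors the structure of the SciPy code referenced above: normr starts as ||b||, normar as alpha * beta = ||A^T b||, and the solver returns x = 0 immediately when normar is zero (which also covers b = 0). The helper name lsmr_init and its return tuple are hypothetical, not part of torchmin or SciPy.

```python
import torch


def lsmr_init(A, b):
    """Hypothetical helper: SciPy-style LSMR initialization with an early exit.

    Returns (x, done, normr, normar); done=True means x is already a solution.
    """
    n = A.shape[1]
    x = torch.zeros(n, dtype=b.dtype)
    beta = torch.linalg.norm(b)
    if beta > 0:                      # guard the normalization of b
        u = b / beta
        v = A.T @ u
        alpha = torch.linalg.norm(v)
    else:
        alpha = torch.zeros((), dtype=b.dtype)
    normr = beta                      # ||b - A x0|| with x0 = 0
    normar = alpha * beta             # ||A^T (b - A x0)|| = ||A^T b||
    # b = 0 or A^T b = 0: x = 0 already satisfies the normal equations.
    done = bool(normar == 0)
    return x, done, normr, normar
```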

Within the main iteration loop, the convergence test is only performed every 10 iterations https://github.com/rfeinman/pytorch-minimize/blob/1017e9732db83fc9ffa2cb6e8316ed2bb0682d6a/torchmin/lstsq/lsmr.py#L257-L259 rather than at every iteration as in SciPy https://github.com/scipy/scipy/blob/2347d9309fbbabb1d3f89d35b7a42d0d53f002b2/scipy/sparse/linalg/_isolve/lsmr.py#L409
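A compact, undamped LSMR sketch that follows the recurrences in SciPy's implementation, with zero-norm guards and the stopping tests evaluated at every iteration. It is a sketch under stated simplifications, not torchmin's code and not the patched version linked below: the residual norm is computed directly as ||b - A x|| (one extra matrix-vector product per iteration) instead of LSMR's recurrence estimate, the condition-number test is omitted, and b is assumed to be 1-D. The names lsmr_sketch and sym_ortho are hypothetical.

```python
import math
import torch


def sym_ortho(a, b):
    """Givens rotation (c, s, r) such that c*a + s*b = r and -s*a + c*b = 0."""
    if b == 0:
        return math.copysign(1.0, a), 0.0, abs(a)
    if a == 0:
        return 0.0, math.copysign(1.0, b), abs(b)
    r = math.hypot(a, b)
    return a / r, b / r, r


def lsmr_sketch(A, b, atol=1e-6, btol=1e-6, maxiter=None):
    """Simplified undamped LSMR for 1-D b; returns (x, iterations)."""
    m, n = A.shape
    maxiter = min(m, n) if maxiter is None else maxiter
    x = torch.zeros(n, dtype=b.dtype)
    normb = torch.linalg.norm(b).item()
    beta, u = normb, b.clone()
    alpha, v = 0.0, torch.zeros(n, dtype=b.dtype)
    if beta > 0:
        u = u / beta
        v = A.T @ u
        alpha = torch.linalg.norm(v).item()
    if alpha > 0:
        v = v / alpha
    if alpha * beta == 0:  # b = 0 or A^T b = 0: x = 0 is already a solution
        return x, 0
    zetabar, alphabar = alpha * beta, alpha
    rho, rhobar, cbar, sbar = 1.0, 1.0, 1.0, 0.0
    h, hbar = v.clone(), torch.zeros(n, dtype=b.dtype)
    normA2 = alpha ** 2
    itn = 0
    while itn < maxiter:
        itn += 1
        # Golub-Kahan step: beta*u = A v - alpha*u, then alpha*v = A^T u - beta*v.
        u = A @ v - alpha * u
        beta = torch.linalg.norm(u).item()
        if beta > 0:  # guard: never divide by a zero norm
            u = u / beta
            v = A.T @ u - beta * v
            alpha = torch.linalg.norm(v).item()
            if alpha > 0:
                v = v / alpha
        # Plane rotations: B_k -> R_k (c, s), then R_k^T -> Rbar_k (cbar, sbar).
        rhoold, rhobarold = rho, rhobar
        c, s, rho = sym_ortho(alphabar, beta)
        thetanew, alphabar = s * alpha, c * alpha
        thetabar = sbar * rho
        cbar, sbar, rhobar = sym_ortho(cbar * rho, thetanew)
        zeta, zetabar = cbar * zetabar, -sbar * zetabar
        # Update the direction vectors and the iterate.
        hbar = h - (thetabar * rho / (rhoold * rhobarold)) * hbar
        x = x + (zeta / (rho * rhobar)) * hbar
        h = v - (thetanew / rho) * h
        # Stopping tests, evaluated at every iteration (not every 10th).
        normA2 += beta ** 2
        normA = math.sqrt(normA2)
        normA2 += alpha ** 2
        normr = torch.linalg.norm(b - A @ x).item()  # simplification: exact ||r||
        normar = abs(zetabar)                        # ||A^T r_k|| from the recurrence
        normx = torch.linalg.norm(x).item()
        test1 = normr / normb
        test2 = normar / (normA * normr) if normA * normr > 0 else math.inf
        rtol = btol + atol * normA * normx / normb
        if test1 <= rtol or test2 <= atol:
            break
    return x, itn
```

In float64, on the two cases from this issue the sketch returns zeros through the early exit and the all-ones solution after one iteration. It does not establish whether torchmin's NaN comes from the missing guards, the deferred convergence check, or both.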

tvercaut, Dec 02 '22 23:12

For the record, I made a few changes here: https://github.com/cai4cai/torchsparsegradutils/blob/main/torchsparsegradutils/utils/lsmr.py

tvercaut, Sep 26 '23 21:09