pykan
Runtime Error in hellokan.ipynb
I ran into the same error that was reported in the previously closed issues #117, #46, and #89:
File ...\pykan\kan\spline.py:135, in curve2coef(x_eval, y_eval, grid, k, device)
133 # x_eval: (size, batch); y_eval: (size, batch); grid: (size, grid); k: scalar
134 mat = B_batch(x_eval, grid, k, device=device).permute(0, 2, 1)
--> 135 coef = torch.linalg.lstsq(mat.to('cpu'), y_eval.unsqueeze(dim=2).to('cpu')).solution[:, :, 0] # sometimes 'cuda' version may diverge
136 return coef.to(device)
RuntimeError: false INTERNAL ASSERT FAILED at "...\pytorch\\pytorch\\builder\\windows\\pytorch\\aten\\src\\ATen\\native\\BatchLinearAlgebra.cpp":1540, please report a bug to PyTorch. torch.linalg.lstsq: (Batch element 0): Argument 6 has illegal value. Most certainly there is a bug in the implementation calling the backend library.
Changing opt to 'Adam' did not solve the problem.
It seems that the driver argument of torch.linalg.lstsq needs to be specified explicitly instead of being left as None.
Both 'LBFGS' and 'Adam' work when I use the following if statement in curve2coef, though I don't know why.
if device == 'cpu':
    coef = torch.linalg.lstsq(mat.to(device), y_eval.unsqueeze(dim=2).to(device), driver='gelsy').solution[:, :, 0]
else:
    coef = torch.linalg.lstsq(mat.to(device), y_eval.unsqueeze(dim=2).to(device), driver='gels').solution[:, :, 0]  # sometimes 'cuda' version may diverge
This seems promising, I'll test it too! Can you explain the difference between driver='gelsy' and driver='gels'?
Also, that if block can probably be rewritten as a single line:
coef = torch.linalg.lstsq(mat.to(device), y_eval.unsqueeze(dim=2).to(device), driver='gelsy' if device == 'cpu' else 'gels').solution[:, :, 0]
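For anyone who wants to try the idea outside of pykan, here is a minimal standalone sketch of the same driver selection. The helper names pick_lstsq_driver and solve_batched_lstsq are hypothetical (they are not part of pykan), and the tensor shapes are only assumed to mirror what curve2coef passes in. Normalizing through torch.device is an extra precaution, on the assumption that device might arrive as a torch.device object or a string like 'cuda:0' rather than the plain string 'cpu'.

```python
import torch

def pick_lstsq_driver(device):
    """Hypothetical helper: choose an explicit driver for torch.linalg.lstsq."""
    # torch.device() accepts strings such as 'cpu' or 'cuda:0' as well as
    # existing torch.device objects, so both spellings are handled the same way.
    return 'gelsy' if torch.device(device).type == 'cpu' else 'gels'

def solve_batched_lstsq(mat, y, device='cpu'):
    """Hypothetical stand-in for the lstsq call in curve2coef.

    mat: (batch, n_points, n_coef), y: (batch, n_points)
    returns: (batch, n_coef)
    """
    driver = pick_lstsq_driver(device)
    result = torch.linalg.lstsq(mat.to(device), y.unsqueeze(dim=2).to(device), driver=driver)
    return result.solution[:, :, 0]

# Small synthetic check on CPU: recover known coefficients.
torch.manual_seed(0)
mat = torch.randn(3, 10, 4)
coef_true = torch.randn(3, 4)
y = torch.einsum('bnk,bk->bn', mat, coef_true)
coef = solve_batched_lstsq(mat, y, device='cpu')
print(coef.shape)
```

This only shows the driver being passed explicitly; whether it avoids the INTERNAL ASSERT on a given Windows build still has to be confirmed by running the notebook.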
Honestly, I don't know the details. According to the torch.linalg.lstsq documentation, gelsy (QR with pivoting) is the default CPU driver for general matrices, while gels (plain QR) assumes the matrix is full rank. For CUDA inputs, gels is the only valid driver.
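To make the difference a bit more concrete, here is a small CPU-only sketch that is independent of pykan. The matrices are made up for illustration, and rcond=1e-6 is an arbitrary threshold chosen here so that the duplicated column is treated as rank-deficient. On a full-rank matrix the two drivers should return the same solution; on a rank-deficient one, gelsy is the driver designed to cope.

```python
import torch

torch.manual_seed(0)

# Full-rank, well-conditioned system: both CPU drivers should agree.
A = torch.randn(10, 4, dtype=torch.float64)
b = torch.randn(10, 1, dtype=torch.float64)
x_gelsy = torch.linalg.lstsq(A, b, driver='gelsy').solution
x_gels = torch.linalg.lstsq(A, b, driver='gels').solution
drivers_agree = torch.allclose(x_gelsy, x_gels, atol=1e-8)

# Rank-deficient system: column 3 is an exact copy of column 0.
A_def = A.clone()
A_def[:, 3] = A_def[:, 0]
# Build b inside the column space of A_def so an exact solution exists.
b_def = A_def @ torch.randn(4, 1, dtype=torch.float64)
res = torch.linalg.lstsq(A_def, b_def, rcond=1e-6, driver='gelsy')
residual = (A_def @ res.solution - b_def).abs().max().item()

# res.rank is only populated for the gelsy, gelsd and gelss drivers.
print(drivers_agree, res.rank.item(), residual)
```

The sketch deliberately does not run gels on the rank-deficient matrix, since its behavior there is not something to rely on.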
What's odd is that when driver is None, as in the original code, the driver should be chosen automatically (gelsy or gels, depending on the device), yet that is exactly when the issue appears.
Well, I think we can easily make a PR for this annoying bug based on your suggestion (maybe just as the one-liner described above). If you're unsure how to do that, I'll take care of it.
Sure, please go ahead. I'm new to collaborating on GitHub. Glad this could help.