RuntimeError: torch.linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 1 is not positive-definite)

Open patelajaychh opened this issue 4 years ago • 7 comments

Command run: $ python -m pixloc.run_Aachen
Python version: 3.8
Torch version: 1.10.0

[11/08/2021 20:32:50 pixloc.localization.model3d INFO] Reading COLMAP model /home/ajay/pixloc/outputs/hloc/Aachen/sfm_superpoint+superglue.
[11/08/2021 20:33:03 pixloc.utils.io INFO] Imported 824 images from day_time_queries_with_intrinsics.txt
[11/08/2021 20:33:03 pixloc.utils.io INFO] Imported 98 images from night_time_queries_with_intrinsics.txt
[11/08/2021 20:33:03 pixloc.pixlib.utils.experiments INFO] Loading checkpoint checkpoint_best.tar
[11/08/2021 20:33:07 pixloc.localization.localizer INFO] Starting the localization process...
  4%|███▉ | 40/922 [01:30<33:06, 2.25s/it]
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 193, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ajay/pixloc/pixloc/run_Aachen.py", line 76, in <module>
    main()
  File "/home/ajay/pixloc/pixloc/run_Aachen.py", line 68, in main
    poses, logs = localizer.run_batched(skip=args.skip)
  File "/home/ajay/pixloc/pixloc/localization/localizer.py", line 82, in run_batched
    ret = self.run_query(name, camera)
  File "/home/ajay/pixloc/pixloc/localization/localizer.py", line 131, in run_query
    ret = self.refiner.refine(name, camera, dbs, loc=loc)
  File "/home/ajay/pixloc/pixloc/localization/refiners.py", line 109, in refine
    ret = self.refine_query_pose(qname, qcamera, T_init, p3did_to_dbids,
  File "/home/ajay/pixloc/pixloc/localization/base_refiner.py", line 179, in refine_query_pose
    ret = self.refine_pose_using_features(features_query, scales_query,
  File "/home/ajay/pixloc/pixloc/localization/base_refiner.py", line 117, in refine_pose_using_features
    T_opt, fail = opt.run(p3d, F_ref, F_q, T_i.to(F_q),
  File "/home/ajay/pixloc/pixloc/utils/tools.py", line 48, in wrapped
    rets = func(*args_converted, **kwargs)
  File "/home/ajay/pixloc/pixloc/pixlib/models/base_optimizer.py", line 101, in run
    return self.run(*args, **kwargs)
  File "/home/ajay/pixloc/pixloc/pixlib/models/learned_optimizer.py", line 78, in run
    delta = optimizer_step(g, H, lambda, mask=~failed)
  File "/home/ajay/pixloc/pixloc/pixlib/geometry/optimization.py", line 29, in optimizer_step
    U = torch.linalg.cholesky(H, upper=True)
RuntimeError: torch.linalg.cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 1 is not positive-definite).

patelajaychh avatar Nov 08 '21 07:11 patelajaychh

I have the same issue.

sjg02122 avatar Nov 22 '21 05:11 sjg02122

> I have the same issue.

@sjg02122 I resolved this by downgrading torch to torch=1.7.

patelajaychh avatar Nov 23 '21 04:11 patelajaychh

I have indeed only tested PyTorch 1.7, assuming that later versions would work too. Right now I am unable to install >=1.8 because of my CUDA install. Can you check whether this is caused by NaNs in the input matrix? If not, you could try increasing the eps value in https://github.com/cvg/pixloc/blob/90f7e968398252e8557b284803ee774cb8d80cd0/pixloc/pixlib/geometry/optimization.py#L7

sarlinpe avatar Nov 23 '21 17:11 sarlinpe
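
The two checks suggested in the comment above can be sketched as follows. This is only an illustration, not pixloc code: the helper names `diagnose_hessian` and `damp` and the default `eps` are assumptions, and adding `eps` to the whole diagonal is a simplification that may differ from how pixloc applies its own `eps`.

```python
import torch


def diagnose_hessian(H: torch.Tensor) -> dict:
    """Report NaN/Inf entries and, if all entries are finite, the smallest eigenvalue."""
    H = H.detach().double().cpu()
    all_finite = bool(torch.isfinite(H).all())
    report = {"has_nan": bool(torch.isnan(H).any()), "all_finite": all_finite}
    if all_finite:
        # eigvalsh assumes a symmetric input, so symmetrize first.
        H_sym = 0.5 * (H + H.transpose(-1, -2))
        # A non-positive value here means Cholesky is expected to fail.
        report["min_eigenvalue"] = float(torch.linalg.eigvalsh(H_sym).min())
    return report


def damp(H: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Add eps to the diagonal (hypothetical helper): shifts every eigenvalue up by eps."""
    eye = torch.eye(H.shape[-1], dtype=H.dtype, device=H.device)
    return H + eps * eye
```

Calling `diagnose_hessian(H)` right before the `torch.linalg.cholesky` call in `optimizer_step` would show whether the failing matrix contains NaNs or is merely singular; in the latter case a larger `eps` is the more likely remedy.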

I cannot manage to reproduce this error on GPU with torch=1.10.0 and CUDA=10.2. Are you running this on CPU? Could you check if there is any NaN in the input?

sarlinpe avatar Nov 23 '21 21:11 sarlinpe

> I cannot manage to reproduce this error on GPU with torch=1.10.0 and CUDA=10.2. Are you running this on CPU? Could you check if there is any NaN in the input?

I'm running this on GPU only. As for NaNs in the input, I'm not sure: I ran the stock python -m pixloc.run_Aachen command, which I don't think should produce any NaNs.

Anyway, the code worked with torch=1.7, so I didn't investigate any further.

patelajaychh avatar Nov 29 '21 11:11 patelajaychh

I have this issue as well. There is a check implemented that seems to relate to the problem: https://github.com/cvg/pixloc/blob/8f253dcd1f1d7cbe1bc4f1cb4ee628b1bd344fcb/pixloc/pixlib/geometry/optimization.py#L34-L45 However, the error message raised when cholesky fails in torch=1.10 is not the same as in torch=1.7, which may be why the check no longer catches the failure. Running

python -c "import torch; H = torch.ones([2,2]); U = torch.cholesky(H)"

with torch=1.7.0 gives

RuntimeError: cholesky_cpu: U(2,2) is zero, singular U.

while torch=1.10.1 gives

RuntimeError: cholesky: The factorization could not be completed because the input is not positive-definite (the leading minor of order 2 is not positive-definite).

georg-bn avatar Jan 20 '22 08:01 georg-bn
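
One way to avoid depending on the wording of the error message is to ask PyTorch for a status code instead of an exception. The sketch below assumes a PyTorch version that provides `torch.linalg.cholesky_ex`; the wrapper name `safe_cholesky` is hypothetical and this is not the check used in pixloc.

```python
import torch


def safe_cholesky(H: torch.Tensor, upper: bool = True):
    """Factorize a batch of matrices without parsing exception text.

    Returns (U, failed), where failed is a boolean tensor marking the
    matrices that could not be factorized; their factors must not be used.
    """
    # info == 0 means success; a positive value k means the leading minor
    # of order k is not positive-definite.
    U, info = torch.linalg.cholesky_ex(H, upper=upper)
    return U, info != 0


# The first matrix is positive-definite, the second is the singular
# all-ones matrix from the one-line reproduction above.
H = torch.stack([torch.eye(2), torch.ones(2, 2)])
U, failed = safe_cholesky(H)
print(failed)
```

The `failed` mask can then be used to zero out or skip the update for the affected batch elements, independent of the installed torch version's error text.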

I have this issue as well.
Python version: 3.8
Torch version: 1.11.0

My GPU is an A100, so I cannot downgrade torch. How can I solve this problem?

kyrie-23 avatar May 27 '22 09:05 kyrie-23