Torch matrix inversion error

Open koktavy opened this issue 11 months ago • 10 comments

I've (almost) gotten the repo working on Windows with the help of Issue 16.

When I run on the sample data (even using --eval) I hit this issue: torch._C._LinAlgError: torch.linalg.inv: The diagonal element 1 is zero, the inversion could not be completed because the input matrix is singular.

python slam.py --config configs/mono/tum/fr3_office.yaml --eval

MonoGS: Running MonoGS in Evaluation Mode
MonoGS: Following config will be overriden
MonoGS:         save_results=True
MonoGS:         use_gui=False
MonoGS:         eval_rendering=True
MonoGS:         use_wandb=True
MonoGS: saving results in results\datasets_tum\2024-03-12-13-18-30
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 3
wandb: You chose "Don't visualize my results"
wandb: Tracking run with wandb version 0.16.4
wandb: W&B syncing is set to `offline` in this directory.
wandb: Run `wandb online` or set WANDB_MODE=online to enable cloud syncing.
MonoGS: Resetting the system
MonoGS: Initialized map
Process Process-3:
Traceback (most recent call last):
  File "C:\Users\Tavius\miniconda3\envs\MonoGS\lib\multiprocessing\process.py", line 297, in _bootstrap
    self.run()
  File "C:\Users\Tavius\miniconda3\envs\MonoGS\lib\multiprocessing\process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "X:\Projects\_2024\MonoGS\utils\slam_backend.py", line 417, in run
    self.add_next_kf(cur_frame_idx, viewpoint, depth_map=depth_map)
  File "X:\Projects\_2024\MonoGS\utils\slam_backend.py", line 69, in add_next_kf
    viewpoint, kf_id=frame_idx, init=init, scale=scale, depthmap=depth_map
  File "X:\Projects\_2024\MonoGS\gaussian_splatting\scene\gaussian_model.py", line 239, in extend_from_pcd_seq
    self.create_pcd_from_image(cam_info, init, scale=scale, depthmap=depthmap)
  File "X:\Projects\_2024\MonoGS\gaussian_splatting\scene\gaussian_model.py", line 131, in create_pcd_from_image
    return self.create_pcd_from_image_and_depth(cam, rgb, depth, init)
  File "X:\Projects\_2024\MonoGS\gaussian_splatting\scene\gaussian_model.py", line 150, in create_pcd_from_image_and_depth
    W2C = getWorld2View2(cam.R, cam.T).cpu().numpy()
  File "X:\Projects\_2024\MonoGS\gaussian_splatting\utils\graphics_utils.py", line 41, in getWorld2View2
    C2W = torch.linalg.inv(Rt)
torch._C._LinAlgError: torch.linalg.inv: The diagonal element 1 is zero, the inversion could not be completed because the input matrix is singular.
[W C:\cb\pytorch_1000000000000\work\torch\csrc\CudaIPCTypes.cpp:15] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

This is a fresh install using --recursive and only incorporating the change noted above.

koktavy avatar Mar 12 '24 18:03 koktavy

Also here's the batch script I used to download the data on Windows instead of Linux:

IF NOT EXIST "datasets\tum" mkdir "datasets\tum"
cd datasets\tum
curl -LJO https://vision.in.tum.de/rgbd/dataset/freiburg1/rgbd_dataset_freiburg1_desk.tgz
tar -xvzf rgbd_dataset_freiburg1_desk.tgz
curl -LJO https://vision.in.tum.de/rgbd/dataset/freiburg2/rgbd_dataset_freiburg2_xyz.tgz
tar -xvzf rgbd_dataset_freiburg2_xyz.tgz
curl -LJO https://vision.in.tum.de/rgbd/dataset/freiburg3/rgbd_dataset_freiburg3_long_office_household.tgz
tar -xvzf rgbd_dataset_freiburg3_long_office_household.tgz
cd ../..

Run it from the repository root in PowerShell as scripts\download_tum.bat
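For anyone who would rather not depend on curl and tar being on the PATH, the same download can be sketched in Python with only the standard library. This is a rough equivalent, not part of the repository: the URLs are copied from the batch script, while `download_and_extract` and `main` are hypothetical names chosen for illustration.

```python
import tarfile
import urllib.request
from pathlib import Path

# URLs copied from the batch script above.
TUM_URLS = [
    "https://vision.in.tum.de/rgbd/dataset/freiburg1/rgbd_dataset_freiburg1_desk.tgz",
    "https://vision.in.tum.de/rgbd/dataset/freiburg2/rgbd_dataset_freiburg2_xyz.tgz",
    "https://vision.in.tum.de/rgbd/dataset/freiburg3/rgbd_dataset_freiburg3_long_office_household.tgz",
]


def download_and_extract(url: str, dest: Path) -> Path:
    """Download one .tgz archive into dest (if not already there) and unpack it."""
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / url.rsplit("/", 1)[-1]
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    return archive


def main() -> None:
    # Hypothetical entry point: mirrors the datasets\tum layout used by the batch script.
    for url in TUM_URLS:
        download_and_extract(url, Path("datasets") / "tum")
```

Calling `main()` from the repository root should leave the extracted sequences under datasets/tum, like the batch script does.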

koktavy avatar Mar 12 '24 18:03 koktavy

me too

zmf2022 avatar Mar 13 '24 02:03 zmf2022

Hi, thank you for your interest!

Can you print out the variables R, t, Rt, in getWorld2View2 so we can check if the matrix is singular?

I suspect R,t are all zeros due to this bug, but I could be wrong: https://discuss.pytorch.org/t/pytorch-multiprocessing-with-cuda-sets-tensors-to-0/179117
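A minimal sketch of that check, assuming Rt is assembled as the 4x4 homogeneous matrix with R in the top-left block, t in the last column, and 1 in the bottom-right corner (the actual getWorld2View2 may differ in detail). `invert_pose_checked` is a hypothetical helper for debugging, not a function in the repository.

```python
import torch


def invert_pose_checked(R: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Print R, t, Rt and invert Rt, raising a clear error if Rt is singular."""
    Rt = torch.eye(4, device=R.device, dtype=R.dtype)
    Rt[:3, :3] = R
    Rt[:3, 3] = t
    print("R =", R)
    print("t =", t)
    print("Rt =", Rt)

    # A valid rotation has determinant 1; an all-zero R gives determinant 0.
    det = torch.linalg.det(Rt).item()
    if abs(det) < 1e-8:
        raise ValueError(f"Rt is singular (det={det}); R and t may have been zeroed")
    return torch.linalg.inv(Rt)
```

If R arrives as all zeros, this raises the ValueError before torch.linalg.inv is reached, which makes it easier to see that the inputs, not the inversion, are the problem.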

rmurai0610 avatar Mar 13 '24 15:03 rmurai0610

This is an automatic vacation reply from QQ Mail. Hello, I have received your email and will reply as soon as possible!

zmf2022 avatar Mar 13 '24 15:03 zmf2022

> Hi, thank you for your interest!
>
> Can you print out the variables R, t, Rt, in getWorld2View2 so we can check if the matrix is singular?
>
> I suspect R,t are all zeros due to this bug, but I could be wrong: https://discuss.pytorch.org/t/pytorch-multiprocessing-with-cuda-sets-tensors-to-0/179117

It is true. Sometimes, the orientation and translation are all zeros. I printed the inputs of getWorld2View2(R, t, ...):

w2v tensor([[-0.8280, 0.5254, -0.1956], [ 0.4139, 0.3374, -0.8455], [-0.3782, -0.7811, -0.4969]], device='cuda:0') tensor([-2.2574, 0.3327, 1.9227], device='cuda:0')

w2v tensor([[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]], device='cuda:0') tensor([0., 0., 0.], device='cuda:0')

w2v tensor([[-0.8280, 0.5254, -0.1956], [ 0.4139, 0.3374, -0.8455], [-0.3782, -0.7811, -0.4969]], device='cuda:0') tensor([-2.2574, 0.3327, 1.9227], device='cuda:0')

w2v tensor([[0., 0., 0.], [0., 0., 0.], [0., 0., 0.]], device='cuda:0') tensor([0., 0., 0.], device='cuda:0')

yanyan-li avatar Mar 22 '24 12:03 yanyan-li

From a quick look, some people have reported the same issue in the PyTorch repo, but no one seems to have found a solution. The problem seems to occur only with PyTorch multiprocessing on Windows.

  • https://github.com/pytorch/pytorch/issues/112340
  • https://github.com/pytorch/pytorch/issues/100358

I would appreciate it if you could share the solution if you find one! As a last resort, you could set up an Ubuntu environment in Docker and run MonoGS there.

muskie82 avatar Mar 22 '24 15:03 muskie82

Have you solved this error? I hit it too on Windows 10.

foreverlong avatar Mar 27 '24 02:03 foreverlong

Add this line to disable multithreading: torch.set_num_interop_threads(1)
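The comment does not say where the call should go. One plausible placement, offered only as an assumption, is the very top of the entry script (slam.py in the command above), right after importing torch. PyTorch documents that this setting can only be applied once and before any inter-op parallel work has started. Whether it actually prevents the zeroed tensors is not confirmed anywhere in this thread.

```python
import torch

# Assumed placement: top of the entry script, before any other PyTorch work.
# PyTorch only allows this call once, and only before inter-op parallel work
# has started; a later call raises a RuntimeError.
torch.set_num_interop_threads(1)

# Confirm the setting took effect.
print("inter-op threads:", torch.get_num_interop_threads())
```

On Windows, multiprocessing uses the spawn start method and re-imports the main module in each child process, so a module-level call like this should also run in the spawned workers.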

zmf2022 avatar Apr 07 '24 07:04 zmf2022

> add this script to disable multithreads! torch.set_num_interop_threads(1)

Is this the solution? Can you be more specific?

hnglp avatar Apr 09 '24 12:04 hnglp

> add this script to disable multithreads! torch.set_num_interop_threads(1)

Hi, I'm not sure I understand what you mean. Can you tell me where I should add this line?

hnglp avatar May 15 '24 14:05 hnglp