
Add depth visualization

Open · ingra14m opened this issue on Jul 13 '23 · 53 comments

ingra14m avatar Jul 13 '23 14:07 ingra14m

This seems to depend on https://github.com/graphdeco-inria/gaussian-splatting/pull/20?

grgkopanas avatar Jul 13 '23 15:07 grgkopanas

This depends on https://github.com/graphdeco-inria/diff-gaussian-rasterization/pull/3.

ingra14m avatar Jul 13 '23 16:07 ingra14m

May I confirm that the current branch does not include the capability to render depth?

JIANG-CX avatar Aug 08 '23 05:08 JIANG-CX

The render part can be seen in #5 in diff-gaussian-rasterization

ingra14m avatar Aug 15 '23 09:08 ingra14m

The render part can be seen in #5 in diff-gaussian-rasterization

Thanks for your reply. I have read your proposed code modification, which changes the alpha-blending accumulation from ∑ T_i α_i c_i (color) to ∑ T_i α_i d_i (depth). However, I have noticed cases where a uniformly colored floor or wall is covered by only a few Gaussians with large covariance. Using this method in such areas could result in significant depth errors.
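
For reference, here is a minimal PyTorch sketch of the accumulation being discussed (a per-pixel reference, not the actual CUDA kernel in diff-gaussian-rasterization): depth is alpha-blended with the same weights T_i * α_i as color, so wherever a few large-covariance Gaussians dominate those weights, the blended depth inherits their geometry.

import torch

def composite_pixel(alphas: torch.Tensor, colors: torch.Tensor, depths: torch.Tensor):
    """Alpha-composite color and depth for one pixel over front-to-back sorted Gaussians.

    alphas: (N,) per-Gaussian opacity after the 2D Gaussian falloff, in [0, 1)
    colors: (N, 3) per-Gaussian RGB
    depths: (N,) per-Gaussian view-space depth
    """
    # T_i = prod_{j<i} (1 - alpha_j): transmittance in front of Gaussian i
    T = torch.cumprod(torch.cat([torch.ones_like(alphas[:1]), 1.0 - alphas[:-1]]), dim=0)
    w = T * alphas                        # blending weight T_i * alpha_i
    color = (w[:, None] * colors).sum(0)  # sum_i T_i * alpha_i * c_i
    depth = (w * depths).sum()            # sum_i T_i * alpha_i * d_i
    return color, depth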

JIANG-CX avatar Aug 15 '23 09:08 JIANG-CX

There are indeed many artifacts in the depth maps of 3D-GS. I tried to improve the geometry of 3D-GS by adding depth loss in #5 in diff-gaussian-rasterization, which might enhance the rendering effects of 3D-GS. However, experimental results show that improving the geometry does not substantially improve the rendering quality. I personally believe that the depth issues of 3D-GS are inevitable.

ingra14m avatar Aug 16 '23 07:08 ingra14m

@ingra14m Thank you very much for your implementation of the depth forward and backward passes. However, when I pull your code and try to do depth supervision, a CUDA illegal memory access occurs at a random step of the training loop:

Training progress:   4%|█████▏                                                                                                                                 | 1140/30000 [00:27<11:08, 43.16it/s, Loss=0.1069719]
  Traceback (most recent call last):
    File "train.py", line 301, in <module>
      training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from, args.depth_loss_choice)
    File "train.py", line 131, in training
      ema_loss_for_log = 0.4 * loss.item() + 0.6 * ema_loss_for_log
  RuntimeError: CUDA error: an illegal memory access was encountered
  CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
  For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

What can be determined is that the error has nothing to do with the location where the error is reported:

ema_loss_for_log = 0.4 * loss.item() + 0.6 * ema_loss_for_log

When using the original rasterizer implementation, this error does not occur, so the problem seems to be in the depth gradient calculation part of your implementation. I don't know what's going on; have you faced a similar problem?
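
As an aside on debugging this kind of failure: because CUDA kernels launch asynchronously, the Python stack trace points at an unrelated line such as loss.item(). One hedged way to localize the faulting kernel is to force synchronous launches before CUDA is initialized, for example:

# Minimal sketch: force synchronous CUDA launches so the Python stack trace
# points at the call that actually launched the faulting kernel.
# The variable must be set before torch initializes CUDA (or exported in the shell).
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # import after setting the environment variable

Equivalently, CUDA_LAUNCH_BLOCKING=1 can be exported in the shell before running train.py, as the error message itself suggests.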

Bin-ze avatar Sep 13 '23 08:09 Bin-ze


Supplement:

 python -m torch.utils.collect_env
Collecting environment information...
PyTorch version: 1.12.1
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
Clang version: Could not collect
CMake version: version 3.22.0
Libc version: glibc-2.17

Python version: 3.7.13 (default, Oct 18 2022, 18:57:03)  [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-69-generic-x86_64-with-debian-bullseye-sid
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: NVIDIA GeForce RTX 4090
GPU 1: NVIDIA GeForce RTX 4090

Nvidia driver version: 525.105.17
cuDNN version: Probably one of the following:
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn.so.8.7.0
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.7.0
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.7.0
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.7.0
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.7.0
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.7.0
/usr/local/cuda-11.8/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.7.0
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] numpy==1.21.5
[pip3] torch==1.12.1
[pip3] torchaudio==0.12.1
[pip3] torchvision==0.13.1
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               11.6.2              hfc3e2af_12    conda-forge
[conda] ffmpeg                    4.3                  hf484d3e_0    pytorch
[conda] mkl                       2021.4.0           h8d4b97c_729    conda-forge
[conda] mkl-service               2.4.0            py37h402132d_0    conda-forge
[conda] mkl_fft                   1.3.1            py37h3e078e5_1    conda-forge
[conda] mkl_random                1.2.2            py37h219a48f_0    conda-forge
[conda] numpy                     1.21.5                   pypi_0    pypi
[conda] pytorch                   1.12.1          py3.7_cuda11.6_cudnn8.3.2_0    pytorch
[conda] pytorch-mutex             1.0                        cuda    pytorch
[conda] torchaudio                0.12.1               py37_cu116    pytorch
[conda] torchvision               0.13.1               py37_cu116    pytorch

error:

File "train.py", line 137, in training
    loss.backward()
  File "/home/miniconda3/envs/3d_gaussian_depth/lib/python3.7/site-packages/torch/_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/miniconda3/envs/3d_gaussian_depth/lib/python3.7/site-packages/torch/autograd/__init__.py", line 175, in backward
    allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass
RuntimeError: CUDA error: an illegal memory access was encountered
Training progress:   2%|█▌                                                                          | 610/30000 [00:03<02:35, 189.05it/s, Loss=0.5014731]

Bin-ze avatar Sep 13 '23 14:09 Bin-ze

(Quoting Bin-ze's supplement above: environment info and the CUDA illegal-memory-access error.)

That problem was fixed in the latest version of diff-gaussian-rasterization. If you want to visualize the depth in the forward pass, you can refer to my other branch, latest, and reset to the commit with the message "Add depth forward pass" shown in the picture below. The latest commit of that branch contains visualization of the acc and depth. [image]

ingra14m avatar Sep 14 '23 06:09 ingra14m

(Quoting ingra14m's reply above about the fix in the latest diff-gaussian-rasterization and the latest branch.)

Thank you for your response! I tried pulling the latest branch:

git checkout latest

Then I re-ran the training but got an error:

Training progress:   0%|          | 0/30000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 304, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from, args.depth_loss_choice)
  File "train.py", line 92, in training
    render_pkg = render(viewpoint_cam, gaussians, pipe, background)
  File "/home/guozebin/work_code/3d_gaussian_magic_change/gaussian_renderer/__init__.py", line 266, in render
    cov3D_precomp=cov3D_precomp)
ValueError: too many values to unpack (expected 3)
Training progress:   0%|

Your implementation also seems to conflict with the official 3D-GS implementation in the outputs of the rasterizer. Can you tell me how to solve it?

Bin-ze avatar Sep 14 '23 10:09 Bin-ze

(Quoting the exchange above: checking out the latest branch and hitting the "too many values to unpack (expected 3)" error.)

You can git reset --hard 4a3f789 to the commit that only has the forward depth pass.

ingra14m avatar Sep 14 '23 14:09 ingra14m

When I was running depth supervision on your modified code, I also faced a similar problem where an illegal memory access was encountered.

It seems the problem in the code is the depth computation const float c_d = collected_depths[j]; in cuda_rasterizer/backward.cu: when the collected_depths array is queried at some later iteration, the error occurs. I also noticed that you set c_d = 1 when performing backpropagation, but I don't know why the error happens. Is there any effective solution to this problem?

guanjunwu avatar Sep 14 '23 20:09 guanjunwu

(Quoting the exchange above, ending with the suggestion to git reset --hard 4a3f789 to the commit that only has the forward depth pass.)

Thanks for your reply, but I want to backpropagate the depth gradient for depth supervision, so can you tell me how you solved the illegal memory access problem to get the results with depth supervision?

When I use the latest branch and change the rasterizer return values to rendered_image, radii, depth, acc, training begins, but the same error occurs:

  RuntimeError: CUDA error: an illegal memory access was encountered
  CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
  For debugging consider passing CUDA_LAUNCH_BLOCKING=1

There is an illegal memory access in the calculation of dL_ddepth, but I can't solve it. Can you give me some helpful suggestions?

Bin-ze avatar Sep 15 '23 01:09 Bin-ze


This problem has been solved by the official diff-gaussian-rasterization. The latest branch is built on top of the latest official diff-gaussian-rasterization, and its most recent commit adds acc visualization. So if you want to backpropagate the depth gradient for depth supervision, there are two choices (a sketch of the second follows the list):

  1. git reset --hard d26ea87
  2. no git reset and add another variable to receive acc. For example,
rendered_image, radii, depth, acc = rasterizer(
        means3D = means3D,
        means2D = means2D,
        shs = shs,
        colors_precomp = colors_precomp,
        opacities = opacity,
        scales = scales,
        rotations = rotations,
        cov3D_precomp = cov3D_precomp)
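
As a hedged illustration of option 2 (not ingra14m's actual code), the render() wrapper could forward the extra outputs into the render_pkg dictionary that later comments index as render_pkg_re["depth"]; the helper name and the keys other than "depth" and "acc" are assumptions based on the official 3D-GS render():

def build_render_pkg(rasterizer_outputs, screenspace_points):
    """Sketch only: wrap the four-value rasterizer return into a render_pkg dict.
    Keys other than "depth" and "acc" follow the official 3D-GS render() and are
    assumptions here, not ingra14m's code."""
    rendered_image, radii, depth, acc = rasterizer_outputs
    return {
        "render": rendered_image,
        "viewspace_points": screenspace_points,
        "visibility_filter": radii > 0,
        "radii": radii,
        "depth": depth,
        "acc": acc,
    }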

ingra14m avatar Sep 15 '23 02:09 ingra14m


I think this memory problem has been solved by the official 3D-GS. You can refer to my other branch, which is built upon the latest diff-gaussian-rasterization. Since I added acc visualization there, you can git reset --hard d26ea87 to the commit that only contains the forward and backward depth passes.

ingra14m avatar Sep 15 '23 02:09 ingra14m

I used the second solution, but the illegal memory access problem was still visible in my experiments. Steps:

  1. Clone the code: git clone git@github.com:ingra14m/diff-gaussian-rasterization.git
  2. Check out the latest branch: git checkout latest
  3. Build: pip install -v -e .
  4. Add another variable to receive acc, as you showed
  5. Train, but this error occurred:

RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1

Bin-ze avatar Sep 15 '23 02:09 Bin-ze

I recommend you to reset, because the recent commits related to acc have not been strictly checked.

ingra14m avatar Sep 15 '23 03:09 ingra14m

I recommend you to reset, because the recent commits related to acc have not been strictly checked.

Frustratingly, it still happens after a reset. What should I do?

Bin-ze avatar Sep 15 '23 05:09 Bin-ze

@ingra14m Sorry to bother you! Is there any progress on this issue?

Bin-ze avatar Sep 22 '23 13:09 Bin-ze

sorry for late reply. Could you provide me with the dataset you used?

ingra14m avatar Sep 25 '23 06:09 ingra14m

@ingra14m can't provide the dataset, but exact same error happens to me as well. Will attaching snapshots help?

VladimirYugay avatar Sep 25 '23 06:09 VladimirYugay

It would definitely be great if you could provide snapshots of the errors.

ingra14m avatar Sep 25 '23 07:09 ingra14m

@ingra14m can't provide the dataset, but exact same error happens to me as well. Will attaching snapshots help?

Thank you very much for your reply! In my tests, all datasets hit the illegal memory access error. My depth comes from a monocular depth estimation algorithm. I have tried many datasets, including Mip-NeRF 360 scenes. I will upload the depth of the dataset to a cloud drive tomorrow. I am looking forward to your experimental results! Thank you again!

On the other hand, I captured a snapshot of the error and inspected it. I checked the outputs and inputs, but found no abnormality.

Bin-ze avatar Sep 25 '23 15:09 Bin-ze

sorry for late reply. Could you provide me with the dataset you used?

Here is a very simple custom scene I made and used to test depth supervision: https://drive.google.com/file/d/1Gwny9MawwzZ4PutD3I0ANolW89Et1Dxs/view?usp=sharing

Bin-ze avatar Sep 26 '23 01:09 Bin-ze

Got it. I'll check it right away.

ingra14m avatar Sep 26 '23 02:09 ingra14m

sorry for late reply. Could you provide me with the dataset you used?

Here is a very simple custom scene I made and used to test depth supervision: https://drive.google.com/file/d/1Gwny9MawwzZ4PutD3I0ANolW89Et1Dxs/view?usp=sharing

Thank you for your patience. I have completed testing with your data. Fortunately, your data runs on my end, and the rendering results with depth are evidently much better than those without depth. Here is my core code:

import torch.nn.functional as F  # needed for the interpolation below

depth = render_pkg_re["depth"]

# Make depth_gt and the rendered Gaussian depth have the same dimensions
depth_image = depth_image.reshape(1, 1, *depth_image.shape)
depth_gt = F.interpolate(depth_image, size=(900, 1600), mode='bilinear', align_corners=False)[0]  # match the scaling process of official 3D Gaussian

depth_gt = depth_gt / depth_gt.max()  # normalize depth_gt to the range 0-1
loss = (1.0 - opt.lambda_dssim) * Ll1 + opt.lambda_dssim * (1.0 - ssim(image, gt_image)) + l1_loss(depth_gt, depth) * 0.1

As for the Differential Gaussian Rasterization, I did as I said before (I have made a depth branch):

git clone https://github.com/ingra14m/diff-gaussian-rasterization
cd diff-gaussian-rasterization
git checkout depth
pip install .

Here are my results:

I read the data in COLMAP format, with no --eval. It seems there are 6 training images.

With depth supervision [image]: the mean PSNR on the training dataset is 40.29. The training time is 23 min 35 s on a Tesla V100.

Without depth supervision [image]: the mean PSNR on the training dataset is 33.16. The training time is 23 min 33 s.

ingra14m avatar Sep 26 '23 09:09 ingra14m

(Quoting ingra14m's results with depth supervision above.)

Thank you very much for your reply! I'm very excited to hear this news. But I have a small doubt: my GT depth is a disparity map. Did you process it? Otherwise, the current supervision is exactly opposite to the optimization goal. What machine did you use? Did you recently encounter the illegal memory access issue and resolve it in your latest commit? I will follow the guidelines you provided to re-run depth supervision and report back the results. I look forward to your reply!
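
(Hedged aside on the disparity question: a monocular disparity map is normally inverted before being used as a depth target, since depth is inversely proportional to disparity. A minimal sketch with hypothetical names and an assumed min-max normalization, not anything from ingra14m's code:)

import torch

def disparity_to_depth_target(disparity: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Sketch: convert a monocular disparity map into a normalized depth target.
    Monocular disparity is only defined up to scale and shift, so this simply
    inverts it and rescales the result to [0, 1] to match the normalized
    rendered depth; the names and normalization choice are assumptions."""
    depth = 1.0 / disparity.clamp(min=eps)      # depth ~ 1 / disparity
    depth = depth - depth.min()
    depth = depth / depth.max().clamp(min=eps)  # normalize to 0-1, like depth_gt above
    return depth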

Bin-ze avatar Sep 26 '23 10:09 Bin-ze

I didn't do anything special. I just read the corresponding depth from your depth directory. If there's anything I could call special, it's probably my depth normalization: depth_gt = depth_gt / depth_gt.max() # make the depth_gt range from 0 to 1. My machines run Ubuntu 20.04 and Debian, both with a Tesla V100, and it runs on both. As for the Differential Gaussian Rasterization, I reconfigured the environment and I can confirm that this is the final version of my depth branch.

ingra14m avatar Sep 26 '23 11:09 ingra14m

I quickly tried it, but as before, I couldn't complete the full training; the error kept appearing randomly in the middle of training:

RuntimeError: CUDA error: an illegal memory access was encountered
Training progress:   8%|███████████████████▊                 | 2530/30000 [01:17<14:06, 32.45it/s, Loss=0.2167022]

I thought I might have missed some key step, but I did install the rasterizer as you asked:

git clone https://github.com/ingra14m/diff-gaussian-rasterization.git
git checkout depth
pip install .

then:

CUDA_VISIBLE_DEVICES=2 python train.py -s data/test_depth_super -r 4

I did some more tests, and I found that if the gap between gt and pred is very large when computing the loss (corresponding to the case where gt is not processed), the illegal memory access occurs quickly, so I think the problem is caused by gradient explosion. When the real gt is properly input, this problem does not exist, because after normalization the difference between the two is very small. Thank you very much for your reply. Now I realize that it is not a problem with your implementation, but is it possible to clamp the returned gradient to ensure it stays in a reasonable range and avoid outliers caused by inaccurate depth supervision?
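
(One hedged way to do such clamping without touching the CUDA backward pass is a gradient hook on the rendered depth tensor; the threshold below is an arbitrary example value, not something tested in this thread:)

import torch

def clamp_depth_grad(depth: torch.Tensor, max_abs: float = 1.0) -> torch.Tensor:
    """Sketch: cap the gradient flowing from the depth loss back into the
    rasterizer, to guard against exploding gradients from a badly scaled
    depth_gt. max_abs is an arbitrary illustrative threshold."""
    if depth.requires_grad:
        depth.register_hook(lambda g: torch.nan_to_num(g).clamp(-max_abs, max_abs))
    return depth

It could be applied as depth = clamp_depth_grad(render_pkg_re["depth"]) before computing the depth L1 term, under the assumption that the hook fires before the gradient reaches the rasterizer's backward.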

Bin-ze avatar Sep 26 '23 11:09 Bin-ze

Cloning from the original 3DGS repo and doing:

git clone https://github.com/ingra14m/diff-gaussian-rasterization
cd diff-gaussian-rasterization
git checkout depth
pip install .

I get the same error:

File "train.py", line 96, in training ema_loss_for_log = 0.4 * loss.item() + 0.6 * ema_loss_for_log RuntimeError: CUDA error: an illegal memory access was encountered CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

ricshaw avatar Sep 26 '23 17:09 ricshaw