
Depth-Anything_v2 Depth Rendering

Open czeyveli1 opened this issue 1 year ago • 6 comments

Hello everyone. I am trying to denoise the TUM-RGBD depth dataset. When I swap my new depth images in for the original depth dataset, I get this error:

    Mapping Time Step: 32: 100%|████████████████████████████████████████████| 30/30 [00:04<00:00, 6.30it/s]
    Tracking Time Step: 33: 100%|█████████████████████████████████████████| 200/200 [00:21<00:00, 9.39it/s]
    Tracking Time Step: 33: 100%|█████████████████████████████████████████| 200/200 [00:21<00:00, 9.41it/s]
    Selected Keyframes at Frame 33: [19, 4, 0, 9, 24, 14, 29, 33]
    6%|███▌ | 33/592 [07:58<2:14:58, 14.49s/it]
    ████████████████████████▊ | 21/30 [00:03<00:01, 5.17it/s]
    Traceback (most recent call last):
      File "/home/cz/Documents/SplaTAM/scripts/splatam.py", line 1014, in <module>
        rgbd_slam(experiment.config)
      File "/home/cz/Documents/SplaTAM/scripts/splatam.py", line 847, in rgbd_slam
        loss, variables, losses = get_loss(params, iter_data, variables, iter_time_idx, config['mapping']['loss_weights'],
      File "/home/cz/Documents/SplaTAM/scripts/splatam.py", line 253, in get_loss
        depth_sil, _, _, = Renderer(raster_settings=curr_data['cam'])(**depth_sil_rendervar)
      File "/home/cz/anaconda3/envs/splatam/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/cz/anaconda3/envs/splatam/lib/python3.10/site-packages/diff_gaussian_rasterization/__init__.py", line 186, in forward
        return rasterize_gaussians(
      File "/home/cz/anaconda3/envs/splatam/lib/python3.10/site-packages/diff_gaussian_rasterization/__init__.py", line 28, in rasterize_gaussians
        return _RasterizeGaussians.apply(
      File "/home/cz/anaconda3/envs/splatam/lib/python3.10/site-packages/diff_gaussian_rasterization/__init__.py", line 79, in forward
        num_rendered, color, radii, geomBuffer, binningBuffer, imgBuffer, depth = _C.rasterize_gaussians(*args)
    RuntimeError: CUDA out of memory. Tried to allocate 616.00 MiB (GPU 0; 7.75 GiB total capacity; 4.95 GiB already allocated; 441.75 MiB free; 5.57 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I applied a median filter from OpenCV and skimage to remove some of the noise (visual artifacts). The files are not large: for example, one original depth frame is 118 kB and my filtered output is 11.2 kB. I also disabled PNG compression with `cv2.imwrite(output_file_path, median_using_skimage, [cv2.IMWRITE_PNG_COMPRESSION, 0])`, but I get the same error.
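For reference, here is a minimal sketch of the preprocessing described above (this is not SplaTAM code; the directory names and the 5×5 kernel are illustrative assumptions). The main point is to keep the depth PNGs as single-channel 16-bit images, since the TUM-RGBD format stores depth that way:

```python
import os

import cv2

# Hypothetical paths; adjust to your TUM-RGBD sequence layout.
input_dir = "rgbd_dataset_freiburg1_desk/depth"
output_dir = "rgbd_dataset_freiburg1_desk/depth_denoised"
os.makedirs(output_dir, exist_ok=True)

for name in sorted(os.listdir(input_dir)):
    if not name.endswith(".png"):
        continue
    # TUM depth images are 16-bit PNGs; IMREAD_UNCHANGED keeps them as uint16
    # instead of silently converting to 8-bit.
    depth = cv2.imread(os.path.join(input_dir, name), cv2.IMREAD_UNCHANGED)
    # 5x5 median filter to suppress speckle noise; preserves the uint16 dtype.
    denoised = cv2.medianBlur(depth, 5)
    # Write without compression so only the filter changes the data, not the encoding.
    cv2.imwrite(os.path.join(output_dir, name), denoised,
                [cv2.IMWRITE_PNG_COMPRESSION, 0])
```

If the filtered files load back with the same dtype, resolution, and depth scale as the originals, the smaller file size by itself should not matter; the out-of-memory error is then more likely a downstream symptom, as discussed in the next comment.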

Could you help me figure out how to solve this problem?

czeyveli1 avatar Jul 26 '24 13:07 czeyveli1

Hi, the Gaussian Splat could be blowing up due to a loss of tracking.

Since you mentioned that you are using custom depth images, if the depth images are not scale-consistent, SplaTAM will not work.

Nik-V9 avatar Aug 29 '24 16:08 Nik-V9

Hello @Nik-V9, thanks for your interest. I solved the problem, but I now have a new challenge.

I am trying to render a new depth dataset generated with the depth-anything_v2 algorithm. The original TUM-RGBD dataset contains 595 depth images, but their timestamps do not exactly match those of the original RGB images. My depth dataset contains 613 images (one generated from each RGB image), and I can run it, but the results are very poor. I am sharing my depth images and the results below.

| Metric | Original dataset | My dataset |
|---|---|---|
| Average PSNR | 21.00 | 15.21 |
| Average ATE RMSE | 3.75 cm | 39.98 cm (final) |
| Average Depth L1 | 4.50 cm | 18.46 cm |
| Average MS-SSIM | 0.816 | 0.628 |
| Average LPIPS | 0.284 | 0.554 |
| Mapping/PSNR | 12.41468 | 35.36498 |

Below is an example of the depth images I created using the depth_anything algorithm (I normalized the values to the range 0–10128): [depth image 1305031452.791720]
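As a side note on the format: the TUM-RGBD depth PNGs use a scale factor of 5000 (a pixel value of 5000 corresponds to 1 m), so normalizing each prediction independently to an arbitrary range such as 0–10128 will be neither metric nor consistent across frames. Below is a rough sketch of writing a prediction in the TUM 16-bit format; the `DepthAnythingV2` constructor and `infer_image` call follow the Depth-Anything-V2 README and may differ in your checkout, and the metric conversion step is deliberately left as a placeholder:

```python
import cv2
import numpy as np
import torch
from depth_anything_v2.dpt import DepthAnythingV2  # from the Depth-Anything-V2 repo

TUM_DEPTH_FACTOR = 5000.0  # TUM-RGBD convention: pixel value 5000 == 1 meter

# ViT-L relative-depth checkpoint, configured as in the Depth-Anything-V2 README.
model = DepthAnythingV2(encoder="vitl", features=256,
                        out_channels=[256, 512, 1024, 1024])
model.load_state_dict(torch.load("checkpoints/depth_anything_v2_vitl.pth",
                                 map_location="cpu"))
model.eval()

rgb = cv2.imread("rgb/1305031452.791720.png")  # example TUM RGB frame
pred = model.infer_image(rgb)                  # HxW relative prediction (numpy)

# The relative checkpoints are only defined up to scale (and shift), so `pred`
# must first be turned into metric depth in meters -- e.g. by aligning it to the
# sensor depth or by using a metric model. `depth_m` is a placeholder for that.
depth_m = pred  # placeholder: replace with metric depth in meters

depth_png = np.clip(depth_m * TUM_DEPTH_FACTOR, 0, 65535).astype(np.uint16)
cv2.imwrite("depth_pred/1305031452.791720.png", depth_png,
            [cv2.IMWRITE_PNG_COMPRESSION, 0])
```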

This is the result from the original TUM dataset: [Screenshot from 2024-09-16 18-45-32]

And here is the result from my depth dataset: [Screenshot from 2024-09-16 18-46-23]

Could you help me understand how to solve this problem?

czeyveli1 avatar Sep 16 '24 17:09 czeyveli1

> if the depth images are not scale-consistent, SplaTAM will not work.

Hi, @Nik-V9! What do you mean by scale-consistent?

Santoi avatar Sep 30 '24 19:09 Santoi

Hi Santoi, monocular depth estimation is up to scale, i.e., in each instance of the model's prediction, there is no guarantee that 1 unit will be equal to a fixed distance scale. Hence, when you use multiple monocular depth maps together, you also need to optimize for a scale factor.

https://github.com/MichaelGrupp/evo/wiki/Metrics#alignment

You will probably get a better SplaTAM result using Metric3Dv2 with known intrinsics (since the prediction will always try to be in metric scale).
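To make "optimize for a scale factor" concrete, here is a minimal per-frame alignment sketch (not SplaTAM code; the validity thresholds are illustrative): it fits a least-squares scale and shift mapping the monocular prediction onto the sensor depth over valid pixels.

```python
import numpy as np

def align_scale_shift(pred, gt, gt_min=0.1, gt_max=10.0):
    """Least-squares scale/shift aligning a monocular prediction to sensor depth.

    pred, gt: HxW depth maps in the same units (e.g., meters for gt).
    Pixels where gt is outside [gt_min, gt_max] are treated as invalid,
    which covers the zero-valued holes in TUM-RGBD depth images.
    Returns the aligned prediction and the fitted (scale, shift).
    """
    mask = (gt > gt_min) & (gt < gt_max)
    x = pred[mask].astype(np.float64)
    y = gt[mask].astype(np.float64)
    # Solve min_{s,t} || s*x + t - y ||^2 in closed form.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (scale, shift), *_ = np.linalg.lstsq(A, y, rcond=None)
    return scale * pred + shift, scale, shift
```

If the fitted scale varies a lot from frame to frame, the predictions are not scale-consistent in the sense above, and a single global scale (like the trajectory alignment evo performs) will not fix it.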

Nik-V9 avatar Sep 30 '24 22:09 Nik-V9

Hi @Nik-V9 , as you mentioned, I have also been trying to process with Metric3Dv2 for the past two days. I will let you know if the results are promising. Thank you so much for your interest.

czeyveli1 avatar Oct 01 '24 00:10 czeyveli1

Hi @czeyveli1, have you tried Metric3Dv2? Does it give better results with SplaTAM? If the issue is scale inconsistency, you may also try https://depthanyvideo.github.io/

emphos1 avatar Jan 21 '25 13:01 emphos1