Are there any suggestions for training a dataset with a large scene_scale using MCMC?

Open a11enL opened this issue 11 months ago • 1 comments

gsplat 1.4.0

python3 examples/simple_trainer.py mcmc --use_bilateral_grid --data_factor 1 --data_dir data/test123/ --result_dir exports/test123/

Part of the logs:

...
[Parser] 120 images, taken by 120 cameras.
Scene scale: 2009.6237303032315
Model initialized. Number of GS: 159431
...
loss=0.475| sh degree=0| : 100%|██▋| 299/30000 [00:25< ......
......
loss=0.185| sh degree=3| : 25%|██████████████████████▌ | 7600/30000 [13:15<39:05, 9.55it/s]
Traceback (most recent call last):
  File "/home/ubuntu/gsplat-1.4.0/examples/simple_trainer.py", line 1120, in <module>
    cli(main, cfg, verbose=True)
  File "/home/ubuntu/gsplat-1.4.0/gsplat/distributed.py", line 360, in cli
    return _distributed_worker(0, 1, fn=fn, args=args)
  File "/home/ubuntu/gsplat-1.4.0/gsplat/distributed.py", line 295, in _distributed_worker
    fn(local_rank, world_rank, world_size, args)
  File "/home/ubuntu/gsplat-1.4.0/examples/simple_trainer.py", line 1065, in main
    runner.train()
  File "/home/ubuntu/gsplat-1.4.0/examples/simple_trainer.py", line 820, in train
    self.cfg.strategy.step_post_backward(
  File "/home/ubuntu/gsplat-1.4.0/gsplat/strategy/mcmc.py", line 128, in step_post_backward
    n_relocated_gs = self._relocate_gs(params, optimizers, binoms)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/gsplat-1.4.0/gsplat/strategy/mcmc.py", line 158, in _relocate_gs
    relocate(
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/gsplat-1.4.0/gsplat/strategy/ops.py", line 278, in relocate
    sampled_idxs = _multinomial_sample(probs, n, replacement=True)
  File "/home/ubuntu/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/gsplat-1.4.0/gsplat/strategy/ops.py", line 31, in _multinomial_sample
    assert not num_elements == 0, ('_multinomial_sample weights 0')
AssertionError: _multinomial_sample weights 0
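The assertion message suggests that the weight vector passed to _multinomial_sample was empty, which would be the case if every Gaussian had dropped below the opacity threshold by step 7600. Below is a minimal pure-Python sketch of that failure mode and of a guard around it. It is not gsplat's code: pick_relocation_targets, the 0.005 threshold, and the sigmoid activation are assumptions chosen for illustration only.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pick_relocation_targets(opacity_logits, min_opacity=0.005, seed=0):
    """Choose one alive Gaussian for every dead one, weighted by opacity.

    Hypothetical helper, not part of gsplat. Returns a list of alive
    indices (one per dead Gaussian), or None when nothing is alive.
    """
    opacities = [sigmoid(v) for v in opacity_logits]
    dead = [i for i, o in enumerate(opacities) if o <= min_opacity]
    alive = [i for i, o in enumerate(opacities) if o > min_opacity]
    if not dead:
        return []  # nothing needs relocating
    if not alive:
        return None  # empty weight vector: sampling is impossible
    rng = random.Random(seed)
    weights = [opacities[i] for i in alive]
    return rng.choices(alive, weights=weights, k=len(dead))


# Two dead Gaussians (logits -10 and -12) and two alive ones.
mixed = pick_relocation_targets([-10.0, 0.0, 2.0, -12.0])
# Every Gaussian is below the threshold, as the assertion above implies.
collapsed = pick_relocation_targets([-10.0, -11.0, -12.0])
```

A guard like this only skips the relocation step. If all opacities have collapsed, the model is already degenerate, so the guard would hide the crash without fixing the training.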

When the dataset's scene_scale is larger than 2000 or 10000, gsplat seems to either crash or, if I change opacity_reg to 0.001, generate a PLY file that contains nothing but noise. In all of these cases the training loss stays high. I have three datasets produced by an AI tool that show this behavior. These datasets work well with COLMAP without any errors or warnings, and I have never encountered this with a traditional COLMAP dataset. Thanks.
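One way to check whether the scale itself is the trigger would be to measure it and rescale the scene to unit scale before training. The sketch below assumes that scene scale means the largest distance from any camera to the mean camera position, which is an assumption about how the parser computes it. measure_scene_scale and rescale_scene are hypothetical helpers, not gsplat functions.

```python
import math


def measure_scene_scale(camera_positions):
    """Return (center, scale): mean camera position and max distance to it."""
    n = len(camera_positions)
    center = tuple(sum(p[i] for p in camera_positions) / n for i in range(3))
    scale = max(math.dist(p, center) for p in camera_positions)
    return center, scale


def rescale_scene(points, center, scale, target=1.0):
    """Shift by center and scale uniformly so the scene scale becomes target.

    The same center and scale must be applied to both the camera positions
    and the sparse points, otherwise they no longer line up.
    """
    factor = target / scale
    return [tuple((p[i] - center[i]) * factor for i in range(3)) for p in points]


# Toy cameras spread over roughly the range reported in the log above.
cameras = [(0.0, 0.0, 0.0), (4000.0, 0.0, 0.0), (2000.0, 100.0, 0.0)]
center, scale = measure_scene_scale(cameras)
scaled_cameras = rescale_scene(cameras, center, scale)
_, new_scale = measure_scene_scale(scaled_cameras)
```

If training behaves normally on the rescaled copy, that would point at scale-dependent hyperparameters rather than at the reconstruction itself.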

Jan 19 '25 12:01 a11enL

Currently the problem is that the loss doesn't converge: it stalls at around 0.1 or 0.2. I have tried adjusting noise_lr, opacity_reg, and scale_reg, filtering out the cases with zero alive GS, etc.

This is a dataset that shows the situation above:

120 images, 120 cameras, 159431 points
Source: colmap image_undistorter

download, 183M, Google Drive https://drive.google.com/file/d/1pn8b74AAGobQI-Bxv6arLA-B6-l8IIWQ/view?usp=sharing

Jan 20 '25 09:01 a11enL