assertion error training on tanks-and-temple/scan3

Open Legend94rz opened this issue 2 years ago • 3 comments

Describe the bug

Here is the traceback when training on the sdfstudio dataset tanks-and-temple/scan3:

122440 (12.24%)     116.110 ms           17.64 K              15.88 K
122450 (12.24%)     116.215 ms           17.62 K
122460 (12.25%)     116.229 ms           17.62 K
122470 (12.25%)     116.068 ms           17.65 K
122480 (12.25%)     116.300 ms           17.61 K              16.03 K
122490 (12.25%)     118.293 ms           17.35 K
122500 (12.25%)     118.894 ms           17.28 K
122510 (12.25%)     117.497 ms           17.45 K
122520 (12.25%)     116.755 ms           17.54 K              16.18 K
122530 (12.25%)     116.402 ms           17.60 K
----------------------------------------------------------------------------------------------------
Viewer at: https://viewer.nerf.studio/versions/23-03-9-0/?websocket_url=ws://localhost:7007
Printing profiling stats, from longest to shortest duration in seconds
ViewerState._render_image_in_viewer: 0.1586
Trainer.train_iteration: 0.0529
VanillaPipeline.get_train_loss_dict: 0.0398
Traceback (most recent call last):
  File "/opt/conda/envs/sdf/bin/ns-train", line 8, in <module>
    sys.exit(entrypoint())
  File "/home/renzhen/userdata/repo/sdfstudio/scripts/train.py", line 250, in entrypoint
    main(
  File "/home/renzhen/userdata/repo/sdfstudio/scripts/train.py", line 236, in main
    launch(
  File "/home/renzhen/userdata/repo/sdfstudio/scripts/train.py", line 175, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "/home/renzhen/userdata/repo/sdfstudio/scripts/train.py", line 90, in train_loop
    trainer.train()
  File "/home/renzhen/userdata/repo/sdfstudio/nerfstudio/engine/trainer.py", line 151, in train
    loss, loss_dict, metrics_dict = self.train_iteration(step)
  File "/home/renzhen/userdata/repo/sdfstudio/nerfstudio/utils/profiler.py", line 43, in wrapper
    ret = func(*args, **kwargs)
  File "/home/renzhen/userdata/repo/sdfstudio/nerfstudio/engine/trainer.py", line 319, in train_iteration
    _, loss_dict, metrics_dict = self.pipeline.get_train_loss_dict(step=step)
  File "/home/renzhen/userdata/repo/sdfstudio/nerfstudio/utils/profiler.py", line 43, in wrapper
    ret = func(*args, **kwargs)
  File "/home/renzhen/userdata/repo/sdfstudio/nerfstudio/pipelines/base_pipeline.py", line 273, in get_train_loss_dict
    loss_dict = self.model.get_loss_dict(model_outputs, batch, metrics_dict)
  File "/home/renzhen/userdata/repo/sdfstudio/nerfstudio/models/neus_facto.py", line 308, in get_loss_dict
    loss_dict["interlevel_loss"] = self.config.interlevel_loss_mult * interlevel_loss_zip(
  File "/home/renzhen/userdata/repo/sdfstudio/nerfstudio/model_components/losses.py", line 144, in interlevel_loss_zip
    assert (y_r >= 0.0).all()
AssertionError

To Reproduce

  1. Download the data via ns-download-data
  2. ns-train neus-facto-angelo --data tanks-and-temple/scan3/ --pipeline.model.sdf-field.inside-outside True sdfstudio-data --include-mono-prior True

Expected behavior

The model trains without error.

Additional context

torch                     1.12.1+cu113             pypi_0    pypi
tinycudann                1.7                      pypi_0    pypi
nerfstudio                0.1.12

Legend94rz avatar Dec 07 '23 01:12 Legend94rz

Hi, sorry for the late reply. interlevel_loss_zip is not numerically stable. You can change it to interlevel_loss.

niujinshuchong avatar Dec 13 '23 11:12 niujinshuchong
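
The reply above attributes the failure to numerical instability. As a general illustration only (this is not the sdfstudio code, and it is not a confirmed diagnosis of the traceback in this issue), the sketch below shows how a strict non-negativity assertion of the same shape as `assert (y_r >= 0.0).all()` can fail on a value that is mathematically zero. The helper names `strict_check` and `tolerant_check` and the tolerance `atol` are assumptions made up for this example.

```python
def strict_check(values):
    """Same shape as the failing assertion: every value must be >= 0."""
    return all(v >= 0.0 for v in values)


def tolerant_check(values, atol=1e-9):
    """Accept values that are negative only by a small margin.

    atol is an assumed tolerance chosen for this illustration.
    """
    return all(v >= -atol for v in values)


# Mathematically 0.3 - 0.1 - 0.2 is exactly 0, but the float64 result
# carries a rounding error that lands slightly below zero.
residual = 0.3 - 0.1 - 0.2
values = [0.5, 0.25, residual]

print("residual:", residual)
print("strict check passes:", strict_check(values))
print("tolerant check passes:", tolerant_check(values))
```

A check with a tolerance (or clamping tiny negatives to zero) is a common way to make such code robust, whereas the suggestion in this thread is simply to switch to interlevel_loss.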

@niujinshuchong @Legend94rz

> Hi, sorry for the late reply. the interlevel_loss_zip is not numerically stable. You can change it to interlevel_loss.

Can I know how to change it? Is it a parameter?

paidiakileswar avatar Aug 01 '24 07:08 paidiakileswar

Should I change this:

        if self.training:
            loss_dict["interlevel_loss"] = self.config.interlevel_loss_mult * interlevel_loss_zip(
                outputs["weights_list"], outputs["ray_samples_list"]
            )

to this?

        if self.training:
            loss_dict["interlevel_loss"] = self.config.interlevel_loss_mult * interlevel_loss(
                outputs["weights_list"], outputs["ray_samples_list"]
            )

paidiakileswar avatar Aug 01 '24 07:08 paidiakileswar