permuto_sdf
Getting segmentation_fault on training with viewer
Hello, I'm running on a remote EC2 instance with 24GB VRAM and 64GB RAM, accessed through a remote desktop client called NICE DCV (an enterprise competitor to VNC, free on EC2).
I can train without a viewer with no problems. However, when I try to run it with a viewer, I get a segmentation fault. The app window opens, but nothing loads before it crashes.
I have tried both the experimental and normal docker builds (docker is the only setup I have tried). I have also checked out multiple versions of the repo (783c41f and e72ae5b) to see whether the problem was introduced recently. Nothing has worked so far. The failing run looks like this:
/workspace/permuto_sdf$ ./permuto_sdf_py/train_permuto_sdf.py --dataset dtu --scene dtu_scan24 --comp_name comp_3 --exp_info default
args.with_mask False
args.low_res False
checkpoint_path /workspace/permuto_sdf/checkpoints
with_viewer True
has_apex True
[ D96CB740]DataLoaderDTU.cxx:173 1| loaded nr of scenes 1 for mode train
[ D96CB740]DataLoaderDTU.cxx:432 1| reading poses and intrinsics for scene "dtu_scan24"
[ D96CB740]DataLoaderDTU.cxx:173 1| loaded nr of scenes 1 for mode test
[ D96CB740]DataLoaderDTU.cxx:432 1| reading poses and intrinsics for scene "dtu_scan24"
[ D96CB740] Mesh.cxx:3390 1| read obj with path /workspace/easy_pbr/data/sphere.obj
Segmentation fault (core dumped)
In contrast, when I train without a viewer, it looks like this:
/workspace/permuto_sdf$ ./permuto_sdf_py/train_permuto_sdf.py --dataset dtu --scene dtu_scan24 --comp_name comp_3 --exp_info default --no_viewer
args.with_mask False
args.low_res False
checkpoint_path /workspace/permuto_sdf/checkpoints
with_viewer False
has_apex True
[ 2A5FF740]DataLoaderDTU.cxx:173 1| loaded nr of scenes 1 for mode train
[ 2A5FF740]DataLoaderDTU.cxx:432 1| reading poses and intrinsics for scene "dtu_scan24"
[ 2A5FF740]DataLoaderDTU.cxx:173 1| loaded nr of scenes 1 for mode test
[ 2A5FF740]DataLoaderDTU.cxx:432 1| reading poses and intrinsics for scene "dtu_scan24"
phase.iter_nr 1000 loss 1.3530950546264648
phase.iter_nr 2000 loss 0.15609805285930634
phase.iter_nr 3000 loss 0.10311679542064667
...
How should I best troubleshoot this?
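One thing I could try next is getting a Python-level traceback at the moment of the crash. Below is a minimal sketch using the standard library's `faulthandler` (enabled via `python -X faulthandler`), which dumps the Python stack to stderr when the interpreter receives SIGSEGV. The helper names `describe_exit` and `run_with_faulthandler` are my own, not part of permuto_sdf or easy_pbr, and I am assuming the training script can be launched through the Python interpreter directly. It only shows which Python call was active when the native code crashed, not the C++ frame itself.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with code {returncode}"


def run_with_faulthandler(script_args):
    """Run a Python script in a child interpreter with faulthandler enabled.

    script_args is the script path followed by its arguments (hypothetical helper).
    Returns the child's return code and whatever it wrote to stderr.
    """
    cmd = [sys.executable, "-X", "faulthandler", *script_args]
    # Only stderr is captured, so the normal training output still reaches the terminal.
    result = subprocess.run(cmd, stderr=subprocess.PIPE, text=True)
    return result.returncode, result.stderr
```

For the failing command above, this would be called as `run_with_faulthandler(["./permuto_sdf_py/train_permuto_sdf.py", "--dataset", "dtu", "--scene", "dtu_scan24", "--comp_name", "comp_3", "--exp_info", "default"])`, then printing `describe_exit(code)` and the captured stderr. If the traceback ends inside the viewer setup, that would suggest the crash is in window or graphics context creation under the remote desktop session rather than in the training code, though that is only a guess at this point.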