neuralangelo
Mesh extract error
Noob question time: I successfully trained for many iterations, but mesh extraction sometimes fails. It starts normally and reports the number of faces, colors, etc., but then, instead of writing the file to disk and returning to the prompt, it prints this:
/home/user/miniconda3/envs/neuralangelo/lib/python3.8/site-packages/torch/utils/data/dataloader.py:561: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 6, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 478) of binary: /home/user/miniconda3/envs/neuralangelo/bin/python
Traceback (most recent call last):
File "/home/user/miniconda3/envs/neuralangelo/bin/torchrun", line 10, in <module>
sys.exit(main())
File "/home/user/miniconda3/envs/neuralangelo/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/user/miniconda3/envs/neuralangelo/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/home/user/miniconda3/envs/neuralangelo/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/user/miniconda3/envs/neuralangelo/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/user/miniconda3/envs/neuralangelo/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
====================================================
projects/neuralangelo/scripts/extract_mesh.py FAILED
----------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
----------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2023-11-01_20:30:50
host : MathZillaSSv3.
rank : 0 (local_rank: 0)
exitcode : -9 (pid: 478)
error_file: <N/A>
traceback : Signal 9 (SIGKILL) received by PID 478
====================================================
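The key detail in this log is `exitcode : -9` together with `Signal 9 (SIGKILL)`. When a child process is terminated by a signal, Python reports its exit code as the negative signal number, and torchrun passes that through. SIGKILL cannot be caught, which is why the worker prints no Python traceback of its own. On Linux the most common sender is the kernel's out-of-memory killer, which can react to a short allocation spike that a RAM monitor polling every second or two may never display. `dmesg` or `journalctl -k` usually shows a line mentioning "Out of memory" if that is what happened. The sketch below decodes such exit codes with the standard `signal` module. `describe_exitcode` is a hypothetical helper, not part of torch or neuralangelo:

```python
import signal


def describe_exitcode(exitcode: int) -> str:
    """Hypothetical helper: explain an exit code as printed by torchrun.

    Python reports a child killed by signal N as exit code -N (the same
    convention as subprocess.Popen.returncode on POSIX systems).
    """
    if exitcode >= 0:
        return f"exited with status {exitcode}"
    signum = -exitcode
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # Signal number not known on this platform.
        name = "unknown signal"
    return f"killed by signal {signum} ({name})"


if __name__ == "__main__":
    # Exit code taken from the log above.
    print(describe_exitcode(-9))
```

Because SIGKILL comes from outside the process, the useful follow-up is checking the kernel log and peak RAM usage rather than the Python code path.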
Three questions about this:
- I assume the DataLoader warning means my CPU only has 6 cores while the loader tries to start 8 workers. Or is that unrelated?
- In any case, although it hasn't caused problems so far as far as I can tell, can someone tell me which file I can edit to set the worker count to 6? I can't find it.
- Does anyone know what this error is? It seems to happen when I choose settings that are too high, yet there is still memory left, both VRAM and RAM, and neither seems to fill up before the run ends like this.
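On the first two questions: the 6 in the warning is the number of logical CPUs the process is allowed to run on (which can differ from the physical core count), and the 8 is the `num_workers` value the DataLoader was created with. The warning is advisory. Each worker is a separate process, so fewer workers mainly means less CPU contention and somewhat less RAM overhead. Where neuralangelo reads that value from is not shown in this thread; it is presumably a data-loader setting in the project's YAML config, but the exact key is an assumption here. The sketch below approximates how PyTorch derives its suggested maximum and clamps a requested count to it. `suggested_max_workers` and `capped_num_workers` are hypothetical helpers:

```python
import os


def suggested_max_workers() -> int:
    """Approximate PyTorch's 'suggested max number of worker'.

    Counts the CPUs this process may run on (its affinity set) where the
    OS supports that, and falls back to os.cpu_count() otherwise.
    """
    if hasattr(os, "sched_getaffinity"):
        try:
            return len(os.sched_getaffinity(0))
        except OSError:
            pass
    return os.cpu_count() or 1


def capped_num_workers(requested: int) -> int:
    """Hypothetical helper: clamp a configured worker count to the suggestion."""
    return max(0, min(requested, suggested_max_workers()))


if __name__ == "__main__":
    print("suggested max workers:", suggested_max_workers())
    print("workers to use instead of 8:", capped_num_workers(8))
```

The result would be passed as `torch.utils.data.DataLoader(dataset, num_workers=...)`; a value of 0 loads data in the main process with no extra workers.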
Thanks!
Hello, I have the same problem. After some searching I was told that it is probably due to my RAM, but I haven't found where to configure this either. I would be very grateful for a solution.
If I change RESOLUTION=2048 to RESOLUTION=1024, I get a mesh, but the quality is correspondingly poor.
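This fits a simple scaling argument. Assuming extraction evaluates the SDF on a dense grid with RESOLUTION points per axis (the script may process the grid in blocks, so these totals illustrate growth rather than a measured peak), going from 1024 to 2048 means 8 times as many grid points, while the extracted surface, and hence the face count, grows roughly 4 times. The hypothetical helpers below compute the float32 storage for a few values; an intermediate setting such as 1536 may be worth trying as a compromise:

```python
def grid_points(resolution: int) -> int:
    # Dense grid: `resolution` samples along each of the three axes.
    return resolution ** 3


def grid_gib(resolution: int, bytes_per_value: int = 4) -> float:
    # Storage for one float32 SDF value per grid point, in GiB.
    return grid_points(resolution) * bytes_per_value / 2**30


if __name__ == "__main__":
    for res in (1024, 1536, 2048):
        print(f"RESOLUTION={res}: {grid_points(res):,} points, "
              f"{grid_gib(res):.1f} GiB as float32")
```

With these assumptions, 1024 needs 4 GiB, 1536 needs 13.5 GiB and 2048 needs 32 GiB for the grid values alone, before the mesh and its colors are added.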