
Mesh extract error

Open Iliceth opened this issue 1 year ago • 2 comments

Noob question time: I successfully train for many iterations, but sometimes mesh extraction starts well, prints the number of faces, colors, etc., and then, instead of writing the file to disk and returning to the prompt, gives this:

/home/user/miniconda3/envs/neuralangelo/lib/python3.8/site-packages/torch/utils/data/dataloader.py:561: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 6, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 0 (pid: 478) of binary: /home/user/miniconda3/envs/neuralangelo/bin/python
Traceback (most recent call last):
  File "/home/user/miniconda3/envs/neuralangelo/bin/torchrun", line 10, in <module>
    sys.exit(main())
  File "/home/user/miniconda3/envs/neuralangelo/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
  File "/home/user/miniconda3/envs/neuralangelo/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
    run(args)
  File "/home/user/miniconda3/envs/neuralangelo/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/user/miniconda3/envs/neuralangelo/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/user/miniconda3/envs/neuralangelo/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
====================================================
projects/neuralangelo/scripts/extract_mesh.py FAILED
----------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
----------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-11-01_20:30:50
  host      : MathZillaSSv3.
  rank      : 0 (local_rank: 0)
  exitcode  : -9 (pid: 478)
  error_file: <N/A>
  traceback : Signal 9 (SIGKILL) received by PID 478
====================================================

Three questions about this:

  • I assume the DataLoader warning means my CPU only has 6 cores while it tries to create 8 workers. Or is that unrelated?
  • As far as I have noticed, this has not caused problems yet, but can someone please tell me which file I can edit to set it to 6? I can't seem to find it.
  • Does someone know what error I am running into? It seems to happen when I choose settings that are too high, but there is still memory left, both VRAM and RAM. Neither seems to fill up, yet the run ends like this.

Thanks!

Iliceth avatar Nov 01 '23 19:11 Iliceth

Hello, I have the same problem. After some searching, I learned that it is probably down to my RAM. But I have not yet found where I can configure that. I would be very grateful for a solution.
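
For context on the RAM suspicion: the log above reports `exitcode: -9` and "Signal 9 (SIGKILL) received". A negative exit code means the process was terminated by a signal, and SIGKILL cannot be caught, which would explain why the extraction script prints no traceback of its own. On Linux, one common sender of SIGKILL is the kernel's out-of-memory killer, but the log does not say who sent the signal, so that remains an assumption here; the kernel log (for example via `dmesg`) is one place to check. The helper below is a hypothetical illustration of how such an exit code can be decoded, not part of neuralangelo.

```python
import signal


def describe_exit_code(exit_code: int) -> str:
    """Describe a child-process exit code (hypothetical helper for illustration)."""
    if exit_code < 0:
        # A negative code means "terminated by signal number -exit_code".
        signal_number = -exit_code
        try:
            name = signal.Signals(signal_number).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signal_number} ({name})"
    if exit_code == 0:
        return "exited normally"
    return f"exited with error status {exit_code}"


# The value reported in the traceback above.
print(describe_exit_code(-9))
```

If the kernel log does show an out-of-memory kill for the reported PID, RAM is the limit even when a monitoring tool did not appear to show it filling up, since a short allocation spike can be missed.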

Georg1986 avatar Nov 02 '23 09:11 Georg1986

If I change RESOLUTION=2048 to RESOLUTION=1024 on my machine, I do get a mesh. But the quality is correspondingly poor.
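
A rough way to see why the resolution matters so much: the number of sample points in a cubic grid grows with the cube of the resolution, so going from 1024 to 2048 multiplies it by 8. The sketch below assumes a single dense grid of 32-bit floats, which is only an illustration; the extraction script may process the volume in blocks and hold far less than this at once, and the resulting mesh needs memory of its own. `dense_grid_gib` is a hypothetical helper, not something from the repository.

```python
def dense_grid_gib(resolution: int, bytes_per_value: int = 4) -> float:
    """Size in GiB of a hypothetical dense resolution^3 grid of scalar values."""
    return resolution ** 3 * bytes_per_value / 2 ** 30


# Compare the two settings from this thread with one value in between.
for res in (1024, 1536, 2048):
    print(f"RESOLUTION={res}: {dense_grid_gib(res):.1f} GiB")
```

Under these assumptions an intermediate value such as 1536 would need well under half of what 2048 needs, so it may be worth trying before settling for 1024.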

Georg1986 avatar Nov 04 '23 13:11 Georg1986
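
On the DataLoader warning from the original question: it refers to the `num_workers` argument of `torch.utils.data.DataLoader`, where 8 workers were requested and PyTorch suggests at most 6 on that system. The thread does not show where neuralangelo sets this value, so the sketch below only illustrates how a cap could be computed before a loader is built. Both `usable_cpu_count` and `cap_num_workers` are hypothetical helpers. The warning appears to be separate from the SIGKILL that ends the run.

```python
import os
from typing import Optional


def usable_cpu_count() -> int:
    """CPUs this process may run on, falling back to the total CPU count."""
    if hasattr(os, "sched_getaffinity"):
        # Available on Linux; respects CPU affinity restrictions.
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def cap_num_workers(requested: int, available: Optional[int] = None) -> int:
    """Limit a requested worker count to the number of usable CPUs."""
    if available is None:
        available = usable_cpu_count()
    return max(0, min(requested, available))


# With 8 requested workers and 6 usable CPUs, as in the warning above:
print(cap_num_workers(8, available=6))
```

The capped value would then be passed as `num_workers` wherever the loader is constructed or configured.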