
Semaphore hissy fit at the end of subset run

Open mp15 opened this issue 1 year ago • 1 comments

Issue Description

The process completes successfully, but at exit Python's multiprocessing cleanup throws a hissy fit about not being able to unlink its semaphores.

Logs

$ pod5 subset -t 50 PAO27011_pass_7b4991d0_ec3250cb.pod5 --missing-ok --summary sequencing_summary_PAO27011_7b4991d0_ec3250cb.txt --columns channel --output /tmp/tmp.Al3fg28Kg1
Subsetting: 99%|#########9| 2603/2623 [32:00<00:14, 1.36Files/s]
Traceback (most recent call last):
  File "/software/python-3.10.1/lib/python3.10/multiprocessing/util.py", line 300, in _run_finalizers
    finalizer()
  File "/software/python-3.10.1/lib/python3.10/multiprocessing/util.py", line 224, in __call__
    res = self._callback(*self._args, **self._kwargs)
  File "/software/python-3.10.1/lib/python3.10/multiprocessing/synchronize.py", line 87, in _cleanup
    sem_unlink(name)
FileNotFoundError: [Errno 2] No such file or directory

Specifications

  • Pod5 Version: 0.3.6
  • Python Version: Python 3.10.1
  • Platform: Ubuntu Bionic 18.04

mp15, Feb 08 '24 10:02

Hi @mp15, would you be able to set POD5_DEBUG=1 next time you run this, to hopefully capture in more detail what's going wrong here?

The number of "threads" (which are actually processes), -t 50, is quite high, and the number of outputs (splitting by channel) is also quite high. This could be causing issues. Please consider lowering the -t value: the subsetting process is predominantly IO bound, and there are diminishing returns with increasing threads.

HalfPhoton, Feb 08 '24 12:02
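
For anyone wanting to act on both suggestions at once, here is a minimal sketch of a rerun with POD5_DEBUG=1 set and a lower -t value. The helper name build_subset_run and the thread count of 8 are assumptions made for illustration (the thread only says 50 is high, not what to use instead). The flags and file names are copied from the original command. Nothing runs until the subprocess line is uncommented.

```python
import os
import shlex
import subprocess  # only needed if the final line is uncommented


def build_subset_run(inputs, summary, output, threads=8, columns=("channel",)):
    """Return (argv, env) for a `pod5 subset` rerun with debug logging enabled.

    `threads=8` is an arbitrary illustrative value, not a recommendation
    from the pod5 maintainers.
    """
    argv = [
        "pod5", "subset",
        "-t", str(threads),
        *inputs,
        "--missing-ok",
        "--summary", summary,
        "--columns", *columns,
        "--output", output,
    ]
    # Copy the current environment and switch on the debug variable
    # mentioned in the reply above.
    env = dict(os.environ, POD5_DEBUG="1")
    return argv, env


argv, env = build_subset_run(
    inputs=["PAO27011_pass_7b4991d0_ec3250cb.pod5"],
    summary="sequencing_summary_PAO27011_7b4991d0_ec3250cb.txt",
    output="/tmp/tmp.Al3fg28Kg1",
)

# Show the equivalent shell command without executing it.
print("POD5_DEBUG=1", shlex.join(argv))

# To actually run it:
# subprocess.run(argv, env=env, check=True)
```

If the traceback still appears at the lower process count, the debug output from that run is probably the more useful thing to attach here.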