BrokenPipeError during postprocess_variants

Open LogCrab opened this issue 10 months ago • 7 comments

Have you checked the FAQ? https://github.com/google/deepvariant/blob/r1.6.1/docs/FAQ.md: Yes

Describe the issue: Hi developers of DeepVariant, I was using the latest DeepVariant v1.6.1 for ONT data variant calling. make_examples and call_variants worked perfectly, but postprocess_variants failed. In detail, it reported the following:

***** Running the command:*****
time /opt/deepvariant/bin/postprocess_variants --ref "/input/GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta" --infile "/inter/tmp/call_variants_output.tfrecord.gz" --outfile "/input/{VCF.gz}" --cpus "120"

2024-04-08 06:27:55.589078: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libcublas.so.12: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs
2024-04-08 06:27:55.589111: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
I0408 06:27:57.480687 140282942986048 postprocess_variants.py:1211] Using sample name from call_variants output. Sample name: default
I0408 06:55:56.836046 140282942986048 postprocess_variants.py:1313] CVO sorting took 27.989152932167052 minutes
I0408 06:55:56.837136 140282942986048 postprocess_variants.py:1316] Transforming call_variants_output to variants.
I0408 06:55:56.837199 140282942986048 postprocess_variants.py:1318] Using 120 CPUs for parallelization of variant transformation.
I0408 07:06:00.821415 140282942986048 postprocess_variants.py:1211] Using sample name from call_variants output. Sample name: default
I0408 07:29:46.200004 140282942986048 postprocess_variants.py:1365] Writing variants to VCF.
I0408 07:29:46.201339 140282942986048 postprocess_variants.py:973] Writing output to VCF file: /input/R9G4.vcf.gz
I0408 07:29:46.877771 140282942986048 genomics_writer.py:183] Writing /input/R9G4.vcf.gz with NativeVcfWriter
I0408 07:30:12.688740 140282942986048 postprocess_variants.py:987] 1 variants written.

real    70m57.596s
user    45m18.578s
sys     30m4.764s
Process ForkPoolWorker-83:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 136, in worker
    put((job, i, (False, wrapped)))
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Process ForkPoolWorker-42:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 405, in _send_bytes
    self._send(buf)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

There are many more BrokenPipeError tracebacks below this; I have only included part of the output. Do you have any idea why this error happens?
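One detail in the log above is the line reporting "1 variants written". A quick way to check what actually ended up in the output file is to count the non-header records in the compressed VCF. The following is only a minimal sketch: `count_vcf_records` is a hypothetical helper, not part of DeepVariant, and it assumes the output is a gzip- or bgzip-compressed text VCF.

```python
import gzip


def count_vcf_records(path):
    """Count non-header, non-empty lines in a gzip/bgzip-compressed VCF.

    Hypothetical helper for a sanity check; not part of DeepVariant.
    """
    count = 0
    # gzip.open in text mode also reads multi-member files such as bgzip output.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip() and not line.startswith("#"):
                count += 1
    return count
```

For the run above this could be called as `count_vcf_records("/input/R9G4.vcf.gz")`, using the path shown in the log. A count far below the number of expected calls would suggest the output is incomplete.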

Setup

  • Operating system: Ubuntu 22.04
  • DeepVariant version: v1.6.1
  • Installation method : singularity
  • Type of data: ONT sequencing data

LogCrab avatar Apr 08 '24 02:04 LogCrab

I solved this by turning off multiprocessing (--cpus "0") with the following command:

postprocess_variants --cpus "0"  --ref "/input/GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta" --infile "/tmp/call_variants_output.tfrecord.gz" --outfile "/input/{OUTVCF.gz}" 
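If the step is launched from a script, the same workaround can be expressed by building the argument list with `--cpus` set to 0. This is a sketch under assumptions: `build_postprocess_cmd` is a hypothetical helper, the binary path is the one shown in the log earlier in this thread, and the example output path is made up. Only the `--ref`, `--infile`, `--outfile`, and `--cpus` flags seen in this thread are used.

```python
import shlex

# Binary path as it appears in the log earlier in this thread.
POSTPROCESS_BIN = "/opt/deepvariant/bin/postprocess_variants"


def build_postprocess_cmd(ref, infile, outfile, cpus=0):
    """Return the argument list for postprocess_variants (hypothetical helper).

    cpus=0 mirrors the workaround reported in this comment.
    """
    return [
        POSTPROCESS_BIN,
        "--ref", ref,
        "--infile", infile,
        "--outfile", outfile,
        "--cpus", str(cpus),
    ]


if __name__ == "__main__":
    cmd = build_postprocess_cmd(
        "/input/GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta",
        "/tmp/call_variants_output.tfrecord.gz",
        "/input/out.vcf.gz",  # hypothetical output path
    )
    # Print the shell-quoted command; pass cmd to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Keeping the command as a list avoids shell quoting problems if it is later passed to `subprocess.run`.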

But I am still not sure what causes the Broken pipe error. I also tried fewer cores, e.g. --cpus "12", and still got BrokenPipeError: [Errno 32] Broken pipe.
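For context on what the error itself means: errno 32 (EPIPE) is raised when a process writes to a pipe whose reading end has already been closed. The sketch below reproduces that with the same `multiprocessing` connection type that appears in the tracebacks. It assumes Linux and only illustrates the error; it does not show why the reading side went away in postprocess_variants. The function name is hypothetical.

```python
import errno
import multiprocessing


def send_to_closed_reader():
    """Write to a pipe after its read end is closed; return the errno raised."""
    # One-way pipe: reader can only receive, writer can only send.
    reader, writer = multiprocessing.Pipe(duplex=False)
    reader.close()  # nobody is left to read from the pipe
    try:
        writer.send_bytes(b"result")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        writer.close()
    return None


if __name__ == "__main__":
    print(send_to_closed_reader() == errno.EPIPE)
```

In the log, the timing summary is printed before the worker tracebacks, which is consistent with the workers trying to send results after the receiving side had stopped reading, though the thread does not confirm this.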

LogCrab avatar Apr 08 '24 05:04 LogCrab

Is it possible for you to share /tmp/call_variants_output.tfrecord.gz (you can send it to [email protected])? I would be very interested in trying to reproduce the issue myself. I have a theory as to what may be the root cause.

lucasbrambrink avatar Apr 10 '24 20:04 lucasbrambrink

@lucasbrambrink Thank you for your response. I did keep those files, but there are 16 of them, listed below:

call_variants_output-00000-of-00016.tfrecord.gz
call_variants_output-00001-of-00016.tfrecord.gz
call_variants_output-00002-of-00016.tfrecord.gz
call_variants_output-00003-of-00016.tfrecord.gz
call_variants_output-00004-of-00016.tfrecord.gz
call_variants_output-00005-of-00016.tfrecord.gz
call_variants_output-00006-of-00016.tfrecord.gz
call_variants_output-00007-of-00016.tfrecord.gz
call_variants_output-00008-of-00016.tfrecord.gz
call_variants_output-00009-of-00016.tfrecord.gz
call_variants_output-00010-of-00016.tfrecord.gz
call_variants_output-00011-of-00016.tfrecord.gz
call_variants_output-00012-of-00016.tfrecord.gz
call_variants_output-00013-of-00016.tfrecord.gz
call_variants_output-00014-of-00016.tfrecord.gz
call_variants_output-00015-of-00016.tfrecord.gz

Each is about 200 MB due to the high coverage of the sequencing. Do you want all of them?

LogCrab avatar Apr 11 '24 04:04 LogCrab

Yes please, that would be great!

lucasbrambrink avatar Apr 11 '24 15:04 lucasbrambrink

@lucasbrambrink The files you requested have just been sent to your email.

LogCrab avatar Apr 12 '24 02:04 LogCrab

@LogCrab Thank you for providing the files. We've tried to reproduce this in various ways without success. Could you provide the specs of the machine you ran this on? Thank you for your patience!

lucasbrambrink avatar Apr 25 '24 16:04 lucasbrambrink

@lucasbrambrink Sorry for the delay. My machine is running Ubuntu 22.04. The cat /proc/version output is:

Linux version 6.2.0-35-generic (buildd@bos03-amd64-016) (x86_64-linux-gnu-gcc-11 (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #35~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Oct 6 10:23:26 UTC 2

In terms of hardware, it has dual Intel Platinum 8352Y CPUs, dual RTX 3090 GPUs, 512 GB of RAM, and a full HDD array. Hope this helps.

LogCrab avatar Apr 30 '24 15:04 LogCrab

So we were unable to reproduce this specific error. Regardless, we are overhauling how multiprocessing is used in postprocess_variants with our next release, which will very likely avoid this type of error.

I am closing this issue for now. If someone is experiencing this issue and would like an experimental docker container to run, please comment on this issue and we will provide one!

lucasbrambrink avatar May 15 '24 21:05 lucasbrambrink

Hi, I am experiencing a similar issue on a VM with 32 threads and 64 GB RAM. Could you provide the experimental container? By the way, thank you for your outstanding work in making this software available to us. This tool is reliable and of great value to us.

Regards, Tomasz Stokowy, Leader Scientific Computing, University of Bergen, Norway

Running via docker 1.6.1, earlier steps work smoothly.

cat /proc/version Linux version 6.1.0-22-amd64 ([email protected]) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 6.1.94-1 (2024-06-21)

Error log:

***** Running the command:*****
time /opt/deepvariant/bin/postprocess_variants --ref "/Reference/core_ref_GRCh38_hla_decoy_ebv/genome.fa" --infile "/Output/call_variants_output.tfrecord.gz" --outfile "/Output/CoriellIndex.vcf" --cpus "32" --gvcf_outfile "/Output/CoriellIndex.gvcf" --nonvariant_site_tfrecord_path "/Output/[email protected]"

I0823 15:16:56.752997 139658307389248 postprocess_variants.py:1211] Using sample name from call_variants output. Sample name: CoriellIndex
2024-08-23 15:16:56.766309: I deepvariant/postprocess_variants.cc:94] Read from: /Output/call_variants_output-00000-of-00001.tfrecord.gz
2024-08-23 15:18:01.806248: I deepvariant/postprocess_variants.cc:109] Total #entries in single_site_calls = 10880665
I0823 15:20:45.074391 139658307389248 postprocess_variants.py:1313] CVO sorting took 3.805263650417328 minutes
I0823 15:20:45.077561 139658307389248 postprocess_variants.py:1316] Transforming call_variants_output to variants.
I0823 15:20:45.077694 139658307389248 postprocess_variants.py:1318] Using 32 CPUs for parallelization of variant transformation.
I0823 15:20:51.014987 139658307389248 postprocess_variants.py:1211] Using sample name from call_variants output. Sample name: CoriellIndex

real    8m32.455s
user    7m11.835s
sys     1m25.577s
Process ForkPoolWorker-2:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 136, in worker
    put((job, i, (False, wrapped)))
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe
Process ForkPoolWorker-28:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 131, in worker
    put((job, i, result))
  File "/usr/lib/python3.8/multiprocessing/queues.py", line 368, in put
    self._writer.send_bytes(obj)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 200, in send_bytes
    self._send_bytes(m[offset:offset + size])
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 404, in _send_bytes
    self._send(header)
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 368, in _send
    n = write(self._handle, buf)
BrokenPipeError: [Errno 32] Broken pipe

(similar records from other workers repeating here ...)

tstokowy avatar Aug 24 '24 07:08 tstokowy