
samtools view command error

skatragadda-nygc opened this issue 2 years ago • 12 comments

Are you using the latest version of samtools and HTSlib? If not, please specify.

samtools 1.15, using htslib 1.15 (output of samtools --version)

Please describe your environment.

  • OS (run uname -sr on Linux/Mac OS or wmic os get Caption, Version on Windows): Linux 5.10.0-12-cloud-amd64
  • machine architecture (run uname -m on Linux/Mac OS or wmic os get OSArchitecture on Windows)
  • compiler (run gcc --version or clang --version): gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0

Please specify the steps taken to generate the issue, the command you are running and the relevant output.

samtools view -C -T https://storage.googleapis.com/genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta -t https://storage.googleapis.com/genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta.fai -o somatic-test.cram ourfile.bam --verbosity=8

Here is the error message:

[I::cram_next_container] Flush container 1/10059..-1
* Connection #0 to host storage.googleapis.com left intact
[E::easy_errno] Libcurl reported error 16 (Error in the HTTP2 framing layer)
[E::bgzf_read_block] Failed to read uncompressed data at offset 258539632: Input/output error
[E::bgzf_read] Read block operation failed with error 4 after 7093421 of 244615464 bytes
bgzf_read() on reference file: Input/output error
[E::cram_encode_container] Failed to load reference #1
samtools view: writing to "somatic-test.cram" failed: Input/output error
[I::cram_encode_compression_header] Wrote compression block header in 6 bytes
[E::easy_errno] Libcurl reported error 16 (Error in the HTTP2 framing layer)

skatragadda-nygc avatar Mar 24 '22 09:03 skatragadda-nygc

It looks like a network error. Is your error reproducible?

Does your command work if you download the reference and index and run it locally?

I ran your command on some of our human data and it worked normally but since I don't know what is in your ourfile.bam I am not sure it is a good test.

whitwham avatar Mar 24 '22 11:03 whitwham

Initially it works for some time; after 5 or 10 minutes it stops with the above error.

We are currently running inside a Docker container on a GCP VM. Testing the command with local files works fine. @whitwham, could you try to reproduce it on your end inside a container? Unfortunately, I cannot share the BAM file.

skatragadda-nygc avatar Mar 24 '22 12:03 skatragadda-nygc

> Testing the command with local files works fine @whitwham could you try to reproduce it on your end inside a container?

Not until tomorrow. My current working machine does not allow containers. I would have to set something up.

whitwham avatar Mar 24 '22 13:03 whitwham

Sounds good. Could you also verify whether gs:// URLs work, instead of local files or HTTPS URLs? Somatic RNA conversion works fine with HTTPS URLs, but somatic DNA conversion fails with the above errors.

skatragadda-nygc avatar Mar 24 '22 14:03 skatragadda-nygc

@skatragadda-nygc I have run it in a docker container using a 1000 Genomes file and it worked for me. The running time was about 30 minutes on both coordinate and name sorted data. HTTPS and GS URLs both worked.

It may be something to do with the GCP VM, but I do not have enough in-depth knowledge to help there.

whitwham avatar Mar 25 '22 14:03 whitwham

@whitwham I'm @skatragadda-nygc's colleague. We did more debugging. It turns out that our end user has been running samtools with the default single thread. This always triggers the libcurl error at exactly this position:

[I::cram_next_container] Flush container 1/10059..-1

If we increase the thread count to 2 or more (e.g. -@ 2), the run completes successfully. I'm wondering if you're running samtools with more threads? We also tested this without Docker, on a few different non-GCP systems: Ubuntu 22.04, 18.04, CentOS 7.9, and even on a Mac. Originally we thought it might be due to different versions of libcurl4, so we tried newer and older versions. All of them trigger the error when running samtools with 1 thread. As soon as we increase the threads, the error goes away. Can you confirm whether you're also testing with 1 thread?

xk42 avatar Mar 25 '22 14:03 xk42

@xk42 I'm running with only one thread. Pretty much the same as in the original command given by @skatragadda-nygc but with my own input file and no verbosity option.

I took a generic Ubuntu image from Docker and it is using an older Linux kernel. I have a VM with a newer kernel ready, and I will see what running on that does.

whitwham avatar Mar 25 '22 15:03 whitwham

We tested on two different BAM files. It does look like size might be triggering this. One of our BAMs is 25G, and we don't have any issues with 1 or more threads. The other is 150G, and this is the one that triggers the error when running with 1 thread.

xk42 avatar Mar 25 '22 15:03 xk42

Okay, the test with the newer kernel worked, though only on a 2.5G file.

I'm running a test on a 340G file, the biggest I have easily available; it may take some time.

whitwham avatar Mar 25 '22 16:03 whitwham

Well this is interesting.

time samtools view -C -T https://storage.googleapis.com/genomics-public-data/references/hg38/v0/Homo_sapiens_assembly38.fasta -o somatic-test.cram hg_align.sorted.bam

[E::bgzf_read_block] Failed to read uncompressed data at offset 281411696: Broken pipe
[E::bgzf_read] Read block operation failed with error 4 after 29965485 of 244615464 bytes
bgzf_read() on reference file: Broken pipe
[E::cram_encode_container] Failed to load reference #1
samtools view: writing to "somatic-test.cram" failed: Broken pipe

real 14m25.176s
user 12m16.194s
sys 0m20.698s

This was with the 340G bam file. It failed after 14 minutes, which is less time than the successful runs took. This, as you say, implies size is a factor. It did manage to write out a 2.7G cram file.

I'll need to discuss this with my colleagues after the weekend.

whitwham avatar Mar 25 '22 17:03 whitwham

I wonder if it's timing-related. Possibly Google is timing out the https connection between samtools reading the first and second chromosome references?

daviesrob avatar Mar 25 '22 17:03 daviesrob

I'm not at all familiar with GCS, but it occurs to me that this may be an issue of keeping a file handle open for too long without interim usage. There is a significant difference in how CRAM works when threaded versus unthreaded, which hints at this.

When unthreaded, it opens the appropriate reference file and then periodically reads from it (bgzf_read as successive CRAM containers march their way along that chromosome). I'm unsure how the cloud I/O is done in htslib, but is it possible the bgzf_open is getting some session cookie (or equivalent concept) which is then used for all subsequent reads? If the time between open and read is too long, then it could fail.

Now this is where threading dramatically changes things. Threaded CRAM encoding doesn't load the reference in pieces as it goes, because it's hard to work out when a reference section has gone out of scope and can be discarded. Instead it loads an entire chromosome at a time and holds it in memory. This makes the open/read calls close together in time (assuming separate chromosomes are separate connections) and avoids timeout issues.

Obviously using 2 threads works around this (maybe even explicitly asking for -@1 would too, but I'm unsure on that). If you wish to stick to 1 thread only, then fetching the reference file first with wget or similar and pointing samtools at the local copy would also solve the problem.

I'm unsure how we solve this properly in htslib, except perhaps with some error catching so that if a bgzf_read fails on a network connection we attempt a reopen and seek before failing.

jkbonfield avatar Mar 28 '22 08:03 jkbonfield