biobambam2

"Too many open files"

Open map2085 opened this issue 7 years ago • 4 comments

I am working with a very large dataset: the gzipped FASTQ file is 250 GB. I split it into ~1,200 smaller FASTQ files and aligned each of them with BWA, standard parameters.

Now I am trying to merge the resulting 1,200 small BAM files (~350 MB each) with biobambam2.

bammerge fails immediately after being called, with the error message: "Too many open files"

map2085 avatar Sep 08 '17 22:09 map2085
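
For context, "Too many open files" is the standard message for the EMFILE error, which a process gets when it reaches its per-process limit on open file descriptors. Below is a minimal sketch for checking that limit before launching a merge. It assumes a Unix-like system (Python's `resource` module is not available on Windows). The helper names `fits_under_limit` and `current_soft_limit` and the headroom of 16 descriptors are illustrative choices, not part of biobambam2.

```python
import resource


def fits_under_limit(n_inputs, soft_limit, reserve=16):
    """Return True if n_inputs files plus some headroom fit under soft_limit.

    `reserve` is an assumed allowance for stdin/stdout/stderr, the output
    file, and any temporary files; adjust it for your setup.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return True
    return n_inputs + reserve <= soft_limit


def current_soft_limit():
    """Return the soft limit on open file descriptors for this process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft


if __name__ == "__main__":
    n_inputs = 1200  # number of BAM files from the report above
    soft = current_soft_limit()
    print(f"soft limit: {soft}")
    print(f"{n_inputs} inputs fit: {fits_under_limit(n_inputs, soft)}")
```

If the check fails, the soft limit can often be raised for the current shell with `ulimit -n`, up to the hard limit set by the system administrator.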

Try using bamcat instead. It does not open all input files at the same time. If you want the output to be sorted, pipe it through bamsort:

bamcat level=0 in1.bam in2.bam ... | bamsort

gt1 avatar Sep 08 '17 22:09 gt1

I understand. This workaround would be very inefficient though, since it would have to re-sort all of the files after cat, even though the files were pre-sorted, right?

map2085 avatar Sep 08 '17 22:09 map2085

You can check whether a multi-stage merge is faster: use bammerge to merge subsets of the input files, then merge the pre-merged files. bammerge currently has no support for doing multi-stage merges directly.

gt1 avatar Sep 08 '17 22:09 gt1
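
The multi-stage merge described above can be sketched as a small planner that groups the inputs into batches no larger than a chosen fan-in and repeats until one file remains. This is only an illustration under stated assumptions: the function names, the `stage<N>_<M>.bam` intermediate file names, and the fan-in of 500 are hypothetical, and the generated command lines assume that bammerge accepts repeated `I=<file>` arguments and writes the merged BAM to standard output, which should be checked against the installed version.

```python
import shlex


def chunked(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def plan_merge(inputs, fan_in, prefix="stage"):
    """Return a list of (input_files, output_file) merge steps.

    Each step merges at most `fan_in` files, so no single merge needs
    more than `fan_in` input files open at once.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    steps = []
    current = list(inputs)
    stage = 0
    while len(current) > 1:
        next_level = []
        for idx, group in enumerate(chunked(current, fan_in)):
            if len(group) == 1:
                # A lone leftover file needs no merge; carry it forward.
                next_level.append(group[0])
                continue
            out = f"{prefix}{stage}_{idx}.bam"  # hypothetical naming scheme
            steps.append((group, out))
            next_level.append(out)
        current = next_level
        stage += 1
    return steps


def bammerge_command(group, out):
    """Build a shell command line for one merge step.

    Assumption: bammerge takes repeated I=<file> arguments and writes
    the merged BAM to stdout.
    """
    args = " ".join(f"I={shlex.quote(path)}" for path in group)
    return f"bammerge {args} > {shlex.quote(out)}"


if __name__ == "__main__":
    inputs = [f"part{i}.bam" for i in range(1200)]  # placeholder file names
    for group, out in plan_merge(inputs, fan_in=500):
        print(f"{len(group)} files -> {out}")
```

Because each input is already sorted and merging sorted files yields a sorted file, the final output should not need a separate re-sort. The intermediate files can be deleted once the stage that consumes them has finished.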

Yeah, I have implemented the multi-stage merge workaround with intermediate files. It's not difficult, but it is cumbersome. I just wanted to post this here to alert everyone.

biobambam2 works great though!

map2085 avatar Sep 08 '17 23:09 map2085