biobambam2
"Too many open files"
I am working with very large data: the gzipped FASTQ file is 250 GB. I split it into ~1,200 smaller FASTQ files and aligned each of them with BWA using standard parameters.
Now I am trying to merge the 1,200 small BAM files (~350 MB each) with biobambam2.
Immediately upon calling biobambam2 bammerge, it fails with the error message "Too many open files".
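For context, "Too many open files" is the operating system refusing to hand out another file descriptor because the process has reached its per-process open-file limit (RLIMIT_NOFILE). A merge that keeps every input open at the same time needs at least one descriptor per input BAM. The minimal Python sketch below (Unix only) compares an input count against the current soft limit; the `reserve` margin of 16 descriptors is an arbitrary, hypothetical allowance for stdin/stdout/stderr and any other files the tool may open, not a figure taken from biobambam2.

```python
import resource


def fits_open_file_limit(n_inputs: int, reserve: int = 16) -> bool:
    """Return True if n_inputs open files plus a hypothetical `reserve`
    of extra descriptors fit under the process's soft open-file limit."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True  # no limit configured
    return n_inputs + reserve <= soft


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft limit: {soft}, hard limit: {hard}")
    print("1,200 inputs fit:", fits_open_file_limit(1200))
```

In a shell, `ulimit -n` reports the same soft limit; whether it can be raised depends on the hard limit and the system configuration.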
Try using bamcat instead; it does not open all input files at the same time. If you want the output to be sorted, then use
bamcat level=0 in1.bam in2.bam ... | bamsort
I understand.
This workaround would be very inefficient, though, since bamsort would have to re-sort all of the records after bamcat concatenates them, even though each file was already sorted, right?
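The trade-off raised here can be sketched with a generic k-way merge. Merging k inputs that are each already sorted only compares the current head record of each input, roughly O(N log k) comparisons for N records, whereas re-sorting the concatenation costs roughly O(N log N) and, at this data size, typically means spilling temporary files to disk. The catch is that the merge needs a live cursor on every input at once, which is consistent with bammerge failing on 1,200 inputs. The Python sketch below uses `heapq.merge` with small integer lists as stand-ins for coordinate-sorted BAM records; it illustrates the technique and is not biobambam2's implementation.

```python
import heapq
from typing import Iterable, Iterator, List


def kway_merge(sorted_inputs: List[Iterable[int]]) -> Iterator[int]:
    """Lazily merge already-sorted inputs into one sorted stream.

    heapq.merge keeps one pending item per input in a heap, so every
    input needs a live iterator (an "open file") until it is exhausted.
    """
    return heapq.merge(*sorted_inputs)


if __name__ == "__main__":
    # Toy stand-ins for coordinate-sorted BAM files: each list is sorted.
    inputs = [[1, 4, 9], [2, 3, 10], [5, 6, 7]]
    print(list(kway_merge(inputs)))  # [1, 2, 3, 4, 5, 6, 7, 9, 10]
```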
You could check whether a multi-stage merge is faster, i.e. use bammerge to merge subsets of the input files and then merge the resulting pre-merged files. bammerge currently has no support for doing multi-stage merges directly.
Yeah, I have implemented the multi-stage merge workaround with intermediate files. It's not difficult, but it is cumbersome and a nuisance. I just thought I'd post this here to alert everyone.
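A multi-stage merge along these lines could be scripted roughly as follows. This is a hypothetical Python driver, not part of biobambam2: `batches`, `merge_plan`, `run_bammerge`, `multistage_merge`, the `fan_in` default of 500 and the `merge_tmp` directory are all illustrative names and values. It assumes bammerge accepts each input as an `I=<file>` argument and writes the merged BAM to standard output; check `bammerge -h` for your version before relying on that. `fan_in` should stay comfortably below the open-file soft limit.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def batches(items: Sequence[str], fan_in: int) -> List[List[str]]:
    """Split items into consecutive groups of at most fan_in elements."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    return [list(items[i:i + fan_in]) for i in range(0, len(items), fan_in)]


def merge_plan(n_inputs: int, fan_in: int) -> List[int]:
    """Files remaining after each stage, e.g. merge_plan(1200, 500) == [3, 1]."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    counts: List[int] = []
    n = n_inputs
    while n > 1:
        n = -(-n // fan_in)  # ceiling division: one output file per group
        counts.append(n)
    return counts


def run_bammerge(inputs: Sequence[str], output: str) -> None:
    # ASSUMPTION: bammerge takes each input as I=<file> and writes the merged
    # BAM to stdout; verify with `bammerge -h` for your biobambam2 version.
    cmd = ["bammerge"] + [f"I={path}" for path in inputs]
    with open(output, "wb") as out:
        subprocess.run(cmd, stdout=out, check=True)


def multistage_merge(inputs: Sequence[str], output: str,
                     fan_in: int = 500, workdir: str = "merge_tmp") -> None:
    """Merge sorted BAMs so no single bammerge call opens more than fan_in inputs."""
    if not inputs:
        raise ValueError("no input files")
    Path(workdir).mkdir(parents=True, exist_ok=True)
    current = list(inputs)
    stage = 0
    # Intermediate stages: merge groups until few enough files remain.
    while len(current) > fan_in:
        merged = []
        for i, group in enumerate(batches(current, fan_in)):
            part = str(Path(workdir) / f"stage{stage}_part{i}.bam")
            run_bammerge(group, part)
            merged.append(part)
        current = merged
        stage += 1
    # Final stage: at most fan_in files are opened at once.
    run_bammerge(current, output)
```

With 1,200 inputs and `fan_in=500`, `merge_plan(1200, 500)` returns `[3, 1]`: one stage of three intermediate merges followed by a final merge. The intermediate BAMs are left in `merge_tmp` and can be deleted once the final merge succeeds; the merges within one stage are independent, so they could also run in parallel if disk I/O allows.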
biobambam2 works great, though!