TALON

Any chance of getting bam support?

Open • oneillkza opened this issue 3 years ago • 4 comments

Would there be any chance of getting TALON to accept BAMs as input? Our BAMs are over 100 GB each. We basically never store these as SAMs, since they'd be about 0.5 TB each. But to run TALON, I have to decompress them back to SAM.

This probably also slows TALON down, since I'd imagine it would be much faster to read directly from BAM.

Would it be possible to add BAM support? Or alternatively, could TALON read from stdin, so that samtools/sambamba view output could be piped to it directly?
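To make the stdin idea concrete, here is a minimal sketch of a reader that consumes plain SAM text line by line, so it could sit at the end of a pipe. The function name iter_sam_records is hypothetical and not part of TALON; it parses only the 11 mandatory SAM columns and keeps any optional tags as a single string.

```python
from typing import Dict, Iterable, Iterator

# The 11 mandatory SAM columns, in file order (per the SAM specification).
SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")


def iter_sam_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per alignment line, skipping '@' header lines.

    Hypothetical helper (not part of TALON): works on any iterable of
    text lines, including sys.stdin, so the BAM never has to be
    expanded to a SAM file on disk.
    """
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines such as @HD, @SQ, @PG
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 11:
            raise ValueError(f"malformed SAM line: {line!r}")
        record = dict(zip(SAM_FIELDS, cols[:11]))
        record["TAGS"] = "\t".join(cols[11:])  # optional TAG:TYPE:VALUE fields
        yield record
```

Used as `samtools view -h input.bam | python count_reads.py`, where the placeholder script loops over `iter_sam_records(sys.stdin)`, the decompression happens in samtools and no intermediate SAM file is written. The `-h` flag keeps the header lines in the piped output.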

oneillkza · May 08 '21 02:05

Note: this seems extra silly, since it looks like the first thing TALON does is turn the SAM back into a BAM.

But when I tried to pass it a BAM in the config file, it threw an error saying it needed SAM input.
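One way an input check could accept either format is to inspect the file's leading bytes instead of requiring SAM. The sketch below is a hypothetical helper, not TALON's actual validation code. It relies on two properties of the format: BAM is stored in BGZF, a gzip-compatible container, and the decompressed stream begins with the magic bytes `BAM\1`.

```python
import gzip

BAM_MAGIC = b"BAM\x01"    # first 4 bytes of a decompressed BAM stream
GZIP_MAGIC = b"\x1f\x8b"  # BGZF blocks are valid gzip members


def detect_alignment_format(path: str) -> str:
    """Return "BAM" or "SAM" based on file contents, not the extension.

    Hypothetical helper for illustration only.
    """
    with open(path, "rb") as fh:
        head = fh.read(2)
    if head == GZIP_MAGIC:
        # Python's gzip module reads multi-member (BGZF) files transparently.
        with gzip.open(path, "rb") as gz:
            if gz.read(4) == BAM_MAGIC:
                return "BAM"
        raise ValueError(f"{path}: gzip-compressed but not BAM")
    return "SAM"  # uncompressed text; assumed to be SAM
```

With a check like this, a BAM listed in the config could skip the SAM-to-BAM step entirely, and only SAM inputs would need converting.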

oneillkza · May 12 '21 18:05

Thank you for the suggestion. I agree that this would be a nice quality-of-life improvement for TALON. We are planning a TALON upgrade in the coming months and will try to roll this into it!

fairliereese · May 12 '21 19:05

Thanks, it would be a huge quality-of-life improvement. I'm running TALON now on a whole PromethION flow cell's worth of RNA-seq (>100M reads), and it needs about 700 GB of extra storage to convert the BAM to SAM and then have TALON convert it back to BAM again. The whole process is also taking about 18 hours, since it seems to be single-threaded.

oneillkza · May 15 '21 01:05

If you're interested in trying it out, I've added BAM support to the development branch. I think it would also make sense to multithread the SAM-to-BAM conversion, using as many threads as are already given to TALON, so I can add that as well.
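A minimal sketch of what reusing TALON's thread count for the conversion might look like, assuming the conversion shells out to samtools (TALON's actual mechanism may differ). The helper name sam_to_bam_cmd is hypothetical. One detail worth noting: samtools view's `-@` option sets the number of compression threads in addition to the main thread, so a total of N threads maps to `-@ N-1`.

```python
from typing import List


def sam_to_bam_cmd(sam_path: str, bam_path: str, threads: int) -> List[str]:
    """Build a samtools command converting SAM to BAM with `threads` total.

    Hypothetical helper: -b writes BAM, -o names the output file, and
    -@ adds extra compression threads on top of the main thread.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    extra = threads - 1  # the main samtools thread counts toward the total
    return ["samtools", "view", "-b", "-@", str(extra),
            "-o", bam_path, sam_path]
```

The list could then be run with `subprocess.run(sam_to_bam_cmd("in.sam", "out.bam", 8), check=True)`. Building an argument list rather than a shell string avoids quoting problems with unusual file paths.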

fairliereese · May 15 '21 03:05