bamnostic

BAM output support

Open peterjc opened this issue 6 years ago • 5 comments

The README says BAMnostic is a pure Python, OS-agnostic Binary Alignment Map (BAM) file parser and random access tool.

Are you intending to support BAM output, or is this only possible through a workaround like outputting SAM format and piping it through a command-line tool like samtools?
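For reference, the workaround described above could look roughly like the following sketch. It assumes samtools is installed and on the PATH; the helper names `samtools_command` and `sam_to_bam` are hypothetical and not part of bamnostic.

```python
import subprocess


def samtools_command(bam_path):
    """Build a samtools call that reads SAM from stdin and writes BAM."""
    # "-b" requests BAM output, "-o" names the output file, "-" means stdin.
    return ["samtools", "view", "-b", "-o", bam_path, "-"]


def sam_to_bam(sam_text, bam_path):
    """Pipe SAM text through samtools (assumed to be on PATH)."""
    subprocess.run(
        samtools_command(bam_path),
        input=sam_text.encode("ascii"),
        check=True,  # raise CalledProcessError if samtools fails
    )
```

The drawback is the one the question implies: it adds an external, compiled dependency to an otherwise pure Python workflow.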

peterjc, Jul 17 '18 01:07

This isn't that complicated; see, for example, my implementation at https://github.com/peterjc/biopython/blob/SamBam2015/Bio/Sequencing/SamBam/init.py#L1714

peterjc, Jul 17 '18 01:07

To make it more formal: I am currently working on supporting BAM output in bamnostic. I will keep this issue open until that PR is complete.

betteridiot, Jul 20 '18 14:07

@peterjc 3051e0b should address this issue. It is heavily based on the code you pointed me to. However, I made some changes to how reads that border full blocks are handled: if a read is not the first read of the block and will not fit in the block, it starts a new one. Otherwise, it follows the normal flow you had set up.
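The block-boundary rule described above can be sketched as follows. This is only an illustration, not the code from the commit: `pack_records` is a hypothetical name, the 64 KiB limit is an assumed block size, compression is left out, and the "normal flow" is assumed to mean that an oversized record spills across consecutive blocks.

```python
MAX_BLOCK = 65536  # assumed limit on uncompressed bytes per block


def pack_records(records, max_block=MAX_BLOCK):
    """Group raw records into blocks without splitting a read when avoidable."""
    blocks = []
    current = bytearray()
    for record in records:
        # Not the first read in the block and it will not fit: start a new block.
        if current and len(current) + len(record) > max_block:
            blocks.append(bytes(current))
            current = bytearray()
        current += record
        # Assumed "normal flow": a read larger than one block spills over.
        while len(current) > max_block:
            blocks.append(bytes(current[:max_block]))
            del current[:max_block]
    if current:
        blocks.append(bytes(current))
    return blocks


# Two small reads share a block; the large read starts a fresh block and spills.
sizes = [len(b) for b in pack_records([b"a" * 40, b"b" * 40, b"c" * 150], max_block=100)]
print(sizes)  # [80, 100, 50]
```

Starting a new block for a read that would otherwise straddle a boundary keeps most reads within a single block, at the cost of leaving some blocks partly empty.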

I also added support for directly copying a BAM file's header if it has been opened by bamnostic.

Additionally, instead of having to reconstruct the byte sequence of the read, each AlignedSegment has a _raw_stream attribute that is a direct copy of the whole record. This can quickly be written to a file handle without compression.
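To show what happens to such raw bytes on the way into a BAM file, here is a minimal sketch of wrapping a payload in a single BGZF block using only the standard library. The names `bgzf_block` and `read_back` are hypothetical, the payload is a stand-in for a record's raw bytes, and the size limit is an assumption chosen to leave room for the block framing; bamnostic's own writer may differ.

```python
import gzip
import struct
import zlib

# Fixed start of a BGZF block: gzip magic, deflate, FEXTRA flag, zero mtime,
# unknown OS, XLEN=6, then the "BC" subfield announcing a 2-byte payload.
_BGZF_HEADER = b"\x1f\x8b\x08\x04\x00\x00\x00\x00\x00\xff\x06\x00\x42\x43\x02\x00"
MAX_PAYLOAD = 0xFF00  # assumed limit, kept below 64 KiB to leave room for framing


def bgzf_block(payload, level=6):
    """Wrap raw bytes in one BGZF block (a gzip member with a size field)."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large for a single block")
    compressor = zlib.compressobj(level, zlib.DEFLATED, -15)  # raw deflate
    compressed = compressor.compress(payload) + compressor.flush()
    total = len(_BGZF_HEADER) + 2 + len(compressed) + 8
    if total > 65536:
        raise ValueError("compressed block exceeds 64 KiB")
    bsize = struct.pack("<H", total - 1)  # BSIZE stores total block size minus 1
    crc = struct.pack("<I", zlib.crc32(payload) & 0xFFFFFFFF)
    isize = struct.pack("<I", len(payload))
    return _BGZF_HEADER + bsize + compressed + crc + isize


def read_back(data):
    """Concatenated BGZF blocks form a valid multi-member gzip stream."""
    return gzip.decompress(data)


record = b"stand-in for the raw bytes of one alignment record"
assert read_back(bgzf_block(record) + bgzf_block(record)) == record * 2
```

Because each block is an ordinary gzip member, the output can be checked with any gzip reader, which is convenient when debugging a writer.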

betteridiot, Sep 20 '18 20:09

I don't yet see how you will use ._raw_stream, but it does make sense to try it as an optimisation when writing alignment data back to disk without modifications.

peterjc, Sep 21 '18 10:09

Ideally, a new AlignedSegment method called to_bam(<bgzf.BamWriter>) will be written, which would use _raw_stream to write the read to the file. There are still some API details to hash out, but I just wanted to point out that it is almost there.
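The proposed call shape might look something like the sketch below. Both classes are hypothetical stand-ins written for illustration: the real AlignedSegment and bgzf.BamWriter have more state, and the writer here simply forwards bytes to a file-like object instead of compressing them.

```python
import io


class BamWriterSketch:
    """Hypothetical stand-in for a BAM writer; forwards bytes unchanged."""

    def __init__(self, handle):
        self._handle = handle

    def write(self, data):
        self._handle.write(data)


class AlignedSegmentSketch:
    """Hypothetical stand-in holding only the raw record bytes."""

    def __init__(self, raw_stream):
        self._raw_stream = raw_stream

    def to_bam(self, writer):
        """Write the unmodified record and return the number of bytes written."""
        writer.write(self._raw_stream)
        return len(self._raw_stream)


buffer = io.BytesIO()
segment = AlignedSegmentSketch(b"\x01\x02\x03")
segment.to_bam(BamWriterSketch(buffer))
```

This shortcut only applies to reads that have not been modified since they were parsed; an edited read would need its bytes rebuilt first.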

Additionally, I have added CSI support to bamnostic (in case you were wondering).

betteridiot, Sep 21 '18 14:09