bamnostic
BAM output support
The README says BAMnostic is a pure Python, OS-agnostic Binary Alignment Map (BAM) file parser and random access tool.
Are you intending to support BAM output, or is this only possible through a workaround, such as writing SAM format and piping it through a command-line tool like samtools?
This isn't that complicated; see, e.g., my implementation at https://github.com/peterjc/biopython/blob/SamBam2015/Bio/Sequencing/SamBam/init.py#L1714
To make it more formal: I am currently working on adding BAM output support to bamnostic. I will keep this issue open until that PR is complete.
@peterjc 3051e0b should address this issue. It is heavily based on the code you pointed me to. However, I changed how reads that border full blocks are handled: if a read is not the first read of the block and will not fit in it, it starts a new block. Otherwise, it follows the normal flow you had set up.
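Below is a minimal sketch of how that block-boundary rule could work. It is not the code in 3051e0b. `bgzf_block`, `BlockPacker` and the `MAX_BLOCK_DATA` cap of 0xFF00 uncompressed bytes per block are assumptions made for illustration. Each BGZF block is a gzip member carrying a `BC` extra subfield with the total block size minus one. Splitting an oversized first record across blocks is my reading of "the normal flow".

```python
import struct
import zlib

# Assumed cap on uncompressed bytes per block. 0xFF00 (65280) is a common
# conservative choice that keeps each compressed block under the 64 KiB limit.
MAX_BLOCK_DATA = 0xFF00


def bgzf_block(data: bytes) -> bytes:
    """Compress one chunk into a single BGZF block (a gzip member with a BC subfield)."""
    compressor = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate stream
    cdata = compressor.compress(data) + compressor.flush()
    block_size = 18 + len(cdata) + 8  # 18-byte header + deflate data + CRC32/ISIZE
    header = struct.pack(
        "<4BI2BH2BHH",
        31, 139, 8, 4,   # gzip magic, CM=deflate, FLG=FEXTRA
        0, 0, 255,       # MTIME, XFL, OS=unknown
        6,               # XLEN: one 6-byte extra subfield
        66, 67, 2,       # subfield 'BC' with SLEN=2
        block_size - 1,  # BSIZE: total block size minus one
    )
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data))
    return header + cdata + trailer


class BlockPacker:
    """Hypothetical helper (not bamnostic's API) that groups records into BGZF blocks."""

    def __init__(self) -> None:
        self.buffer = bytearray()  # uncompressed bytes of the block being filled
        self.blocks = []           # finished, compressed BGZF blocks

    def _flush(self) -> None:
        if self.buffer:
            self.blocks.append(bgzf_block(bytes(self.buffer)))
            self.buffer.clear()

    def add_record(self, record: bytes) -> None:
        # A record that is not the first in the current block and would
        # overflow it starts a fresh block instead.
        if self.buffer and len(self.buffer) + len(record) > MAX_BLOCK_DATA:
            self._flush()
        # Normal flow: append; if the block is full (e.g. a single record
        # larger than one block), emit full blocks and carry the rest over.
        self.buffer += record
        while len(self.buffer) >= MAX_BLOCK_DATA:
            self.blocks.append(bgzf_block(bytes(self.buffer[:MAX_BLOCK_DATA])))
            del self.buffer[:MAX_BLOCK_DATA]

    def close(self) -> bytes:
        """Flush the last partial block and append an empty block as the EOF marker."""
        self._flush()
        return b"".join(self.blocks) + bgzf_block(b"")
```

The design choice is a trade-off. Starting a new block keeps most records inside a single block, which makes random access via virtual offsets simpler. The cost is some slightly under-filled blocks.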
I also added support for directly copying the header of a BAM file that has been opened by bamnostic.
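Copying a header amounts to re-serializing the BAM header block, before BGZF compression, with this layout:

- the magic `BAM\1`
- `l_text` and the SAM header text
- `n_ref`
- for each reference, `l_name`, the NUL-terminated name, and `l_ref`

The sketch below uses hypothetical helpers `encode_bam_header` and `decode_bam_header`. It does not use bamnostic's own header object, whose attributes are not assumed here. It shows that decoding and re-encoding an existing header reproduces it byte for byte.

```python
import struct
from typing import List, Tuple

BAM_MAGIC = b"BAM\x01"


def encode_bam_header(text: str, references: List[Tuple[str, int]]) -> bytes:
    """Serialize the BAM header block (before BGZF compression)."""
    text_bytes = text.encode("utf-8")
    parts = [BAM_MAGIC, struct.pack("<i", len(text_bytes)), text_bytes,
             struct.pack("<i", len(references))]
    for name, length in references:
        name_bytes = name.encode("utf-8") + b"\x00"  # l_name includes the NUL
        parts += [struct.pack("<i", len(name_bytes)), name_bytes,
                  struct.pack("<i", length)]
    return b"".join(parts)


def decode_bam_header(data: bytes) -> Tuple[str, List[Tuple[str, int]], int]:
    """Parse a serialized header; returns (text, references, bytes consumed)."""
    if data[:4] != BAM_MAGIC:
        raise ValueError("not a BAM header")
    offset = 4
    (l_text,) = struct.unpack_from("<i", data, offset)
    offset += 4
    # The text may be NUL-padded, so strip trailing NULs.
    text = data[offset:offset + l_text].decode("utf-8").rstrip("\x00")
    offset += l_text
    (n_ref,) = struct.unpack_from("<i", data, offset)
    offset += 4
    references = []
    for _ in range(n_ref):
        (l_name,) = struct.unpack_from("<i", data, offset)
        offset += 4
        name = data[offset:offset + l_name - 1].decode("utf-8")  # drop the NUL
        offset += l_name
        (l_ref,) = struct.unpack_from("<i", data, offset)
        offset += 4
        references.append((name, l_ref))
    return text, references, offset
```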
Additionally, instead of having to reconstruct the byte sequence of the read, each `AlignedSegment` has a `_raw_stream` attribute that is a direct copy of the whole record. This can quickly be written to a file handle without compression.
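Each BAM alignment record starts with a little-endian int32 `block_size` that gives the length of the rest of the record. A verbatim copy of a record can therefore be sliced out and written back without decoding any fields. The sketch below assumes, without confirmation from bamnostic, that a raw record includes the 4-byte `block_size` prefix. It reads from an already decompressed stream positioned just after the header. `iter_raw_records` and `copy_records` are hypothetical names.

```python
import io
import struct
from typing import BinaryIO, Iterator


def iter_raw_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield each alignment record verbatim: the 4-byte block_size plus its body."""
    while True:
        prefix = stream.read(4)
        if not prefix:
            return  # clean end of stream
        if len(prefix) < 4:
            raise EOFError("truncated block_size field")
        (block_size,) = struct.unpack("<i", prefix)
        body = stream.read(block_size)
        if len(body) < block_size:
            raise EOFError("truncated alignment record")
        yield prefix + body


def copy_records(src: BinaryIO, dst: BinaryIO) -> int:
    """Copy unmodified records without decoding their fields; returns the count."""
    count = 0
    for raw in iter_raw_records(src):
        dst.write(raw)
        count += 1
    return count
```

For example, `copy_records(io.BytesIO(data), out)` writes `data` back unchanged as long as it is a sequence of complete records. In a real writer, `dst` would be the BGZF layer rather than a plain file, since BAM bodies must still be compressed.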
I don't yet see how you will use `._raw_stream`, but it does make sense to try it as an optimisation when writing alignment data back to disk without modifications.
Ideally, a new `AlignedSegment` method called `to_bam(<bgzf.BamWriter>)` will be written, which would use `_raw_stream` to write the read to the file. There is still some API stuff to hash out, but I just wanted to point out that it is almost there.
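One way the proposed `to_bam` could look is sketched below. `AlignedSegmentSketch` and `mark_modified` are hypothetical stand-ins, not bamnostic's classes or methods. The writer is only assumed to have a file-like `write(bytes)` method; the real `bgzf.BamWriter` interface is still being worked out. The point is the fast path: reuse the raw bytes only while the read is unmodified, and fall back to re-encoding otherwise. Re-encoding is deliberately left out here.

```python
from typing import BinaryIO, Optional


class AlignedSegmentSketch:
    """Hypothetical stand-in for AlignedSegment; only the parts needed here."""

    def __init__(self, raw_stream: bytes) -> None:
        # Verbatim copy of the record as read from disk (assumed to include block_size).
        self._raw_stream: Optional[bytes] = raw_stream

    def mark_modified(self) -> None:
        # Any edit to a field makes the cached bytes stale, so drop them.
        self._raw_stream = None

    def to_bam(self, writer: BinaryIO) -> int:
        """Write this read to `writer`; returns the number of bytes written."""
        if self._raw_stream is None:
            # A full implementation would re-encode the record from its fields here.
            raise NotImplementedError("re-encoding modified reads is not sketched")
        return writer.write(self._raw_stream)
```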
Additionally, I have added CSI support to bamnostic (in case you were wondering).
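For context, CSI generalises the BAI binning scheme with two parameters, `min_shift` and `depth`. This makes it possible to index references longer than the 2^29 bp that BAI can handle. Below is a Python port of the `reg2bin` function given in the CSI specification. It shows the spec's algorithm, not necessarily how bamnostic implements it. Coordinates are 0-based and half-open. With the defaults `min_shift=14` and `depth=5`, it yields the same bin numbers as BAI.

```python
def reg2bin(beg: int, end: int, min_shift: int = 14, depth: int = 5) -> int:
    """Smallest bin fully containing [beg, end), following the CSI spec's reg2bin."""
    end -= 1  # make the end inclusive
    shift = min_shift
    # Offset of the first bin on the deepest level: (8**depth - 1) // 7.
    first = ((1 << depth * 3) - 1) // 7
    level = depth
    while level > 0:
        if beg >> shift == end >> shift:
            return first + (beg >> shift)
        # Move up one level: bins get 8x larger and offsets shrink accordingly.
        level -= 1
        shift += 3
        first -= 1 << level * 3
    return 0  # only the root bin spans the region
```

For example, `reg2bin(0, 1)` falls in the first 16 kbp leaf bin (4681, as in BAI). `reg2bin(0, 1 << 29)` needs the root bin 0.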