noodles
[feature] Pileup Engine
This is a first draft of a pileup engine using noodles for the underlying IO.
Is this something you would be interested in having in the noodles family? If you want to keep noodles more focused on the IO side, I totally understand and I'll move this into its own library. I just wanted to get your thoughts before I start adding examples / docs / tests. (This is not the final PR, just a draft; more work would be needed to get this code on par with the rest of noodles.)
The impetus for me is basically to skip the middleman in htslib: have the pileup engine supply more information up front, so this tool doesn't have to re-analyze the stack of reads: https://github.com/sstadick/perbase. The implementation below is very much based on sambamba's implementation, which does provide more info up front.
Any and all feedback is welcome :+1:
BTW, noodles is a fantastic set of libraries, thank you for doing this in the open!
Nice, this is a great initiative!
Let's include something like this. Even though, as you mentioned, it's more algorithmic than I/O, pileup is a fairly common operation and is likely to be expected from an alignments reader.
Awesome! I'll get it into PR-worthy shape then, smooth out the rough edges, and finish off all the TODOs at the top of pileup.rs.
Once that's in place I'll convert from a draft and we can iterate from there :+1:
I'm still going to come back to this and have not forgotten about it; life has just not conspired to make time for piling up reads lately 👍
I'd be interested in using something like this if it were in noodles.
I added a simple pileup iterator in acd49bb625bcc23194590006f28a76064923001f that currently just calculates column depths. It piles records over an adaptive window on the reference sequence and is optimized for low latency, i.e., it emits columns immediately after they are guaranteed to no longer be affected by future records. This implies that it only works with coordinate-sorted data. It doesn't include all the counts as in this patch, but it can be iterated and built upon.
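The low-latency, depth-only behavior described above can be sketched roughly as follows. This is a hypothetical, self-contained illustration, not the noodles iterator from that commit. `Record`, `Column`, and `DepthPileup` are made-up names. Records are reduced to 1-based, inclusive reference spans, so CIGAR operations are ignored: every position from start to end counts toward depth, including deletions and skips. Zero-depth gaps between records are not emitted. The window grows only as far as the buffered records reach. A column is emitted as soon as the next record starts after it, because sorted input guarantees that no later record can touch it.

```rust
use std::collections::VecDeque;
use std::iter::Peekable;

/// Hypothetical minimal record: a 1-based, inclusive reference span.
/// (Illustrative stand-in, not a noodles type; depth needs only start/end.)
#[derive(Debug, Clone, Copy)]
pub struct Record {
    pub start: usize,
    pub end: usize,
}

/// One emitted pileup column: a reference position and its depth.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Column {
    pub position: usize,
    pub depth: u64,
}

/// Hypothetical depth-only pileup over coordinate-sorted records.
pub struct DepthPileup<I: Iterator<Item = Record>> {
    records: Peekable<I>,
    // Reference position of `window[0]`.
    window_start: usize,
    // Running depths for window_start, window_start + 1, ...
    window: VecDeque<u64>,
}

impl<I: Iterator<Item = Record>> DepthPileup<I> {
    pub fn new(records: I) -> Self {
        Self { records: records.peekable(), window_start: 0, window: VecDeque::new() }
    }

    // Add one record's span, growing the window only as far as needed.
    fn add(&mut self, r: Record) {
        assert!(r.start <= r.end, "record span must be non-empty");
        assert!(r.start >= self.window_start, "records must be coordinate-sorted");
        let lo = r.start - self.window_start;
        let hi = r.end - self.window_start + 1; // exclusive
        if self.window.len() < hi {
            self.window.resize(hi, 0);
        }
        for depth in self.window.range_mut(lo..hi) {
            *depth += 1;
        }
    }
}

impl<I: Iterator<Item = Record>> Iterator for DepthPileup<I> {
    type Item = Column;

    fn next(&mut self) -> Option<Column> {
        loop {
            if self.window.is_empty() {
                // Nothing buffered: jump to the next record, skipping zero-depth gaps.
                let next_start = self.records.peek()?.start;
                self.window_start = self.window_start.max(next_start);
            }
            match self.records.peek().copied() {
                // This record covers the front column, so that column is not final yet.
                Some(r) if r.start <= self.window_start => {
                    self.records.next();
                    self.add(r);
                }
                // Sorted input: no remaining record can reach the front column,
                // so its depth is final and it is emitted immediately.
                _ => {
                    let depth = self.window.pop_front()?;
                    let position = self.window_start;
                    self.window_start += 1;
                    return Some(Column { position, depth });
                }
            }
        }
    }
}
```

For example, feeding it the spans (1, 3), (2, 4), and (7, 7) yields columns 1:1, 2:2, 3:2, 4:1, and 7:1. Column 1 is emitted after peeking at the second record (which starts at 2) but before adding it. Extending this toward the per-column counts in the original patch would mean walking each record's CIGAR instead of its plain span.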
Thanks to @sstadick for the initial implementation and inspiration.
Thanks @zaeleus. A really clean implementation. I'll look into expanding this in the future.