python-isal
BGZIP support
Checklist
- [ ] Pull request details were added to CHANGELOG.rst
- [ ] Documentation was updated (if needed)
@marcelm I noticed Sequali was bottlenecked by the gzip decompression. On the latest develop branch, analysing ONT files is faster than decompressing them. For BAM files, however, this can theoretically be circumvented because all BGZIP blocks are independent and can therefore be decompressed in parallel. FASTQ files are also often bgzip compressed. So I set out to write some code to alleviate this.
It works. The question is how to integrate this properly into xopen. My thoughts on this are:
- Create a separate bgzip module here (`isal.bgzip`).
- Create a bgzip.open function. One threaded opening is moved off to the single-threaded opener in `isal.igzip_threaded` as there is less overhead involved.
- More threads are moved off to the _ThreadedBgzipReader class
- In `xopen` detect the bgzip format by parsing the gzip header and use bgzip.open if that is the case.
- I have no plans to support writing yet, but that could be useful if dnaio supports uBAM writing in the future. I will leave that job for when I need it though.
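To illustrate the header detection and the block independence mentioned above, here is a minimal sketch. It is not the code in this PR: it uses the standard-library `zlib` instead of isal, and the names `bgzf_block_size`, `split_bgzf_blocks` and `decompress_bgzf` are hypothetical. It assumes the BGZF layout from the SAM specification, where each block is a gzip member with the FEXTRA flag set and a `BC` extra subfield holding the total block size minus one.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def bgzf_block_size(data: bytes):
    """Return the total size of the BGZF block at the start of `data`,
    or None if `data` does not start with a BGZF header."""
    if len(data) < 12:
        return None
    # Fixed gzip header: magic, method, flags, 6 skipped bytes, XLEN.
    magic, method, flags, xlen = struct.unpack("<2sBB6xH", data[:12])
    if magic != b"\x1f\x8b" or method != 8 or not flags & 4:  # 4 = FEXTRA
        return None
    extra = data[12:12 + xlen]
    if len(extra) < xlen:
        return None
    # Walk the extra subfields looking for SI1='B', SI2='C', SLEN=2.
    pos = 0
    while pos + 4 <= len(extra):
        si1, si2, slen = struct.unpack_from("<BBH", extra, pos)
        if si1 == 66 and si2 == 67 and slen == 2:
            if pos + 6 > len(extra):
                return None
            (bsize,) = struct.unpack_from("<H", extra, pos + 4)
            return bsize + 1  # BSIZE stores the block size minus one
        pos += 4 + slen
    return None


def split_bgzf_blocks(data: bytes):
    """Split an in-memory BGZF stream into its individual blocks."""
    blocks = []
    pos = 0
    while pos < len(data):
        size = bgzf_block_size(data[pos:pos + 65536])
        if size is None:
            raise ValueError(f"no BGZF header at offset {pos}")
        blocks.append(data[pos:pos + size])
        pos += size
    return blocks


def decompress_bgzf(data: bytes, threads: int = 2) -> bytes:
    """Decompress every block independently in a thread pool."""
    def inflate(block: bytes) -> bytes:
        # wbits=31 makes zlib expect a gzip header and trailer.
        return zlib.decompress(block, wbits=31)

    with ThreadPoolExecutor(threads) as pool:
        # pool.map preserves block order, so a plain join is enough.
        return b"".join(pool.map(inflate, split_bgzf_blocks(data)))
```

A caller such as `xopen` could run a check like `bgzf_block_size` on the first bytes of a file and fall back to the regular gzip reader when it returns `None`. The sketch reads everything into memory for brevity; a real reader would stream blocks instead.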
What are your thoughts on this?