BAMscale
Feature request – add support for CSI (`*.csi`) BAM indices
Hello,
Thank you for developing BAMscale; it has become my go-to tool for generating bigwigs. While processing a large wheat ChIP-seq dataset I ran into a limitation that I hope could be addressed (or perhaps you already have a workaround):
Summary
When a BAM is indexed with CSI (required when any reference sequence exceeds the BAI size limit, as several wheat chromosomes do), BAMscale fails with `cannot find *.bai`.
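To confirm that a given BAM really needs CSI, one can look for `@SQ` lines whose `LN` exceeds 2^29 = 536,870,912 bp, the largest reference length the BAI binning scheme covers. The helper below is a minimal sketch; the name `needs_csi` is hypothetical, and it assumes the SAM header arrives on stdin.

```shell
#!/bin/sh
# needs_csi (hypothetical helper): reads SAM header text on stdin and prints
# "name<TAB>length" for every @SQ line longer than the BAI limit of 2^29 bp.
# Exit status is 0 if at least one such reference exists, 1 otherwise.
needs_csi() {
  awk -F'\t' -v limit=$(( 1 << 29 )) '
    $1 == "@SQ" {
      name = ""; len = 0
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)        # reference name
        else if ($i ~ /^LN:/) len = substr($i, 4) + 0 # reference length
      }
      if (len > limit) { print name "\t" len; found = 1 }
    }
    END { exit (found ? 0 : 1) }'
}
```

Usage: `samtools view -H sample.bam | needs_csi` lists the offending chromosomes, which are the ones that force `samtools index -c`.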
Steps to reproduce
```shell
samtools index -c sample.bam   # creates sample.bam.csi
BAMscale scale --bam sample.bam --binsize 10
# ERROR: cannot find sample.bam.bai
```
Expected behaviour
- Automatically load `sample.bam.csi`, or
- Allow specifying the index path (e.g. `--index sample.bam.csi`).
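One possible lookup order for the two options above is sketched below. This is not BAMscale's current code: the function name `resolve_bam_index` and the `--index` value passed as its second argument are hypothetical. An explicit path wins; otherwise the sketch tries `sample.bam.csi` (what `samtools index -c` writes), `sample.bam.bai` (what plain `samtools index` writes) and the `sample.bai` form some other tools produce.

```shell
#!/bin/sh
# resolve_bam_index (hypothetical): print the index file to use for a BAM.
#   $1 = path to the BAM
#   $2 = optional explicit index path (e.g. from a proposed --index flag)
resolve_bam_index() {
  bam=$1
  explicit=${2:-}
  if [ -n "$explicit" ]; then
    # An explicitly requested index must exist; no silent fallback.
    [ -f "$explicit" ] && { printf '%s\n' "$explicit"; return 0; }
    echo "index not found: $explicit" >&2
    return 1
  fi
  # CSI is tried first so it is preferred when both index types exist
  # (an ordering chosen for this sketch).
  for idx in "$bam.csi" "$bam.bai" "${bam%.bam}.bai"; do
    if [ -f "$idx" ]; then
      printf '%s\n' "$idx"
      return 0
    fi
  done
  echo "no .csi or .bai index found for $bam" >&2
  return 1
}
```

Failing loudly when an explicit index is missing, rather than falling back to the default names, avoids silently reading a stale index.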
Feasibility notes (from HTSlib docs)
- HTSlib loads BAI or CSI transparently via `sam_index_load()` after opening with `hts_open`/`sam_open`. See the `sam.h` API docs.
- An explicit index path is supported in HTSlib ≥1.10 using the `##idx##` syntax (e.g. `sample.bam##idx##/path/to/sample.bam.csi`) or via `hts_idx_load2(fn, fnidx)`. See the 1.10 release notes and `hts.h`.
- Background: BAI indices are limited to chromosomes ≤512 Mbp (2^29 bp), hence CSI for large genomes.
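Since the `##idx##` route depends on the HTSlib version, a small check can help. The sketch below compares major and minor numerically, because a plain string comparison would rank `1.9` above `1.10`. The function names are hypothetical, and reading the version from the `Using htslib X.Y` line of `samtools --version` reports the HTSlib that samtools was built with, which is not necessarily the one BAMscale links against.

```shell
#!/bin/sh
# htslib_supports_idx_delim (hypothetical): succeed if a dotted HTSlib
# version string such as "1.21" or "1.10.2" is at least 1.10.
htslib_supports_idx_delim() {
  ver=$1
  major=${ver%%.*}
  rest=${ver#*.}
  minor=${rest%%.*}
  minor=${minor%%[!0-9]*}   # drop suffixes such as "-dirty" on git builds
  [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 10 ]; }
}

# with_explicit_index (hypothetical): build the "data##idx##index" filename
# form described in the HTSlib 1.10 notes.
with_explicit_index() {
  printf '%s##idx##%s\n' "$1" "$2"
}
```

Usage: `ver=$(samtools --version | sed -n 's/^Using htslib //p')`, then `htslib_supports_idx_delim "$ver" && with_explicit_index sample.bam /path/to/sample.bam.csi`.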
Environment
- BAMscale v0.0.9
- samtools 1.21
Reference
- CSI spec: https://github.com/samtools/hts-specs/blob/master/CSIv1.pdf
Thanks for considering this.