cDNA sequences / genomecov: ignore local deletions, split on large deletions / exon skips
this comment states "we always want to split blocks when a D CIGAR op is found". I have a situation where I don't want that to happen.
I'm mapping nanopore cDNA reads to the genome, and would like to preserve large exon skips (represented by minimap2 split reads in BAM files as 'N'), but ignore local deletions (represented as 'D'), because they're not particularly interesting in nanopore reads due to the abundance of deletions as part of base calling error.
Could an option please be added to genomecov to do this:
bedtools genomecov -strand '-' -bga -split -ignoreD -ibam input.bam > output.bg
My guess is that this fix would involve something like the attached patch. ignoreLocalDeletions.patch.txt
Here's an example SAM file [attached] mapped_BC01.sam.txt which should appear in the bedgraph file as a single interval. My attempted modification using the above patch produces incorrectly split output with this read:
$ bedtools genomecov -bga -strand '+' -split -ignoreD -ibam mapped_BC01.sam
chrX 0 7384244 0
chrX 7384244 7384283 1
chrX 7384283 7384285 0
chrX 7384285 7384288 1
chrX 7384288 7384289 0
chrX 7384289 7384301 1
chrX 7384301 7384302 0
chrX 7384302 7384413 1
chrX 7384413 171031299 0
Expected output:
$ bedtools genomecov -bga -strand '+' -split -ignoreD -ibam mapped_BC01.sa
chrX 0 7384244 0
chrX 7384244 7384413 1
chrX 7384413 171031299 0
Here's another example SAM file [attached] mapped_BC01_exon.sam.txt which should appear as two intervals. This is also split at deletion points:
$ bedtools genomecov -bga -strand '+' -split -ignoreD -ibam mapped_BC01_exon.sam
chr11 0 62551499 0
chr11 62551499 62551595 1
chr11 62551595 62552139 0
chr11 62552139 62552148 1
chr11 62552148 62552149 0
chr11 62552149 62552152 1
chr11 62552152 62552153 0
chr11 62552153 62552154 1
chr11 62552154 62552156 0
chr11 62552156 62552252 1
chr11 62552252 62552253 0
chr11 62552253 62552417 1
chr11 62552417 62552418 0
chr11 62552418 62552432 1
chr11 62552432 62552433 0
chr11 62552433 62552536 1
chr11 62552536 62552537 0
chr11 62552537 62552613 1
chr11 62552613 62552614 0
chr11 62552614 62552659 1
chr11 62552659 62552660 0
chr11 62552660 62552708 1
chr11 62552708 62552709 0
chr11 62552709 62552903 1
chr11 62552903 62552905 0
chr11 62552905 62552941 1
chr11 62552941 62552944 0
chr11 62552944 62552949 1
chr11 62552949 62552950 0
chr11 62552950 62552960 1
chr11 62552960 62552961 0
chr11 62552961 62553028 1
chr11 62553028 62553030 0
chr11 62553030 62553049 1
chr11 62553049 62553050 0
chr11 62553050 62553212 1
chr11 62553212 122082543 0
Expected output:
chr11 0 62551499 0
chr11 62551499 62551595 1
chr11 62551595 62552139 0
chr11 62552139 62553212 1
chr11 62553212 122082543 0
* headesk *
Sorry, I got my logic the wrong way round. See updated patch.
ignoreLocalDeletions.3.patch.txt
$ bedtools genomecov -bga -strand '+' -split -ignoreD -ibam mapped_BC01.sam
chrX 0 7384244 0
chrX 7384244 7384413 1
chrX 7384413 171031299 0
$ bedtools genomecov -bga -strand '+' -split -ignoreD -ibam mapped_BC01_exon.sam
chr11 0 62551499 0
chr11 62551499 62551595 1
chr11 62551595 62552139 0
chr11 62552139 62553212 1
chr11 62553212 122082543 0