bedtools2 icon indicating copy to clipboard operation
bedtools2 copied to clipboard

cDNA sequences / genomecov: ignore local deletions, split on large deletions / exon skips

Open gringer opened this issue 5 years ago • 3 comments

this comment states "we always want to split blocks when a D CIGAR op is found". I have a situation where I don't want that to happen.

I'm mapping nanopore cDNA reads to the genome, and would like to preserve large exon skips (represented by minimap2 split reads in BAM files as 'N'), but ignore local deletions (represented as 'D'), because they're not particularly interesting in nanopore reads due to the abundance of deletions as part of base calling error.

Could an option please be added to genomecov to do this:

bedtools genomecov -strand '-' -bga -split -ignoreD -ibam input.bam > output.bg

My guess is that this fix would involve something like the attached patch. ignoreLocalDeletions.patch.txt

gringer avatar Oct 14 '20 02:10 gringer

Here's an example SAM file [attached] mapped_BC01.sam.txt which should appear in the bedgraph file as a single interval. My attempted modification using the above patch produces incorrectly split output with this read:

$ bedtools genomecov -bga -strand '+' -split -ignoreD -ibam mapped_BC01.sam 
chrX    0       7384244 0
chrX    7384244 7384283 1
chrX    7384283 7384285 0
chrX    7384285 7384288 1
chrX    7384288 7384289 0
chrX    7384289 7384301 1
chrX    7384301 7384302 0
chrX    7384302 7384413 1
chrX    7384413 171031299       0

Expected output:

$ bedtools genomecov -bga -strand '+' -split -ignoreD -ibam mapped_BC01.sa
chrX    0       7384244 0
chrX    7384244 7384413 1
chrX    7384413 171031299       0

gringer avatar Oct 14 '20 04:10 gringer

Here's another example SAM file [attached] mapped_BC01_exon.sam.txt which should appear as two intervals. This is also split at deletion points:

$ bedtools genomecov -bga -strand '+' -split -ignoreD -ibam mapped_BC01_exon.sam
chr11   0       62551499        0
chr11   62551499        62551595        1
chr11   62551595        62552139        0
chr11   62552139        62552148        1
chr11   62552148        62552149        0
chr11   62552149        62552152        1
chr11   62552152        62552153        0
chr11   62552153        62552154        1
chr11   62552154        62552156        0
chr11   62552156        62552252        1
chr11   62552252        62552253        0
chr11   62552253        62552417        1
chr11   62552417        62552418        0
chr11   62552418        62552432        1
chr11   62552432        62552433        0
chr11   62552433        62552536        1
chr11   62552536        62552537        0
chr11   62552537        62552613        1
chr11   62552613        62552614        0
chr11   62552614        62552659        1
chr11   62552659        62552660        0
chr11   62552660        62552708        1
chr11   62552708        62552709        0
chr11   62552709        62552903        1
chr11   62552903        62552905        0
chr11   62552905        62552941        1
chr11   62552941        62552944        0
chr11   62552944        62552949        1
chr11   62552949        62552950        0
chr11   62552950        62552960        1
chr11   62552960        62552961        0
chr11   62552961        62553028        1
chr11   62553028        62553030        0
chr11   62553030        62553049        1
chr11   62553049        62553050        0
chr11   62553050        62553212        1
chr11   62553212        122082543       0

Expected output:

chr11   0       62551499        0
chr11   62551499        62551595        1
chr11   62551595        62552139        0
chr11   62552139        62553212        1
chr11   62553212        122082543       0

gringer avatar Oct 14 '20 04:10 gringer

* headesk *

Sorry, I got my logic the wrong way round. See updated patch.

ignoreLocalDeletions.3.patch.txt

$ bedtools genomecov -bga -strand '+' -split -ignoreD -ibam mapped_BC01.sam
chrX    0       7384244 0
chrX    7384244 7384413 1
chrX    7384413 171031299       0

$ bedtools genomecov -bga -strand '+' -split -ignoreD -ibam mapped_BC01_exon.sam
chr11   0       62551499        0
chr11   62551499        62551595        1
chr11   62551595        62552139        0
chr11   62552139        62553212        1
chr11   62553212        122082543       0

gringer avatar Oct 14 '20 05:10 gringer