Aligning short terminal exons

Open MichaelHiller opened this issue 1 year ago • 4 comments

Dear Heng and team,

We are aligning polished IsoSeq reads against genomes, and I noticed a few cases where a 13 bp last exon was soft-clipped instead of aligned. However, the exon aligns perfectly if one allows a ~370 bp intron with GTG ... AAG consensus splice sites.

Blat correctly places the terminal 13 bp exon (second block), while minimap2 stops the alignment at the end of the first block (see the attached image).

We are already calling minimap2 with the junction annotation: minimap2 --eqx -a -c -t $nThreats -ax splice:hq -uf --secondary=no -C5 -o ${P_outMinimap2}/ALL.CuP.aln.sam --junc-bed ${ref_annot} -cs long ${genome_fa} ${P_out_isoC}/ALL.CuP.fasta.gz

Is there a way to increase sensitivity so that such terminal exons are aligned correctly? I would be happy to spend more runtime to get the right alignment.

Thanks a lot, Michael

MichaelHiller avatar Apr 03 '23 09:04 MichaelHiller

Do you have the sequences? It might be possible to tune parameters to get the alignment.

lh3 avatar Apr 03 '23 14:04 lh3

Sure, I have attached the read and the genomic context around that gene. Running this reduced example gives the same alignment. Attachments: read.fa.gz, genome.fa.gz

MichaelHiller avatar Apr 03 '23 16:04 MichaelHiller

Thanks for the example. It is not possible to tune parameters to get the right alignment. I will keep this issue open and think more about it in the future.

lh3 avatar Apr 08 '23 03:04 lh3

Alright, thanks a lot for looking into this.

Conceptually, HISAT2 (and other mappers?) builds a list of candidate positions for downstream exons, inferred from the mappings of other reads (in our case they are already given via --junc-bed ${ref_annot}), and then checks whether short terminal exons align at these candidates. This could be a postprocessing step for reads whose terminal parts remain unaligned.
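
To make the idea concrete, here is a minimal sketch of such a postprocessing step. It is not how HISAT2 or minimap2 is implemented; the function names, the 0-based half-open coordinates, and the assumption that the intron list was already extracted from the annotation for the same chromosome and strand are all mine. It only handles a clipped tail at the right-hand end of a forward-strand alignment.

```python
import re
from typing import List, Optional, Tuple

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def trailing_soft_clip(cigar: str) -> int:
    """Return the length of a soft clip at the right end of the CIGAR, or 0."""
    ops = CIGAR_OP.findall(cigar)
    # A hard clip may follow the soft clip; ignore it.
    if ops and ops[-1][1] == "H":
        ops = ops[:-1]
    if ops and ops[-1][1] == "S":
        return int(ops[-1][0])
    return 0


def rescue_terminal_exon(
    genome: str,
    aln_end: int,
    tail: str,
    introns: List[Tuple[int, int]],
    max_mismatches: int = 0,
) -> Optional[Tuple[int, int]]:
    """Try to place a soft-clipped tail right after a known intron.

    genome:   reference sequence of the chromosome (hypothetical input).
    aln_end:  0-based exclusive end of the last aligned block.
    tail:     the soft-clipped read bases.
    introns:  (start, end) intron intervals, 0-based half-open (assumed
              to come from the annotation or from other reads).
    Returns the (start, end) of the rescued exon, or None.
    """
    for start, end in introns:
        # Only consider introns whose donor site is where the alignment stopped.
        if start != aln_end:
            continue
        candidate = genome[end:end + len(tail)]
        if len(candidate) < len(tail):
            continue  # tail would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(candidate, tail))
        if mismatches <= max_mismatches:
            return (end, end + len(tail))
    return None
```

Because only annotated intron ends are tested, a 13 bp tail is compared against a handful of positions rather than the whole locus, which is why such a short exon can be placed without many spurious hits. A real implementation would also need to handle the reverse strand, clips at the left end, and alignments that stop a few bases away from the annotated donor site.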

MichaelHiller avatar Apr 08 '23 07:04 MichaelHiller