
missed first exon and incorrect alignment for two mouse genes

Open Magdoll opened this issue 3 years ago • 0 comments

I am running minimap2 2.17-r941 with the parameters -ax splice -t30 -uf --secondary=no.
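
For anyone trying to reproduce this, here is a minimal sketch of the invocation as an argument list. The flags are the ones listed above; the file names `mm10.fa` and `transcripts.fasta` and the helper `build_minimap2_command` are placeholders for illustration, not the actual paths or scripts used.

```python
import shlex


def build_minimap2_command(reference, query, threads=30):
    """Return the argv list for the spliced-alignment run described above."""
    return [
        "minimap2",
        "-ax", "splice",     # SAM output with the spliced-alignment preset
        f"-t{threads}",      # number of threads
        "-uf",               # look for splice signals on the transcript strand only
        "--secondary=no",    # do not report secondary alignments
        reference,           # hypothetical reference FASTA name
        query,               # hypothetical query FASTA name
    ]


cmd = build_minimap2_command("mm10.fa", "transcripts.fasta")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, ...)` with stdout redirected to a SAM file.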

In the first case, exon 1 of the following sequence is missing (shown as soft-clipped) when the sequence is aligned to the eml1 gene in mm10. BLAT, by contrast, was able to map exon 1. The UCSC screenshot is here: https://www.dropbox.com/s/krx655lv4lzg0s3/Screenshot%202020-08-05%2016.16.56.png?dl=0

>eml1_exon1_unaligned
GGGCCGCGGCGGCCTCGGCAGGGCCGCCAGTGTGTGGGGTGGCGGCGCGGGCCGGAGCGGGGCGCGGGGCCCGGCCCTCAGCATGGGAGGGACGGCTTTCTCCAGCTATAGCAGCCCTGTTTACGACCCACGTCCCTCTCGCTGCTGCCAGTTTCTGCAAAACGATTGACCAGCGCGTCCTTGCCTGCCAGCAGCATGGAGGTGGTCAGACCGCCATTCGCCTCTCTGGAGCAGCGCGGTCCAGATGCAGGAGGATGACATTCAAACTGCTCAAGTCAGCCGCTGGCCGACGTGGTTTCGGGAGGCTGAAACATCACGGAGGAGCCAAAACAGGCCTGTGTGCTCAAACAGGAAAAGGGGGACCCTACCAAAAGCGAGGCCCACTGGGGGGCAGGACCCCTGCCATTTGAGAAACCACCGGTCCAAACAATGGCCAACCGTGTTACCAAAGAAACCCCAGTGCCTCCCCTCCCGGCACCCTCCGGGGGCCCAGGAAAAGAAAGTAGTTTGTGCCGGTAAACCAAAAGCATCAATAGGAACCAGCTCTTCCGAAAAGAGTGTCTCCCAGGTGGCCCGGAGGGAGAGCTCTGGGGGGACTCCAAAAGGAAGCCGGGAAACCGCACGGGCTCAACCCAGTAAGCTCCTCCAGCGGCAAGAAGAACAGTGAGAGAGCAAACCCAAGGAGCCCCGCATTTCAGTCCAGAAAGAAAGGATATGTAAAAAATGTTTTTCTTCGAGGGCCGTCCGGTCCCCACCATGTACATGCCCCAAAGGACCAAGTGGATTCGTACAGTTTTGGAAGCAAAAAAGCTGAGTTACCAAACGAAAGCGGCTGAAACTGGAGGTGGGGTCTACGGGTTACCGGGGGGCCGAGACTGTCCGCAAATAACTTTGTACTTGCCTCCCGACCGGGGGAGACCGTGGTACTTCATTGCGTCTGTCGTGGTGCTTTTACAAATGTGGAGGGAGCAGCTGCAGAGGCATTACGCGGGGCCACAAACGATGACGTTCCAAGTTGCCTAGCGGTCCATCCCTGACAGGATCCACCATAGCAACGGGGAACAAAGTGGGCGGGCCACATCTAAGGATGGAAAAGCAACCTGCCAACCACATGTGCGCATCTGGGGACTCTGTGGACACTGAAACACTCTGCATGGTCCCATTGGAATAGGCCTTTTTTTTGACCGGGCCTGTCCACCTGCATCGCATTCTCAAAGGTCTAAACGGAGGAGGCCCATCTCTGTGCTGTGGGATGATTCCAAACGAATCACGTTGCCTGTCCCGTGTGGGGATTGGCAGAAAAAGAAGAGAGACTGGGCCGACGTGGAAGTTTGTTCGAACGAAAGCGGTAATTTGCGGCGGGAACTTCCACCCCCACAGACACCAATTATCATAGTCCCCACCTGCGGGAAAAGGTCCACACATCTCTACTTTTTGGACCCTAGAAGGAAAATTTCCCTTTAACAAGAAAGCAAGGGGTTGTTTTGAGAAAACAAGAGAAGCCAAAAGTTTTGTTTCTCTGCGTGGACGTTTTTTCTGGAAAATGGCGACACCATTTACTGGAGATTTCCAAGCGGGCAACATCTTGGTGTGGGGGGAAAAGGTACAAAATCGGATAAAGCTATGCAGTTTTCAAGGGGGCCCCACGAGGGCGGGCATTTTTTTGCACTTTTGTATGCTGAGAGACGGGGGACGCTGGGTGTCCCGGAGGAGGGAAGGACCGGGCGGACCTCATTTTCCCCTGGAACGGGAAAACTATCAAAAAAACTTCACAAAGCAGAGATTCCTGAGCAGTTTTTTGGCCCCAATACGGACAGTAGCCGAAGGGAAGGGCAAACGGTCCATCTTGATAGGCACTACTAGAAAACTTTTGTCCCTGCCAAGGGCAACCTTTGTCAGGGGGACTTTCACACCCATCACTCGGGGGCCCACACCGATGAGCTCTGGGGGCTAGGCCCATCCATGCCTCCAAAGCCCCTCAGTTCCTGGAACTTGTGGGG
ACCATGACAAACCATGCCCACTCTCTGGGGGACGCTGTCGGTCCACCGGGCCAGTCTGGGGACAAAAAATCATAGAGGATCCAGCTCCAGGTCCCCCTCTGGTTTTTTCATCCTTCCGGGGTCCTGTGGGTTGCAGTGGGGGGACACTGGACTGGGGAGGGTGGTTTGGTGTTTTGACACGGAGACAAAGGGGGGACTTGGTCCAACCGTCCCCCACACGGGATGGGAAATGAGCAGCTGTCCGTGGATTGCGGTTATTCTCCAGATGGGAAACTTCTTAGCAATCGGCCTCCCATGACAACTGCATCTACATATATGGAGTTTACCGGACAATGGAAGGAAGTACACACGAGTTTGGCAAGTTGCTCCCGGCCATTTCCAGCTTCATCACCCACTTGGGACTGGTCCCGTGAAACTCACAATTTCCTGGTGTCCAAATTTCCGGGGGACTACGAGATCCCTCTACTGGGTTTCCGTCCTGCCTGTAAAGCAAGGTCGTGAGTGTGGGAAAACCACAAGGGGACAATCGAGTGGGGCCCACCTATACCTGCACCTTGGGATTCCCACGTCTTTGGGAGTGTGGGCCGGAGGGCCCTCCGATGGGGGACAGACATCAACGCCCGTCTGCCGGGGCTTCACGAGAGAAAGCTCTTGTGCACAGGCGATGACCTTCGGCAAAAGTGCACCTCTTTCTCATACCCCGTGGCTCCACAGTTTCCGGGCTCCAAGCCACAATCTACAGTGGGGGACACAGCAGGCCCACGTCCCCACCAACCGTGGGACTTCCTCTGTGAAGGACAGCCACCTTAATCTCCACGGGTGGGGGAAAGACACAAGCATCATGCAGTGGGCGAGTCCATTTAGTCCCCTGTGGGAAGCCCGAGGGACGCCAGACTCGAGTTCCGCTTTGGTCCACTGTGATTTTCTGTTTTTTGTCCTACAGGGACTCTTAACAAAAACCTCAGGGAAAAACTGTCCCTCTACCAGTTTACCTTAGTTGGGGAAGCCAGTGCGTGTCACACCAGATAAAGCGGTTTGTGTGTGTCCCGCTTTTGTTATTATAGGGCAGGATAGAAATGCATGTCCGGGTTAAAGGAAGTCCCCAAGGTTTTTTACATGGCAGCAGAAGGGACTGGTGTATCCTTATAGGACACTTTTTCTATGAAACTCTTTCAAAAAATGGTCACAGGAAATGCCCTTTTAAAAAATACTGTATATAGTCTTCACTGCTTCACCTTGTTTAAGTCAGATATTTTATGATAATGAAAGTACGGAAAACTGGGGAAAACTGGGGGTCCCGTTGTTGGACTAATTGGGTCCTAAAGAGGATAAAATTATGTAAACTGATTTTGGCCCAGCTGAAATCAGGACTGCAAAATGCCCAGGCTTTCCCTTGGCCATGTATCTAAAAATCCCATAAAACCTCCCTCCCTTTGGGAGGGCTGGCAAAAAAAGGGGGGCTGTGCTTCTCCCTGGTTTAAGCAGTTTTGTTACTACAGAAACCCCCCGAAGCTGCTGTGGGTCCCTACAGGTGCTTTCCATGCTCTGGTTTAGACTGTTCCCCTGTCCTGAGGGACACAGCCAGTTTTTCTTCAGCACCTCCACAAAAACTGCACCCCCGTCCTCTGTCCATGCACCTCGATTCACGGCGAGGACATTTAAGCCACCACTCTCCTGGCGTATTGATGGCATTGATACGGTTATTGTCCCCTCGTATAGAGTTAAACAACTTACGATAAAATTTGCCCAAAGCTGGGGCCCTTGCTGTGTGTGCCTCTGGACACTGTACATTTTGTACCCAAAACCAAGTGGGTCCAAGTCGGAGAGGGGGACTCTTTCAGTATGGGAGGCCAGCCTGTGGGAGTGGCAGCGCCTTTGAGGCTCTGGGTGTAAGGACAGTTTTCCTTCCCTGGACTCTCGTGCACAGGACAGCATGCAGGCATTACAGACTGAAAACTGGTGCTCTGGCCGAGTAGAAAAAGTAGGTAGGGTCTGAAGGT
GTCGAGAAGGGCCTTAACCTGTGGTGTGGGGACAGATTGAATTGATTGTTTTACACTGGGGGGACTGTATCTCGGATCTTTTAAAATAGAGGAAATCACAAAACAGGACTTAAGGGACAGATGCTGAGATTGCTTTTTTTGTAAACTCGTTTAAGCGAGTGAGTGAGTTTGAGTTTACCTGAAACTCTGTAGCACTGGGTTGTTTCATAGTGGATGAAGGGACAGCACTGCAGACATCTCCCCTTGCCATCTCTAGCCTGCCTGTGGAAGGAAAACAAACGTGGACCTCAAGATGAAGCTGTTTTTGTTATGTATCCTTATCAAATATATATTCTATAAGGAAAATAAAAATCTGAAAAGTG
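
As a rough way to check how much of a read ends up soft-clipped, the clip lengths can be read off the CIGAR field of the SAM record. This is a minimal sketch; the function name `soft_clips` is hypothetical and the example CIGAR string is made up for illustration, not taken from the actual minimap2 output for this sequence.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (leading, trailing) soft-clipped base counts from a CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) can sit outside the soft clips, so drop them first.
    core = [(n, op) for n, op in ops if op != "H"]
    lead = core[0][0] if core and core[0][1] == "S" else 0
    trail = core[-1][0] if len(core) > 1 and core[-1][1] == "S" else 0
    return lead, trail


# Made-up CIGAR: 180 clipped bases, then two aligned blocks split by an intron.
example = "180S95M5512N140M"
print(soft_clips(example))  # (180, 0)
```

SAM stores the CIGAR in reference orientation, so for a reverse-strand alignment the 5' end of the transcript corresponds to the trailing clip rather than the leading one.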

In the second case, minimap2 was able to align the whole pcdha1 sequence, but it aligned the first and second exons poorly, inserting many more indels, whereas BLAT aligned the sequence at 100% identity with correct exon boundaries. (In the screenshot, focus on the bold "YourSeq" BLAT alignment as the correct one: https://www.dropbox.com/s/gruwqhheeelg7tm/Screenshot%202020-08-05%2016.28.06.png?dl=0)

>transcript/5185 full_length_coverage=2;length=4965
GCTAGTCCGAATCGGAACAATGGCGGATGCAGTGGCGATGGACTAACGGATTAGAAGAATTCTCCTAGCTCTGAGAGAATCCCTAATCAGAACAAAGCACTGTGCACTTGAAATGGAATTTTCCTGGGGAAGTGGCCAGGAATCCCAGCGCTTGCTTCTTTCTTTTCTGCTTCTTGCAATCTGGGAGGCAGGGAACAGCCAGATCCACTACTCCATCCCTGAGGAGGCCAAACACGGCACCTTCGTGGGCCGCATCGCGCAGGACCTGGGGCTGGAGCTGACGGAGCTGGTGCCCCGCCTGTTCAGAGTGGCGTCCAAGGACCGCGGAGACCTTCTGGAGGTAAATCTGCAGAATGGCATTTTGTTTGTGAATTCTCGGATCGACCGGGAGGAGCTGTGCGGGCGGAGCGCGGAGTGCAGCATCCACCTGGAGGTGATCGTGGACAGGCCGTTGCAGGTTTTCCACGTGGAGGTGGAGGTGAGGGACATTAACGACAACCCTCCCAGGTTCCCAACAACACAAAAGAATCTGTTCATTGCAGAATCAAGGCCACTTGACACTTGGTTTCCACTAGAGGGCGCTTCAGACGCAGATATCGGAATCAATGCTGTACTGACTTACAGACTGAGTCCAAATGATTACTTTTCTTTGGAAAAACCATCCAACGACGAACGGGTAAAAGGTCTTGGACTTGTATTACGGAAATCTTTAGACCGGGAGGAAACTCCAGAGATAATTTTAGTGCTTACTGTCACGGACGGAGGAAAGCCAGAGCTGACCGGCAGTGTTCAGTTACTCATCACTGTGCTGGATGCCAATGATAATGCTCCAGTTTTTGACAGATCTCTGTATACCGTGAAATTACCAGAAAACGTTCCAAATGGGACATTGGTAGTCAAAGTCAATGCCTCAGATTTAGACGAAGGGGTAAATGGGGATATTATGTACTCATTTTCTACAGATATTTCACCAAATGTGAAATACAAATTCCACATAGACCCTGTTAGCGGAGAGATTATTGTAAAGGGATACATTGATTTTGAAGAATGCAAATCCTATGAAATTCTCATAGAGGGAATTGACAAGGGACAACTTCCACTCTCTGGGCACTGTAAAGTCATTGTACAAGTTGAAGACATCAACGATAATGTTCCAGAATTGGAATTCAAATCTCTATCACTTCCAATACGAGAGAATTCTCCAGTGGGCACTGTCATCGCACTCATTAGTGTGTCTGATCGGGACACGGGTGTCAACGGGCAGGTGACCTGCTCCCTGACAAGTCATGTCCCCTTCAAGTTGGTGTCCACATTCAAGAATTACTATTCGCTCGTGCTGGACAGCGCCCTGGACAGAGAGACAACAGCGGACTATAAGGTGGTGGTGACAGCGCGGGATGGGGGCTCTCCCTCGCTGTGGGCCACGGCTAGCGTGTCTGTTGAGGTTGCTGACGTGAACGACAATGCACCTGTGTTCGCGCAGCCCGAATACACGGTGTTCGTGAAGGAGAACAACCCGCCTGGTGCGCACATCTTCACGGTGTCAGCGATGGATGCGGACGCACAGGAGAACGCGCTGGTGTCCTACTCGCTGGTGGAGCGGAGGGTGGGCGAGCGCTTGCTGTCGAGCTATGTGTCTGTGCACGCGGAGAGCGGCAAGGTGTTCGCGCTGCAGCCTCTGGACCATGAGGAGCTGGAGCTGCTGCGGTTCCAGGTGAGCGCGCGGGATGCTGGTGTACCTGCCCTGGGCAGCAATGTGACTCTGCAGGTGTTTGTGCTGGACGAGAATGACAACGCGCCCACACTGCTGGAACCTGAGGCAGGAGTCTCTGGTGGAATCGTGAGCCGGTTGGTGTCCAGATCAGTGGGTGCAGGCCATGTGGTGGCTAAGGTGCGCGCGGTGGATGCAGACTCTGGCTATAATGCATGGCTCTCTTATGAGCTGCAATCGTCAGAAGGCAATTCCCGTAGCCTTTTCCGCGTAGGTTTGTATACGGGCG
AGATTAGTACTACGCGCATACTGGATGAAGCAGATTCGCCACGTCAGCGCCTTCTGGTGCTGGTGAAGGACCATGGTGACCCAGCAATGATTGTTACCGCCACAGTGTTGGTGTCTCTGGTAGAGAATGGCCCGGTACCAAAGGCTCCATCGCGAGTATCCACGAGTGTCACACACTCTGAGGCGTCACTGGTGGATGTCAACGTGTACCTGATCATTGCCATCTGTGCAGTGTCCAGCCTGCTAGTGCTCACGCTGCTGCTGTACACAGCGCTGCGCTGTTCCACTGTCCCCAGTGAGAGCGTGTGCGGGCCTCCAAAACCGGTAATGGTGTGCTCCAGTGCAGTGGGGAGCTGGTCATACTCCCAACAAAGGAGGCAAAGGGTGTGCTCTGGGGAGTACCCACCTAAGACCGACCTCATGGCCTTCAGCCCCAGTTTATCTGATTCAAGGGACAGAGAGGATCAATTGCAGTCTGCAGAGGATTCCTCTGGAAAGCCCCGGCAGCCCAACCCTGACTGGCGCTACTCTGCCTCGCTAAGAGCAGGCATGCACAGCTCTGTGCACCTGGAGGAGGCTGGCATTCTACGGGCTGGTCCAGGAGGGCCTGATCAGCAGTGGCCAACAGTATCCAGTGCAACACCAGAACCTGAGGCAGGAGAGGTGTCCCCTCCGGTGGGCGCCGGTGTCAACAGCAACAGCTGGACCTTTAAATACGGACCAGGCAACCCCAAACAGTCCGGTCCCGGTGAGTTGCCAGACAAATTCATTATCCCAGGATCTCCTGCAATCATCTCCATCCGGCAGGAGCCTGCTAACAACCAAATTGACAAAAGCGATTTTATAACCTTCGGCAAAAAGGAGGAGACCAAGAAAAAGAAGAAAAAGAAGAAGGGTAACAAGACCCAGGAGAAAAAAGAGAAAGGGAACAGCACGACGGACAACAGTGACCAGTGAGGCCACCAAATGGAAACAAGCCACTTAGCCAGTTTTTGTAATAATGGCAAATCTCTCCCATGTAGCAACTCCCCGCTCCTTTCTCCTATGACATGAGCCCTCAGAAATCTGCAGAAAGTTCCCTGTGTCTGTCTTGATCGCATTTAACAGGTTTTGTCTTAAAAAGCTTTCCTAAGTCTGGTGTTAACTCTCTCTCTCTCCACTCTGGCTTGTTTTCAGAACCTAAAAAGCAGACCCAGGTTTCCTTTCTCCCCCGCCGCAAAGGAGAAGCTTCCCAGCCCCGCCAGTGAGAGTTGGACTCTCTGCCCTGTGCTTCAAGCATCCTGTCTTGATGATATTTGCAGGGCAGGCTGAAAGGTATTCAGGTTGAGCAGTTGGGTGTTTGTGGTCACTGGGTATGTGTGGCTACCAAGAGTGTTGGAGAGCCTGGTATTGGCTGGGATGGTCCAGATTAGACTAGTTAACACAAGGAGGGCTGGGGCTCAAAGGCACATCAACACCGGGAGTCTTTCATCTGGAGGGGGAAAATGTGAAACTTACAAGGACCAGACTTTCTCAATCTCTCAACTAGACATATGATGGCCATCCTCTAACAGACAAAACCATCCCCACCGGCAAAGCTTTAGGAGCCCCTCAAGTGTGCTGGCTATAACATCACTGTATTCAAAACCTGCAGTATGCACACGAGCCAGCAGTTCAAGCGTTTAACAAGAGGGTCGGCCAGGGCAACAGAAGCAGATCTGATGTGTTTCCTGTACACGTCCTTGTGCTCACGCTATTAAAAATTCTTTTGCACACAATGTTTATGAAAAGGTCTCATCCTTTTCCAACAACACATATGCAAAAGCAAAAGAAAAACCCAAGACCTCACTTTATGCTGTTTGTTGTTTGATAGATTTATTAAAAAGAAAAGAGAAAGTCGATAGCTATAAATCTTTAAAGGAATATGATGAATACAATCCCCCCAACCTTCCCTCAAAAGAGAATCCAGTCTACAGCCATTTGAAATGATCGTTGCTGCTACAGAAGTGCTTTAAGAGAATT
GCCTGGAACATCTGTATTATCCCGGCCACCTGCCAATCACAGCTTTACTCTTTCAGGTCACTCTGGGGCTGCCTCTTGCATGTATTAATTACTAAATAGAAGGATCTTTCTCTCTTTTCTAAGAAAAATGATGTGCACTTTGATTACACAACCTTCTCTAACCCACGATATCAAGACCCAGAAACTGAAGAAAAATCTTGTTTTCTCATGCATACAGTGAGCAGACTTTTCATTCCTCTGGTTCTGTGGTTGTCTCGGTGTGCTAGCCTACACCTCCACCTTGTTTAGCTTTCCTTTTCTAGAACACTCTGAATTGCTAACCTTACTAACACCTATGATGTTACCTGAATCAATCTCCCATATGTATGCTGTATGCTATTATAAGACTCCTGAGATATACTTACTCTGTGCTTGTGTATGTGAATGTTAATGCAACTATTACCTAGAGTGAACTTTAAGCTTTATTGTTGAATGTAAGCCCATTATATTTCCTTTTGTACACCTGTGGAAAAGTGGAATAGTGTTTTTTTTAAAACCATTGTTAATCAGCTTTTGTGTATGAAAGACACAGTAAAATTTCTTTCTTAAATCAAGATGCTGGTGATTCAAGGAATTTTATTTATGGTCAGCCAAGGGCTGTCTCTTGCCAAGAATCCTGCTGGCAAGGGAAAATGGATAAAGCTGGTTTTTTTTTTTCCTAGTAAGAATTCTGGAATAAATACTGAAGAAAGTCCCTGAGGGTATGCAAGCACAAAATTGTACCAATCTGACCTCTTTGAATTTGCAGACTGCTTTGAAATGCTATCCGGAATATCAGCTTGTAGAAAGTAATAAAATTTACTGTTACCATAAATAAGACATTTTAAGTTTATTGTGCACAACTTAGATGTTTGATTAATTATATTATCTACTTAAAAGCATATAAAAGAGGTAGGAGTCTGTTTTAAAAGGCATAAAAAATCTCT
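
To put a number on "more indels", the insertion, deletion, and intron operations in a CIGAR string can be tallied and compared between aligners. This is a minimal sketch; `cigar_summary` is a hypothetical helper and the example CIGAR string is invented for illustration, not the real alignment of this transcript.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_summary(cigar):
    """Count indel and intron events, and the bases they cover, in a CIGAR string."""
    events = Counter()  # number of operations of each type
    bases = Counter()   # total length of operations of each type
    for n, op in CIGAR_OP.findall(cigar):
        events[op] += 1
        bases[op] += int(n)
    return {
        "introns": events["N"],
        "insertions": events["I"],
        "deletions": events["D"],
        "inserted_bases": bases["I"],
        "deleted_bases": bases["D"],
        "aligned_bases": bases["M"] + bases["="] + bases["X"],
    }


# Made-up CIGAR with one insertion, one deletion, and one intron.
example = "120M2I30M1D45M8000N200M"
print(cigar_summary(example))
```

A clean spliced alignment at 100% identity would show only M (or =) and N operations, so nonzero insertion and deletion counts near the first exons point to the problem described above.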

Your insight is appreciated! -Liz

Magdoll, Aug 05 '20 23:08