
Is it possible to combine exons of the same gene on different scaffolds?

Open cistarsa opened this issue 7 years ago • 1 comment

Hello, I'm aligning a reference cDNA set for a gene family against a highly fragmented draft assembly of a closely related species, and I'm getting hits on multiple contigs. For example, the gene family has 81 annotated genes in the reference assembly, but SPALN finds 154 hits in the fragmented draft. A few of these genes are broken up by long intronic regions that span multiple contigs. Here's one gene that has been matched to two different contigs:


# reference exons
>LOR3C  
ATGACTAAAGTCATTAATTATATGACGTTTTTCACTGTTCCTATATTCATACTCAGGGTGTTGGGCTT
TTGGAAATCTGATGAAAAAATGGGCTATAGCCATTTATACAATTTCTATACAATATTGGCAGTGATTTC
ATGGATACTGTTTTTGACATCCCAAGCCATATATTTGAGTATTTCTCTCAATGGTATTGGTGAATTGA
CTCTTATAGTTTTCACTGCCGTTACATATAGTGCAAATTTTATAAAGACAATCATAATGTACAGAAAAA
AGGCACTGATTGAAGCCTGGATGAATAAACTCAATCAACCAATTTTTCAACCCAAGTGCAAGGAACAT
TATAACATTGCACATTCAACCAAAAGAATGTATGATAACTTAATCTACAGTTTTTTATATTTTAGTTTAC
AAACTTACCTTTTCTTCGCTATACAACCATTTTTTCGACGTGAAAAGTCAATGTTGAGTGATGGTTGG
TTTCCGTGCGATTGGAAGATTTCCCCAAATTATGAAATAATTTTTACATTCCAAAACTTCGTCATTTTC
TCTAACACTTTCACATGTATAACTTTGGATATATTGTCTGCAGGTTTGTTGATTCAAATTGGTTTGCA
ATGCGACTATTTGGTCGTAACATTGAATTGCTTGGAGAAATTCTCTGTTGAGAATGGAATTTTACGAC
AGAGAGATGAATCGAGTGTCGATTTGAATGACAAGGATTTTCAAACATTCTCACGTGTAATGACTTCC
AATTTGATAGTATGTATCGAGCATTACAAACAGATAAAAACATTATCGAAGGAAATAGAAAAATTTCAC
GAAACAAGTATTTTTGTTCTATTTTTTGAAGGAGAAGTTTTGATTTGTTGCTCGTTATATCAGATGAG
TGTGGTTCCAATCATGACCATCGAATTTTTTATGGTGATCTCTTTCGTTGCAGCTGCACTCATGGAA
CAATTCGTATATTGCTGGTTTGGAAACGATATCATTCACAAGAGCTCCAAAATATCTGATGCTATTTA
TAACACGCCTTGGATTGAATGCGACTTGCAATACAAAAAAACTTTACTCAATTTTATGATTCAGACAAA
GTTCCCCATAGAGATTTGGGTCGGAAGATTATTTTCGATGTCGATTCCTGTTTTCAAATGGATTGTT
CAGTCTTCATATTCTGGTTTTGCTTTACTG

# two outputs 
>size4294.1 size4294 - [1:4294]  ( 2822 - 1 ) LOR3C + 1:1191  ( 1 - 1099 ) N   1578.60
;C complement(join(1..15,113..130,386..412,1490..1624,1849..1901,
;C 1975..2822))
ATGACTAAAGTCATTAATTATATGACGTTTTTCACTGTTCCTATATTCATACTCAGGGTG
TTGGGCTTTTGGAAATCTGATGAAAAAATGGGCTATAGCCATTTATACAATTTCTATACA
ATATTGGCAGTGATTTCATGGATACTGTTTTTGACATCCCAAGCCATATATTTGAGTATT
TCTCTCAATGGTATTGGTGAATTGACTCTTATAGTTTTCACTGCCGTTACATATAGTGCA
AATTTTATAAAGACAATCATAATGTACAGAAAAAAGGCACTGATTGAAGCCTGGATGAAT
AAACTCAATCAACCAATTTTTCAACCCAAGTGCAAGGAACATTATAACATTGCACATTCA
ACCAAAAGAATGTATGATAACTTAATCTACAGTTTTTTATATTTTAGTTTACAAACTTAC
CTTTTCTTCGCTATACAACCATTTTTTCGACGTGAAAAGTCAATGTTGAGTGATGGTTGG
TTTCCGTGCGATTGGAAGATTTCCCCAAATTATGAAATAATTTTTGCATTCCAAAACTTC
GTCATTTTCTCTAACACTTTCACATGTATAACTTTGGATATATTGTCTGCAGGTTTGTTG
ATTCAAATTGGTTTGCAATGCGACTATTTGGTCGTAACATTGAATTGCTTGGAGAAATTC
TCTGTTGAGAATGGAATTTTACGACAGAGAGATGAATCGAGTGTCGATTTGAATGACAAG
GATTTTCAAACATTCTCACGTGTAATGACTTCCAATTTGATAGTATGTATCGAGCATTAC
AAACAGATAAAAACGTAAGTACAGATAGTGTAAGTTTTGAAAAATGTTATATTTTTTTTT
ATCACAAGAACGCGCGTTTTTTTTTGTTGACTTTATTCATCAACATAAATCTCTATCGAG
ATTTTGTTATTTTCTGATTGACTTGTGTCCCATTGCGTTATCTTGATGGAACAACATTTT
TCTTTGTGGTCGTTTTTTTGCCATTTCAGTCTTCAAGAGCTTCAAACACGCTACATAATC
TGTTTTAAAAAAATCATTTGATTTTGCATCTTCAAATAGTAGAAAACCGAAAATGTAATG
ATTAATGTACATTTTG
>size2878.1 size2878 - [1:2878]  ( 2878 - 348 ) LOR3C + 1:1191  ( 101 - 1191 ) N    275.30
;C complement(join(348..383,1014..1186,1247..1544,1609..1669,
;C 1793..1835,1887..2026,2172..2258,2331..2471,2602..2642,
;C 2714..2723,2780..2806,2876..2878))
CTTTATCCGATATATGCAATTCTTGGAATCATTTAAGCAGTACTGGTTGTGGCGGCTTCT
CTTTCTAGTGAAGCTTTTGCTAGTCTTGATGATTCCTCTCTTAGTCGATTTATGGTGGTT
GGATTCCTGATCATCTTTCAAATCAATTTATATCATATTAGACAATAGAAGTCCGATATG
CACCACGAAACATTGTCAACCACCTTCTAACACATGTTTTTAGAAGATAAAAATCCTTAC
TGAAAAAAAATTTGAATGACGTCCTTTTCTTAATTTCGTGTTACCGAATTGAACTTTCTT
TTTTTTTCTCGCATTTCAAATAAATATTCCGTTCAGCTTCGATTAAATGAGTGCTGAATT
TCTGTTCAATTTTCCTAAATTTCCCAATCATTTCATCACTAACGGAAATAATTTAATTTT
CTTCAAAATTCGACATTTTCATCATTTTAGATGTGTATTCTGCAGTCTTGGGCTGAACTT
AAATTTTTCGAACCTCGCTATTGGGGTGGAATTAAATTTACTCAAATATTCCCCTTCATT
AAAACTCAGATGATACGTAGTTTTTTGTATAGAGTTTATGTGTTCCTGTTTATTATTCAC
AATGAAGTCACGAACAGTCCTGTAAAATGATTAAATTCAAGTAACACTAAAATAGAAGCA
AATATTTGAAAATGAATTCGAAAAACATTCGAAGCCACATAATTGTGTATGATTCTTATC
ACCAATGAATAGTAGCGGCTAGCATAGTAGAAGATGCAGTCGACGATGAAATGATAACTA
GAGATCAAATACATTTCCATAGATCATATATTATATCATCGAATACATAATTTTAATATT
TCTAAATTGGGAAAGTTCAATTTTCCAGAGCTCCAAAATATCTGATGCTATTTATAACAC
GCCTTGGATTGAATGCGACTTGCAATACAAAAAAACTTTACTCAATTTTATGATTCAGAC
AAAGTTCCCCATAGAGATTTGGGTCGGAAGATTATTTTCGATGTCGATTCCTGTTTTCAA
ATGGATTGTTCAGTCTTCATATTCTGGTTTTGCTTTACTG

Besides manually combining these hits in a program like Geneious, is there a way to curate the output within SPALN?

cistarsa avatar Jan 24 '18 20:01 cistarsa

No, spaln does not have a routine that facilitates genome assembly. However, you can concatenate or fuse the two contigs in the correct orientation and then run spaln with one of the -Q0 to -Q3 options.

Osamu

ogotoh avatar Apr 16 '19 04:04 ogotoh
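
The "concatenate or fuse the two contigs in the right orientation" step suggested above could be sketched in Python roughly as follows. This is only an illustration, not part of spaln: the helper names (`read_fasta`, `reverse_complement`, `fuse_contigs`), the layout format, and the 100-N spacer between contigs are all assumptions made for this example. Judging from the coordinates in the two outputs (both hits lie on the complement strand, with size4294 covering the 5' part of LOR3C and size2878 the 3' part), one plausible layout is the reverse complement of size4294 followed by the reverse complement of size2878, but that should be checked against your own alignments.

```python
from typing import Dict, Iterable, List, Tuple

# Translation table for complementing DNA; any other character is left as-is.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {name: sequence}; name is the first header word."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def fuse_contigs(
    contigs: Dict[str, str],
    layout: List[Tuple[str, str]],
    gap_length: int = 100,  # hypothetical spacer size, not a spaln requirement
) -> str:
    """Join contigs in the given order and orientation, separated by runs of N.

    layout is a list of (contig_name, strand) pairs, where strand is "+"
    to keep the contig as-is or "-" to reverse-complement it first.
    """
    parts = []
    for contig_name, strand in layout:
        seq = contigs[contig_name]
        parts.append(reverse_complement(seq) if strand == "-" else seq)
    return ("N" * gap_length).join(parts)


def format_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one FASTA record with fixed-width sequence lines."""
    body = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
    return f">{name}\n{body}\n"
```

With the draft assembly loaded via `read_fasta(open(path))`, a call such as `fuse_contigs(contigs, [("size4294", "-"), ("size2878", "-")])` would produce a single pseudo-scaffold that can be written out with `format_fasta` and used as the genomic sequence for a new spaln run. The run of Ns only marks the unknown gap between the contigs; its length here is arbitrary.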