RepeatMasker

Different -species settings lead to different results

Open • chiu-shenpo opened this issue 4 years ago • 1 comment

My genome is Tetrahymena (a eukaryote). Which species should I choose, human or Tetrahymena? I've tried both. With -species tetrahymena, the result does not annotate any LINE, LTR, or SINE elements, but with -species human it reports some percentage of LINE, LTR, and SINE. Why, and which one is correct?

chiu-shenpo, Jan 18 '21 08:01

RepeatMasker ships with Dfam, an open database, which does not yet include any repeats for Tetrahymena or any other alveolates. It looks like the last RepBase RepeatMasker Edition release also contains only a few Tetrahymena LINEs.

The elements you saw annotated when you set the species to "human" may be related to the true repetitive elements in Tetrahymena, but they could also be low-quality alignments and/or false positives. A better approach would be to obtain a library of Tetrahymena-specific and ancestral TE sequences and use the -lib option, or to create such a library starting with a de novo TE discovery tool (such as RepeatModeler). The results of de novo tools are generally adequate for masking (with a risk of masking some non-TE sequence), but for accurate annotation you will want a manually curated library.

jebrosen, Jan 19 '21 19:01
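
The de novo route suggested above (build a library with RepeatModeler, then pass it to RepeatMasker with -lib) could look roughly like the sketch below. It only assembles and prints the command lines; nothing is executed. The file name `genome.fa`, the database name `tetrahymena`, and the helper `build_commands` are hypothetical, and the sketch assumes RepeatModeler writes its consensus library to `<database name>-families.fa`, which may differ between versions, so check the output of your own run.

```python
import shlex


def build_commands(genome_fasta, db_name="tetrahymena", threads=4):
    """Return the three command lines as argument lists (nothing is run).

    genome_fasta, db_name and threads are illustrative placeholders.
    """
    # Step 1: index the genome for RepeatModeler.
    build_db = ["BuildDatabase", "-name", db_name, genome_fasta]

    # Step 2: de novo repeat family discovery on that database.
    model = ["RepeatModeler", "-database", db_name]

    # Assumption: RepeatModeler names its library <db_name>-families.fa.
    library = f"{db_name}-families.fa"

    # Step 3: mask with the custom library instead of -species.
    mask = [
        "RepeatMasker",
        "-lib", library,
        "-pa", str(threads),  # parallel jobs
        "-gff",               # also write a GFF annotation file
        genome_fasta,
    ]
    return [build_db, model, mask]


if __name__ == "__main__":
    for cmd in build_commands("genome.fa"):
        print(shlex.join(cmd))
```

As noted in the reply, a library produced this way is generally adequate for masking, while accurate annotation of LINE, LTR, and SINE content would still call for manual curation of the families.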