RepeatMasker
custom library - no species option
When using a custom library (generated with RepeatModeler), does RepeatMasker mask only the repeats from the custom library, or does it also use the Dfam + RepBase libraries? I also tried the -species option together with my custom library, and the program said that this is not allowed.
If it masks using only the custom library, I see that I can merge in the RepBase species-specific repeats that I want to mask. But the post (https://github.com/rmhubley/RepeatMasker/issues/113) says: "However, by doing this you may lose out on some of RepeatMasker's enhancements which improve repeat annotation in some species, especially humans, mice, and mammals." Is that the case with any custom library?
Also, if I use famdb.py to extract vertebrate-specific repeats, will they include repeats from both the Dfam and RepBase libraries, given that I configured RepeatMasker with both of them?
Thanks
Yes, the -lib option for custom FASTA/HMM libraries is exclusive with the -species option for the installed libraries; only one or the other will be used for masking at a time. The species-specific enhancements are also only available with the -species option and not -lib.
However, another option is to do two rounds of masking instead of combining the libraries; see also https://github.com/rmhubley/RepeatMasker/issues/5#issuecomment-392877654. This might avoid some of your concerns.
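The two-round approach can be sketched as a pair of command lines. The snippet below only builds and prints them; it does not run RepeatMasker. The file names, directory names, thread count, and the species value are placeholders chosen for illustration, and it assumes the default behaviour in which RepeatMasker writes `<input name>.masked` into the directory given by -dir.

```python
import shlex


def two_round_commands(genome, species, custom_lib, threads=4):
    """Return the two RepeatMasker invocations as argument lists.

    Round 1 uses the installed libraries (-species); round 2 runs the
    custom library (-lib) on the masked output of round 1.
    Directory names are hypothetical.
    """
    round1_dir = "round1_species"
    round2_dir = "round2_custom"

    round1 = ["RepeatMasker", "-species", species,
              "-pa", str(threads), "-dir", round1_dir, genome]

    # Assumed output name: the input file name plus ".masked" inside -dir.
    genome_name = genome.rsplit("/", 1)[-1]
    round1_masked = f"{round1_dir}/{genome_name}.masked"

    round2 = ["RepeatMasker", "-lib", custom_lib,
              "-pa", str(threads), "-dir", round2_dir, round1_masked]
    return round1, round2


if __name__ == "__main__":
    # "teleostei" and "custom-families.fa" are placeholder values.
    for cmd in two_round_commands("v2.fa", "teleostei", "custom-families.fa"):
        print(shlex.join(cmd))
```

Note that -lib and -species never appear in the same command, which matches the restriction described above.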
famdb.py will query whichever libraries are installed and configured. If RepBase RepeatMasker Edition was installed, the famdb.py info command should report that the repeat database is with RBRM to confirm this.
I hope this information helps to answer your questions!
Thanks, I followed your suggestion to do the repeat masking in two rounds. I did round 1 with the -species option and used the masked genome as input for a second round with my custom library. How can I combine the .tbl outputs of the two rounds, and how do I get the total amount of the genome that was masked and the percentage for each repeat class? Thanks.
Also, I have compared the two masked files, and I see that some sequences that were masked in the first round (with -species) are unmasked in the second round (with the custom library). I was assuming that the masking from the first round would be retained. Please let me know.
Also, can the masked genome from the second round be used for genome annotation? Or will it be missing the masking from the first round, so that I have to do an additional step to combine the masking from the two rounds?
Just to be more clear:
Results from round 1 (species-specific; in this case, teleost):
file name: v2.fa
sequences: 26
total length: 606289673 bp (606099673 bp excl N/X-runs)
GC level: 40.99 %
bases masked: 190078452 bp ( 31.35 %)
Results from round 2 (RepeatModeler library):
file name: v2.fa.masked
sequences: 26
total length: 606289673 bp (606099673 bp excl N/X-runs)
GC level: 40.99 %
bases masked: 142282936 bp ( 23.47 %)
I was assuming masking would be commutative as per suggestions to do it serially...
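One way to check what the two rounds cover together is to merge the masking outside RepeatMasker and count the masked bases directly. The sketch below is not part of RepeatMasker; it assumes both masked files are soft-masked (lowercase, as produced with -xsmall), contain the same sequences in the same coordinates, and fit in memory. The function names are made up for this example.

```python
def read_fasta(lines):
    """Parse FASTA text lines into an ordered {name: sequence} dict."""
    records, name = {}, None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def merge_soft_masks(seq_a, seq_b):
    """Union of two soft masks: lowercase wherever either input is lowercase."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences differ in length; not the same assembly?")
    return "".join(
        a.lower() if (a.islower() or b.islower()) else a
        for a, b in zip(seq_a, seq_b)
    )


def masked_stats(records):
    """Return (masked bases, total bases, percent masked) for soft-masked records."""
    total = sum(len(s) for s in records.values())
    masked = sum(1 for s in records.values() for c in s if c.islower())
    percent = 100.0 * masked / total if total else 0.0
    return masked, total, percent


if __name__ == "__main__":
    # Toy data standing in for the two masked genomes.
    round1 = read_fasta([">chr1", "ACGTacgt", "AAAA"])
    round2 = read_fasta([">chr1", "acGTACGT", "AAaa"])
    merged = {n: merge_soft_masks(round1[n], round2[n]) for n in round1}
    print(merged["chr1"], masked_stats(merged))
```

For real genomes the two dicts would come from `read_fasta(open(path))` on each masked file. The merged counts give a combined total, but not the per-class breakdown of a .tbl file.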
Hey @minhasbushra, did you finally understand what happened to your genome? Is it re-masked "from scratch" when using your own repeat library, or do the two rounds actually result in a different masking than using only the custom library from the start?
Thanks in advance to anyone for reading and for any answer!
Dear Kevin, Many thanks for your kindness.
1. RepeatModeler performs de novo transposable element family identification, but it does not produce a GFF file.
2. RepeatMasker works with either a query species (the “-species” option) or a custom library (the “-lib” option).
3. To generate a custom library for the -lib option, we can use the families.fa output file generated by RepeatModeler.
4. My question is: will your pipeline TransposonUltimate use the “-families.fa” file in the final annotation? I think the pipeline uses the “-families.fa” file, but I am not sure. If TransposonUltimate uses families.fa to generate the final results, that will be good.
5. Moso bamboo is closer to rice and maize.
6. Thus, I am planning to run with both rice and maize, and then combine the results before running this command: reasonaTE -mode parseAnnotations -projectFolder workspace -projectName testProject
However, I do not know which files need to be combined, or whether all of the RepeatMasker output files do. Thank you in advance for your kindness; please give your suggestion.
With regards, Ramky