EDTA
Differences in masking %. RepeatMasker vs EDTA using repeat library produced by EDTA.
Hello, Shujun!
I successfully ran EDTA on a genome with a sensitive setting. In the log file EDTA printed:
TE annotation using the EDTA library has finished! Check out:
Whole-genome TE annotation (total TE: 16.73%): some_species.fasta.mod.EDTA.TEanno.gff3
Low-threshold TE masking for MAKER gene annotation (masked: 1.00%): some_species.fasta.mod.MAKER.masked
I thought that 1% is a bit too low, so I decided to re-run RepeatMasker with the library produced by EDTA. I used -xsmall -nolow options to later use the soft-masked genome sequence in BRAKER2.
For some reason though RepeatMasker masked 24.43 % this time.
What is the reason for that? Should I be worried about the output?
Hello,
That's good news! Did you get through the unfinished RepeatModeler run? How did you make it work?
For masking differences, you may search other issues for similar discussions. Please let me know if you have any other questions.
Best, Shujun
Did you run through the unfinished RepeatModeler run? How did you make it?
Actually, I just re-ran the final step with the --step final option, and it worked this time.
For masking differences, you may search other issues for similar discussions.
I can't really find related topics in the issues. I saw some where people wonder about the differences between RepeatMasker with RM libraries and EDTA de-novo.
But I first created the repeat library with EDTA, and then used it to mask the genome. For some reason I get different results (16.73% from EDTA sum vs 24.43 % from RepeatMasker out), even though I assumed that EDTA used RepeatMasker to mask the genome in the end. Am I correct about this assumption?
I would just like to have general statistics about the repeat content, and I don't know which number to trust: 16.73% or 24.43%.
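One way to sanity-check either figure independently is to count the lowercase bases in the soft-masked FASTA directly. Below is a minimal sketch, assuming the file comes from a RepeatMasker run with -xsmall (so masked bases are lowercase) and that the input genome was entirely uppercase beforehand. The function name softmasked_fraction is hypothetical and not part of EDTA or RepeatMasker.

```python
def softmasked_fraction(fasta_lines):
    """Return (masked, total) base counts from an iterable of FASTA lines.

    Assumption: lowercase letters mark soft-masked bases and the unmasked
    input was all uppercase. Header lines (starting with '>') are skipped.
    """
    masked = 0
    total = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue
        seq = line.strip()
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return masked, total


# Toy example with in-memory lines; for a real genome, pass an open file handle.
example = [">chr1\n", "ACGTacgt\n", "NNnn\n"]
masked, total = softmasked_fraction(example)
print(f"masked: {masked} / {total} bp ({100 * masked / total:.2f}%)")
```

Since -nolow was used, simple repeats and low-complexity regions should not contribute to this count, so the number reflects only hits against the supplied library.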
Hello @d00bin,
Sorry for the delay. First of all, the line "Low-threshold TE masking for MAKER gene annotation (masked: 1.00%): some_species.fasta.mod.MAKER.masked" says that the file some_species.fasta.mod.MAKER.masked is meant for MAKER gene annotation; it does not represent the actual TE content. Is it confusing?
If everything ran without error, the EDTA sum file represents what the program believes to be the TE content of the genome.
16.73% from EDTA sum vs 24.43 % from RepeatMasker out
This is a significant difference. Can you paste the RepeatMasker command you used here?
Shujun
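For comparing the two numbers on equal footing, the annotated fraction can also be recomputed from the GFF3 itself by merging overlapping features and dividing by the genome length. The sketch below assumes standard tab-separated GFF3 with 1-based inclusive start and end in columns 4 and 5. The function name annotated_bp is hypothetical, and whether this matches how the EDTA sum file is calculated is not confirmed here.

```python
from collections import defaultdict


def annotated_bp(gff_lines):
    """Count bases covered by at least one GFF3 feature, merging overlaps.

    Assumption: standard GFF3 columns (seqid in column 1, start and end
    in columns 4 and 5, 1-based inclusive). Comment lines are skipped.
    """
    intervals = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue
        intervals[fields[0]].append((int(fields[3]), int(fields[4])))

    covered = 0
    for spans in intervals.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + 1:  # overlapping or directly adjacent
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        covered += cur_end - cur_start + 1
    return covered


# Toy example; for real data, pass an open handle to the GFF3 file.
toy = [
    "##gff-version 3\n",
    "chr1\tEDTA\trepeat\t10\t20\t.\t+\t.\tID=a\n",
    "chr1\tEDTA\trepeat\t15\t30\t.\t+\t.\tID=b\n",
    "chr2\tEDTA\trepeat\t1\t5\t.\t-\t.\tID=c\n",
]
print(annotated_bp(toy))
```

Merging matters because nested or overlapping features would otherwise be counted twice. Running the same function on the GFF produced by RepeatMasker's -gff option would give a directly comparable figure.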
@d00bin, have these issues been resolved?
@oushujun Dear Shujun, I'm terribly sorry for such a delayed response!
Nope, the issue is still there.
The command I used for RepeatMasker is:
RepeatMasker \
-a -gff -pa 32 -u \
-dir final_RepeatMasker_out \
-xsmall \
-nolow \
-lib /path/to/EDTA/library.fasta.mod.EDTA.TElib.fa \
genome_chromosomelevel.fasta
Low-threshold TE masking for MAKER gene annotation (masked: 1.00%): some_species.fasta.mod.MAKER.masked this information says the file some_species.fasta.mod.MAKER.masked is for MAKER gene annotation, not representing the actual TE content. Is it confusing?
And this I understand, yes.
Is this a non-plant?
Is this a non-plant?
Yes. It's a teleost fish. The genome of a sister species from the same genus has been published, and its repeat content is ~23%. Also, I previously used this workflow (https://github.com/uio-cels/Repeats) to produce a repeat library for my genome, and the repeat content came out at ~23%. But EDTA is a much more elegant solution than what I used before.
That makes sense to me. If EDTA produced no error, then it's running as expected.
That makes sense to me. If EDTA produced no error, then it's running as expected.
So is the difference between 16.73% (EDTA sum) and 24.43% (RepeatMasker out) due to my RepeatMasker settings? If so, which should I consider the "true" TE content of the genome?
You may want to manually collect some SINE and LINE sequences and give them to EDTA. These could have been missed.
Shujun