about panTE library

Open zhangya19 opened this issue 2 years ago • 3 comments

Hi Mr. Ou, the proportion of repetitive sequences annotated in the cotton genome using the panTE library was about 10 percentage points higher than with EDTA alone. The results of the panTE library annotation were re-integrated with EDTA. What is the reason for the higher annotation percentage?

Detailed annotation results (left: EDTA annotation; right: pan-TE, sum of EDTA & RepeatMasker):

| TE class | Genome | EDTA Count | EDTA bpMasked | EDTA %masked | pan-TE Count | pan-TE bpMasked | pan-TE %masked |
|---|---|---|---|---|---|---|---|
| LTR/Copia | A1_WHU | 81787 | 40900549 | 2.75% | 90553 | 47504505 | 3.19% |
| LTR/Gypsy | A1_WHU | 669523 | 617192488 | 41.46% | 768844 | 713517587 | 47.94% |
| LTR/unknown | A1_WHU | 634198 | 226442348 | 15.21% | 748767 | 262333350 | 17.62% |
| TIR/CACTA | A1_WHU | 48448 | 23769576 | 1.60% | 51349 | 24166960 | 1.62% |
| TIR/Mutator | A1_WHU | 173759 | 60308660 | 4.05% | 184021 | 62247740 | 4.18% |
| TIR/PIF_Harbinger | A1_WHU | 22639 | 6517328 | 0.44% | 24283 | 6783755 | 0.46% |
| TIR/Tc1_Mariner | A1_WHU | 7400 | 3627670 | 0.24% | 7797 | 3695089 | 0.25% |
| TIR/hAT | A1_WHU | 40073 | 10722063 | 0.72% | 43269 | 11576977 | 0.78% |
| nonTIR/helitron | A1_WHU | 3280 | 1020938 | 0.07% | 3587 | 1081272 | 0.07% |
| all | A1_WHU | 1681107 | 990501620 | 66.54% | 1922470 | 1132907235 | 76.11% |

zhangya19, Jun 10 '22 09:06

Hello @zhangya19,

Could you paste the commands you used to obtain this result? Thanks!

Shujun

oushujun, Jun 10 '22 14:06

Commands:

RepeatMasker -lib Cotton_15genomes.panTE.fl3cp.lib.fa -pa 50 -gff genome.fa -cutoff 225

perl /public/agis/huguanjing_group/zhangya02/software/EDTA/EDTA.pl --genome genome.fa --cds genome_gene_cds.fa --curatedlib ../Cotton_15genomes.panTE.fl3cp.lib.fa --rmout repeatmasker.out --overwrite 0 --step anno --anno 1 --evaluate 1 -t 30

I reset the parameters and the result is still very high. Commands:

RepeatMasker -lib Cotton_15genomes.panTE.fl3cp.lib.fa -pa 50 -gff genome.fa -cutoff 225 -dir result -q -no_is -norna -nolow -div 40

perl /public/agis/huguanjing_group/zhangya02/software/EDTA/EDTA.pl --genome genome.fa --cds genome_gene_cds.fa --curatedlib ../Cotton_15genomes.panTE.fl3cp.lib.fa --rmout repeatmasker.out --overwrite 0 --step anno --anno 1 --evaluate 1 -t 30

zhangya19, Jun 13 '22 02:06

You should expect a higher masked percentage from the pan-EDTA annotation than from single-genome EDTA. This is partly a benefit of building the library from multiple genomes, which makes it more sensitive. On the other hand, combining multiple TE libraries can also add some false annotations, because the false positives of each library accumulate (an additive effect). The pan-EDTA module tries to keep false positives under control, but the false positive rate is still likely to be higher than with single-genome EDTA.

Best, Shujun

oushujun, Jun 17 '22 05:06
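
For readers who want to see where the gap comes from, here is a minimal Python sketch that differences the two %masked columns reported in the opening post. The numbers are copied from that table; the names `pct_masked`, `total`, and `gains` are made up for this illustration and are not part of EDTA or RepeatMasker.

```python
# %masked values copied from the table in the opening post (genome A1_WHU).
# Each entry: TE class -> (EDTA %masked, panTE %masked).
pct_masked = {
    "LTR/Copia": (2.75, 3.19),
    "LTR/Gypsy": (41.46, 47.94),
    "LTR/unknown": (15.21, 17.62),
    "TIR/CACTA": (1.60, 1.62),
    "TIR/Mutator": (4.05, 4.18),
    "TIR/PIF_Harbinger": (0.44, 0.46),
    "TIR/Tc1_Mariner": (0.24, 0.25),
    "TIR/hAT": (0.72, 0.78),
    "nonTIR/helitron": (0.07, 0.07),
}
total = (66.54, 76.11)  # the "all" row


def gains(table):
    """Return the panTE minus EDTA difference, in percentage points, per class."""
    return {name: round(pan - edta, 2) for name, (edta, pan) in table.items()}


diff = gains(pct_masked)
total_gain = round(total[1] - total[0], 2)

# List classes from largest to smallest contribution to the overall increase.
for name, d in sorted(diff.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} +{d:.2f} points ({d / total_gain:.0%} of the gain)")
print(f"{'all':18s} +{total_gain:.2f} points")
```

With these inputs the overall increase is 9.57 percentage points, and the three LTR rows (Gypsy, unknown, Copia) account for 9.33 of them. The arithmetic only shows where the extra masking falls; it cannot tell how much of it is added sensitivity and how much is false positives.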