
solve '*.mod.EDTA.TEanno.sum' empty

Open ZJin2021 opened this issue 1 month ago • 3 comments

Hi shujun! Some of my genomes produce empty mod.EDTA.TEanno.sum files, so I checked part of the EDTA process. I found that the buildSummary.pl script terminates at line 560 while reading the .mod.EDTA.TEanno.out file if $type is missing. In the .mod.EDTA.TEanno.split.bed file, the repeat that is missing $type is annotated as "snRNA". I don't know whether other types of repeats may also lack the $type tag. Would it work to change the "die" at line 560 of buildSummary.pl to "next"? I am not sure whether this would affect subsequent steps.

perl EDTA-master/util/buildSummary.pl -maxDiv 40 -stats $genome.mod.stats $genome.mod.EDTA.TEanno.out > $genome.mod.EDTA.TEanno.sum 2> out.log

out.log

This out line is the first instance of the change:
10000 0.001 0.001 0.001 scaffold398 121760 121980 NA + TE_00000280_INT LTR/unknown
missing type for TE_00002102 ... <>

.mod.EDTA.TEanno.out

10000 0.001 0.001 0.001 Chr4 13998493 14000579 NA + TE_00001932_INT LTR/unknown
10000 0.001 0.001 0.001 Chr4 14000611 14000691 NA + TE_00002102 
10000 0.001 0.001 0.001 Chr4 14000769 14001580 NA + TE_00001932_INT LTR/unknown
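To see how many rows are affected before touching buildSummary.pl, the .out file can be scanned for rows that lack the final class column. The following is only a rough sketch, not part of EDTA: it assumes the whitespace-delimited, 11-column layout visible in the excerpt above (class in the last column), and `rows_missing_class` is a hypothetical helper name.

```python
def rows_missing_class(lines):
    """Return (chrom, start, end, te_id) for rows that have no class column.

    Assumption: each complete row has 11 whitespace-separated columns, as in
    the excerpt, so a row missing the class has exactly 10.
    """
    missing = []
    for line in lines:
        fields = line.split()
        if len(fields) == 10:
            # Columns 5-7 are chromosome, start, end; column 10 is the TE ID.
            missing.append((fields[4], int(fields[5]), int(fields[6]), fields[9]))
    return missing


# Rows copied from the excerpt above.
sample = [
    "10000 0.001 0.001 0.001 Chr4 13998493 14000579 NA + TE_00001932_INT LTR/unknown",
    "10000 0.001 0.001 0.001 Chr4 14000611 14000691 NA + TE_00002102 ",
    "10000 0.001 0.001 0.001 Chr4 14000769 14001580 NA + TE_00001932_INT LTR/unknown",
]
print(rows_missing_class(sample))
# → [('Chr4', 14000611, 14000691, 'TE_00002102')]
```

For a real file, the lines could be read with `open(path)` and passed in directly; header or malformed lines with a different column count are simply ignored by this sketch.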

.mod.EDTA.TEanno.split.bed

Chr4    13997694        13998492        TE_00000112_INT LTR/Copia       homology        0.709   4196    -       .       ID=TE_homo_82343;sequence_ontology=SO:0002264;ID=TE_homo_89176;sequence_ontology=SO:0002264
Chr4    13998493        14000579        TE_00001932_INT LTR/unknown     homology        0.9     9217    +       .       ID=TE_homo_82342;sequence_ontology=SO:0000186;ID=TE_homo_89175;sequence_ontology=SO:0000186
Chr4    14000611        14000691        TE_00002102     snRNA   homology        0.839   458     +       .       ID=TE_homo_82344;sequence_ontology=SO:0000274;ID=TE_homo_89177;sequence_ontology=SO:0000274     
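To check whether classes other than snRNA are affected, the IDs that lack a type in the .out file can be looked up in the split.bed file and their classes tallied. This is a sketch under assumptions, not EDTA code: it assumes the column order shown in the excerpt above (TE ID in column 4, class in column 5), and `classes_for_ids` is a hypothetical helper name. The sample rows are shortened versions of the excerpt rows.

```python
from collections import Counter


def classes_for_ids(bed_lines, te_ids):
    """Count the class column (column 5) of BED rows whose TE ID (column 4) is in te_ids."""
    wanted = set(te_ids)
    counts = Counter()
    for line in bed_lines:
        fields = line.split()
        if len(fields) >= 5 and fields[3] in wanted:
            counts[fields[4]] += 1
    return counts


# Shortened rows based on the excerpt above (attribute column trimmed).
bed_sample = [
    "Chr4\t13998493\t14000579\tTE_00001932_INT\tLTR/unknown\thomology\t0.9\t9217\t+\t.\tID=TE_homo_82342",
    "Chr4\t14000611\t14000691\tTE_00002102\tsnRNA\thomology\t0.839\t458\t+\t.\tID=TE_homo_82344",
]
print(classes_for_ids(bed_sample, ["TE_00002102"]))
# → Counter({'snRNA': 1})
```

If the resulting counter contains only snRNA, skipping those rows would probably leave the TE totals unchanged; any other class showing up would suggest the missing type needs a closer look before replacing "die" with "next".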

ZJin2021 • May 05 '24 15:05