
Benchmarking not as expected.

Open isabelladistefano opened this issue 11 months ago • 6 comments

Dear Shujun,

I hope you are well. In your benchmarking paper, "Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline", EDTA appears to do very well on TE prediction in the rice genome.

For our own studies, we are benchmarking several TE tools, including EDTA. We compare the output of EDTA to the published TAIR transposable elements of Arabidopsis thaliana chromosome 1.

This was the command used for EDTA, with the FASTA being the most recent TAIR genome assembly of Arabidopsis thaliana:

perl $EDTA --genome $FASTA --cds $CDS --anno 1 --threads 32 --sensitive 1

TAIR (https://www.arabidopsis.org/) publishes 7135 transposable elements on Arabidopsis thaliana chromosome 1.

We intersected the EDTA results with the TAIR results using:

bedtools intersect -u -a TAIR_TEs.gff -b EDTA.anno.gff

There are only 3462 intersections, meaning the EDTA result overlaps only 48.5% (3462 of 7135) of the TAIR transposable elements on Arabidopsis thaliana chromosome 1.
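For reference, the overlap figure can be reproduced with a minimal sketch like the one below. It is not part of EDTA or bedtools; the function name `fraction_overlapped` and the toy intervals are hypothetical, and it assumes GFF-style 1-based inclusive coordinates and counts a reference feature as a hit when any query feature overlaps it by at least one base (the same idea as `bedtools intersect -u`).

```python
def fraction_overlapped(reference, query):
    """Count reference intervals overlapped by at least one query interval.

    Intervals are (chrom, start, end) tuples with 1-based inclusive
    coordinates, as in GFF. Returns (hits, total, fraction).
    """
    # Group query intervals by chromosome so only same-chromosome pairs are compared.
    by_chrom = {}
    for chrom, start, end in query:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = 0
    for chrom, start, end in reference:
        # Two inclusive intervals overlap when each starts at or before the other's end.
        if any(q_start <= end and q_end >= start
               for q_start, q_end in by_chrom.get(chrom, [])):
            hits += 1

    total = len(reference)
    return hits, total, (hits / total if total else 0.0)


# Toy data (hypothetical coordinates, not taken from TAIR or EDTA output).
tair = [("Chr1", 100, 200), ("Chr1", 300, 400), ("Chr1", 500, 600)]
edta = [("Chr1", 150, 250), ("Chr1", 600, 700)]
hits, total, frac = fraction_overlapped(tair, edta)

# The percentage reported above: 3462 overlapped features out of 7135.
reported_percent = round(100 * 3462 / 7135, 1)
```

Like bedtools, this sketch compares chromosome names literally, so the two files need to use the same sequence identifiers for any overlap to be found.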

This is before even checking whether the classes/families are correct.

Could you please help us find an explanation for this, and/or improve the efficiency of EDTA, so that we can use it to safely annotate TEs in other non-model Brassicaceae species?



Best wishes,



Isabella

isabelladistefano, Jul 28 '23 13:07