Clarification on EDTA output: differences between `intact.fa`, `intact.gff3`, and `raw.fa`

Open gforg34 opened this issue 1 month ago • 0 comments

I’m working on detecting intact LTR retrotransposons in a specific chromosome using EDTA. The goal is to understand which output file represents the final set of intact LTRs for downstream analysis.

Here’s the command I used:

EDTA_raw.pl \
    --genome chromosome.fasta \
    --species others \
    --curatedlib curated_library.fa \
    --type ltr \
    --threads 40 \
    --overwrite 1

In the LTR output directory, I see three files:

*.LTR.raw.fa
*.LTR.intact.raw.fa
*.LTR.intact.raw.gff3

To check the results, I compared the number of candidates in the pass-list GFF3 with the number of sequences in the intact FASTA:

grep -c "repeat_region" *.pass.list.gff3        # returns thousands of candidates
grep -c "long_terminal_repeat" *.pass.list.gff3 # returns thousands of candidates
grep -c "^>" *.LTR.intact.raw.fa                # returns only 2 sequences, both near the beginning of the chromosome (some kbp away)
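
One caveat on the counts above: `grep -c` counts every line that contains the string anywhere, so a feature name that also appears in the attributes column (for example in an `ID=` or `Parent=` value) may inflate the number. A stricter check, as a minimal sketch, is to tally features by the type column only. This assumes the pass list is standard 9-column, tab-separated GFF3; the function names `count_gff3_types` and `count_fasta_records` are my own helpers, not part of EDTA.

```shell
# Tally GFF3 features by type (column 3 only), skipping comment lines
# and any line that does not have the 9 tab-separated GFF3 columns.
count_gff3_types() {
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ }
                END { for (t in n) print t "\t" n[t] }' "$1" | sort
}

# Count FASTA records, i.e. header lines starting with ">".
count_fasta_records() {
    grep -c '^>' "$1"
}
```

Usage would be `count_gff3_types <pass list GFF3>` and `count_fasta_records <intact FASTA>`, which gives one count per feature type to compare against the number of FASTA records.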

My questions:

  • Which of these files should be considered the final output for intact LTR detection?
  • How do raw.fa, intact.raw.fa, and intact.raw.gff3 differ in terms of filtering and intended use?

gforg34, Nov 14 '25 14:11