Negative coordinates in TEanno.gff3

Open nhartwic opened this issue 2 years ago • 3 comments

Basically the title. Here are the odd lines from the gff3 file...

15593   EDTA    repeat_region   -2      3598    .       ?       .       ID=repeat_region_23903;Name=TE_00012818;Classification=LTR/unknown;Sequence_ontology=SO:0000657;ltr_identity=0.9733;Method=structural;motif=TGCA;tsd=TTAAT
15593   EDTA    target_site_duplication -2      2       .       ?       .       ID=lTSD_23903;Parent=repeat_region_23903;Name=TE_00012818;Classification=LTR/unknown;Sequence_ontology=SO:0000434;ltr_identity=0.9733;Method=structural;motif=TGCA;tsd=TTAAT

...I've never seen negative coordinates like this before in any of my other EDTA runs. I'm not really sure what this is supposed to mean, but my downstream tools really don't like it.

I'm currently running EDTA version 1.9.6.

Let me know if there are any files I can send to help figure out what happened here. In the meantime, I've noticed that EDTA 2.0 was released a few months ago, so I suppose I'll update. As for this specific output, I'm just going to manually edit the gff3 to fix this entry and move on with life.

nhartwic avatar Apr 14 '22 02:04 nhartwic

Hi @nhartwic,

It looks like a bug. Can you send the contig sequence 15593 to my email [email protected]? Thanks!

Shujun

oushujun avatar Apr 16 '22 15:04 oushujun

Apologies for the delay on this. Got sidetracked.

EtweTM011.v2.15593.fasta.gz EtweTM011.v2.fasta.mod.EDTA.TElib.fa.gz

Here are the contig and the repeat library that EDTA generated for the whole assembly.

nhartwic avatar Apr 22 '22 21:04 nhartwic

Hello @nhartwic,

Sorry for the long-overdue reply. This issue originated in LTR_retriever and affects LTR candidates found at the boundary of a sequence (contig 15593 in your case). LTR_retriever needs to extract 50 bp of sequence flanking the candidate for further analysis. The element in your case starts at position 6 of contig 15593, which leaves insufficient flanking sequence for the program and thus produces erroneous results. I have added filters to remove such cases, because without sufficient flanking sequence LTR_retriever cannot determine the authenticity of the candidate. The update is reflected in this commit: https://github.com/oushujun/LTR_retriever/commit/4039eb7778fd9cbc60021e99a8693285e0fa2daf.
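
The boundary condition described above can be sketched roughly as follows. This is only an illustration of the idea, not LTR_retriever's actual code: the function name `has_sufficient_flank`, the assumption that the 50 bp flank is needed on both sides of the candidate, and the contig length used in the example are all hypothetical.

```python
def has_sufficient_flank(start, end, seq_len, flank=50):
    """Return True if a candidate has `flank` bp of sequence on both sides.

    Coordinates are 1-based and inclusive, as in GFF3. The 50 bp default
    mirrors the flank size mentioned in this thread; requiring it on both
    sides is an assumption made for this sketch.
    """
    has_left = start - flank >= 1         # the flank begins at start - flank
    has_right = end + flank <= seq_len    # the flank ends at end + flank
    return has_left and has_right


# A candidate starting at position 6 has only 5 bp to its left, so it
# fails the check. The contig length (10000) is a made-up value.
print(has_sufficient_flank(6, 3590, 10000))
```

A candidate that fails a check like this has its flank window extend past the start of the contig, which is consistent with the negative coordinates reported above.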

You may manually remove such cases or rerun LTR_retriever on EDTA/raw using the latest version on GitHub. Note that the conda version lags behind the GitHub version.
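
For the manual-removal route, a minimal Python sketch along these lines might help. It is not part of EDTA or LTR_retriever. It assumes a standard tab-separated, nine-column GFF3 file, and it drops every feature whose start coordinate is below 1 together with any feature that descends from it through the `Parent` attribute. The function names and the file names in the usage comment are placeholders.

```python
def parse_attrs(field):
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def drop_out_of_range(lines):
    """Return the GFF3 lines with out-of-range features removed.

    A feature is dropped if its start is below 1, or if it descends
    (via Parent) from a feature that was dropped. Comments, blank lines,
    and lines without nine columns pass through untouched.
    """
    lines = list(lines)
    records = []  # per line: (start, ID, [parent IDs]) or None
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(cols) != 9:
            records.append(None)
            continue
        attrs = parse_attrs(cols[8])
        parents = [p for p in attrs.get("Parent", "").split(",") if p]
        records.append((int(cols[3]), attrs.get("ID"), parents))

    # Seed the set with the IDs of features that start before position 1.
    bad = {rec[1] for rec in records if rec and rec[0] < 1 and rec[1]}

    # Propagate to children, grandchildren, and so on.
    changed = True
    while changed:
        changed = False
        for rec in records:
            if rec and rec[1] and rec[1] not in bad and any(p in bad for p in rec[2]):
                bad.add(rec[1])
                changed = True

    kept = []
    for line, rec in zip(lines, records):
        if rec and (rec[0] < 1 or rec[1] in bad or any(p in bad for p in rec[2])):
            continue
        kept.append(line)
    return kept


# Example usage (file names are placeholders):
# with open("TEanno.gff3") as src, open("TEanno.filtered.gff3", "w") as dst:
#     dst.writelines(drop_out_of_range(src))
```

Dropping the whole parent/child group, rather than only the lines with a negative start, avoids leaving orphaned sub-features whose `Parent` no longer exists in the file.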

Hope this helps! Sorry again for the delay. Please let me know if you have further questions.

Best, Shujun

oushujun avatar Jan 07 '24 06:01 oushujun