No intact LTR's passed through in combine stage

Open mason-linscott opened this issue 2 years ago • 5 comments

Hi Shujun,

I noticed on my run of EDTA 1.9.9 that none of the LTR elements in the final combined library produced by EDTA had internal (INT) and LTR regions annotated for the same family. That is to say, each entry had a unique TE number, and no INT or LTR region shared a TE number with another.

To check whether this was an error that had been fixed in the most recent version, I reran EDTA on the same output files produced in the raw stage (LTR, TIR, and Helitron) of my previous run, this time using EDTA 2.0.0 (downloaded 1/22/22). The combine step just finished and I got nearly the same result (a slightly larger number of TEs, but still no LTRs annotated with both INT and LTR regions).

The output from the LTR stage indicates there are quite a few intact LTRs in my genome. Masking with EDTA and RepeatMasker2 confirms that LTRs dominate the genome (56% of the genome but this can include incomplete LTRs).

Is this expected behavior? I expected INT and LTR regions from the same element would have the same TE element ID.

Best, Mason

mason-linscott avatar Jan 22 '22 23:01 mason-linscott

Hi Mason,

In the final gff3 file you can search for "structural" and "LTR_retrotransposon" for structurally intact LTR elements. If the raw stage identifies intact LTR elements, they should be included in the final annotation.
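The search described above could be sketched in Python roughly as follows. This is only an illustration: the function name `structural_ltr_lines` and the example records (including the `ID=` and `Method=` attributes) are made up for the sketch, and the attribute layout of a real EDTA gff3 may differ.

```python
def structural_ltr_lines(gff3_lines):
    """Yield GFF3 feature lines that contain both search terms."""
    for line in gff3_lines:
        if line.startswith("#"):
            continue  # skip header and comment lines
        if "structural" in line and "LTR_retrotransposon" in line:
            yield line.rstrip("\n")


# Hypothetical records for illustration only; not taken from a real run.
example = [
    "##gff-version 3\n",
    "Chr1\tEDTA\tLTR_retrotransposon\t100\t5000\t.\t+\t.\tID=te1;Method=structural\n",
    "Chr1\tEDTA\tGypsy_LTR_retrotransposon\t6000\t6500\t.\t+\t.\tID=te2;Method=homology\n",
]
hits = list(structural_ltr_lines(example))
```

To run it on a real annotation, pass an open file handle instead of the example list.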

Shujun

oushujun avatar Jan 24 '22 03:01 oushujun

Hi Shujun,

Thanks for the clarification. I was hoping intact LTR elements would be split into their constituent parts (INT and LTR) so I could investigate solo LTR formation by running REannotate or another utility on RepeatMasker output generated with the EDTA library.

M.

mason-linscott avatar Jan 24 '22 18:01 mason-linscott

It is split. The library has the LTR and INT regions separated. Could you share a sample of your result so the problem is clearer?

Shujun

oushujun avatar Jan 25 '22 03:01 oushujun

I apologize if I was not being clear. In the final library there are no elements with '_LTR' or '_INT' that share a common TE name. I thought this happened in the combine stage as that was where I noticed the name changed from 'scaffoldX:pos1...pos2' to 'TE_X_LTR#LTR'.
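A check like this can be run on the library headers with a short script. The sketch below assumes headers of the form `TE_X_LTR#...` / `TE_X_INT#...` as described above; the function name and the example headers are hypothetical and are not taken from the attached library.

```python
import re
from collections import defaultdict

# Family name, then "_LTR" or "_INT", then the "#" that starts the classification.
NAME = re.compile(r"^>?(?P<family>.+)_(?P<part>LTR|INT)#")


def families_with_both_parts(headers):
    """Return family names that have both an _LTR and an _INT entry."""
    parts = defaultdict(set)
    for header in headers:
        m = NAME.match(header)
        if m:
            parts[m["family"]].add(m["part"])
    return sorted(f for f, p in parts.items() if p == {"LTR", "INT"})


# Made-up headers for illustration only.
example = [
    ">TE_00000001_LTR#LTR/Gypsy",
    ">TE_00000002_INT#LTR/Gypsy",
    ">TE_00000003_LTR#LTR/Copia",
    ">TE_00000003_INT#LTR/Copia",
]
shared = families_with_both_parts(example)
```

An empty result on a real library would correspond to the situation described here: no `_LTR` and `_INT` entries sharing a TE name.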

Library attached to inspect. idahoensis_ncbi_filtered.fa.mod.EDTA.TElib.fa.txt

mason-linscott avatar Jan 25 '22 03:01 mason-linscott

Hello @mason-linscott,

Sorry for the long delay. I finally got the chance to inspect your sequences and the codebase. Yes, this is the expected behavior. The code in EDTA should be able to pick up LTR sequences from different parts of the structure. However, LTR_retriever names each sequence by the coordinates of the actual sequence, so the LTR and INT regions of the same element end up with different names. For example, before changing coordinate names to serial names, LTR sequences are named:

>Chr1:10077076..10077291_LTR#LTR/Gypsy
>Chr1:10077292..10081907_INT#LTR/Gypsy

You can tell the two coordinate ranges are contiguous (the INT region starts one base after the LTR region ends), so they are from the same element.

I am adding this as an enhancement and will see how future versions can address it.
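As a rough illustration of that adjacency, the coordinate-style names can be paired up with a few lines of Python. This is a sketch under assumptions, not how EDTA or LTR_retriever handles it: the function names are hypothetical, the header pattern is inferred only from the two names above, and it only pairs an LTR with an INT that begins immediately after it on the same sequence.

```python
import re

# Pattern inferred from names like "Chr1:10077076..10077291_LTR#LTR/Gypsy".
HEADER = re.compile(
    r"^>?(?P<seq>[^:]+):(?P<start>\d+)\.\.(?P<end>\d+)_(?P<part>LTR|INT)#(?P<cls>\S+)$"
)


def parse(header):
    """Return (seq, start, end, part, cls), or None if the header does not match."""
    m = HEADER.match(header.strip())
    if m is None:
        return None
    return m["seq"], int(m["start"]), int(m["end"]), m["part"], m["cls"]


def pair_adjacent(headers):
    """Pair each INT header with the LTR header that ends one base before it starts."""
    parsed = [(h, parse(h)) for h in headers]
    # Index LTR entries by (sequence, position right after the LTR ends).
    ltr_by_next = {
        (p[0], p[2] + 1): h for h, p in parsed if p and p[3] == "LTR"
    }
    pairs = []
    for h, p in parsed:
        if p and p[3] == "INT":
            ltr = ltr_by_next.get((p[0], p[1]))
            if ltr is not None:
                pairs.append((ltr, h))
    return pairs


headers = [
    ">Chr1:10077076..10077291_LTR#LTR/Gypsy",
    ">Chr1:10077292..10081907_INT#LTR/Gypsy",
]
pairs = pair_adjacent(headers)
```

Here the LTR ends at 10077291 and the INT starts at 10077292, so the two names are paired as one element.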

Thank you, Shujun

oushujun avatar Feb 28 '22 04:02 oushujun