EDTA with TIR/Sola2 TE_Sorter chokes?

Open oliviamr opened this issue 3 years ago • 8 comments

Hi Shujun,

Thank you for EDTA; it is a nice tool.

I have run EDTA on a large genome split into chunks. One of the chunks ran EDTA_raw.pl without an issue. Now that I am doing the last step, the homology-based annotation of TEs, I ran into this:

Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
Use of uninitialized value in pattern match (m//) at /EDTA_p/EDTA/util/call_seq_by_list.pl line 88.
Use of uninitialized value $chr_pre in hash element at /EDTA_p/EDTA/util/call_seq_by_list.pl line 90.
Use of uninitialized value $pos in pattern match (m//) at /EDTA_p/EDTA/util/call_seq_by_list.pl line 100.
Use of uninitialized value $pos in concatenation (.) or string at /EDTA_p/EDTA/util/call_seq_by_list.pl line 103.
ERROR: Can not recognize this MSU position in the list!
ERROR: TE annotation stats results not found in genome.2.fasta.mod.EDTA.TE.fa.stat!
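
When a log repeats the same warning many times, it can help to tally which classifications are being rejected before deciding what to change. The following is a minimal sketch in Python, not part of EDTA: the function name `count_unrecognized` is hypothetical, and the regular expression simply mirrors the wording of the warning shown above.

```python
import re
from collections import Counter

# Pattern mirrors the warning text in the log above; the captured group
# is the classification that was not found (for example TIR/Sola2).
WARNING_RE = re.compile(r"Warning: (\S+) not found in the TE_SO database")


def count_unrecognized(log_text):
    """Count how often each classification triggers the TE_SO warning."""
    return Counter(WARNING_RE.findall(log_text))


# Small inline sample built from the warning format shown above.
sample_log = (
    "Warning: TIR/Sola2 not found in the TE_SO database, will use the "
    "general term 'repeat_region SO:0000657' to replace it.\n"
) * 3

for name, hits in count_unrecognized(sample_log).most_common():
    print(f"{name}\t{hits}")
```

To use it on a real run, read the captured log file into a string and pass that string to `count_unrecognized`.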

Any suggestions on how to overcome this?

Apr 06 '21 16:04 oliviamr

Should I do the same as in issue #151 (https://github.com/oushujun/EDTA/issues/151)?

Apr 06 '21 17:04 oliviamr

Yes, it's the same situation.

Apr 07 '21 09:04 oushujun

OK, more on this warning, which leaves me slightly confused. When I ran the annotation (--anno 1 --step anno), apart from the warning above I also get:

Use of uninitialized value $type in concatenation (.) or string at /TREES_2020/EDTA_p/EDTA/util/gff2bed.pl line 84, <GFF> line

This stops me from obtaining a *.TEanno.sum file. Same as #171 (https://github.com/oushujun/EDTA/issues/171).

Any suggestions on what to do?
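
One way to narrow this down is to check whether the GFF3 file contains records with a missing or empty field. The sketch below is a generic sanity check in Python, not EDTA code. It assumes, without having confirmed it against gff2bed.pl, that a record with too few columns or an empty type column could leave a variable such as `$type` undefined. The function name `find_malformed_gff3` is hypothetical.

```python
def find_malformed_gff3(lines):
    """Return (line_number, reason) for records that break the 9-column GFF3 layout."""
    problems = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        # Skip blank lines, comments, and directives such as ##gff-version.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            problems.append((line_no, f"expected 9 columns, found {len(fields)}"))
        elif fields[2] in ("", "."):
            # Column 3 holds the feature type in GFF3.
            problems.append((line_no, "empty type column"))
    return problems


# Inline sample records; real input would be the lines of a *.TEanno.gff3 file.
sample = [
    "##gff-version 3\n",
    "chr1\tEDTA\trepeat_region\t1\t100\t.\t+\t.\tID=te1\n",
    "chr1\tEDTA\t1\t100\n",
    "chr1\tEDTA\t\t1\t100\t.\t+\t.\tID=te2\n",
]
for line_no, reason in find_malformed_gff3(sample):
    print(f"line {line_no}: {reason}")
```

If the check reports nothing, the cause is more likely in how the classification is looked up than in the file layout.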

Jun 17 '21 20:06 oliviamr

Can you rerun from the beginning and see if you still get the error?

Jun 18 '21 07:06 oushujun

@oliviamr can you provide reproducible sample data for me to test with? Thanks! - Shujun

Jun 19 '21 04:06 oushujun

Hi Shujun,

Thank you for following up. Let me give you some context:

Since I am working on a >10 Gb plant genome, I followed the recipe you described here: https://github.com/oushujun/EDTA/issues/61#issuecomment-721003071

The final step runs and produces the folder *.fasta.mod.EDTA.anno and the files *.fasta.mod.MAKER.masked and *.fasta.mod.EDTA.TEanno.gff3, but the file *.mod.EDTA.TEanno.sum is empty. Running with --anno 1 --step anno throws the error at /TREES_2020/EDTA_p/EDTA/util/gff2bed.pl line 84.

What is the best way to send you sample data, and what is a suitable sample size?

Jun 19 '21 15:06 oliviamr

Another question: is it important that the CDS are high quality (no frameshifts) for the annotation, or for any other step of the pipeline?

Jun 19 '21 16:06 oliviamr

@oliviamr Yes. Otherwise the CDS set may include too many TEs, and too many TEs would then be removed based on these CDS sequences.

Jun 22 '21 10:06 oushujun
