EDTA
with TIR/Sola2 TE_Sorter chokes?
Hi Shujun,
Thank you for EDTA; it is a nice tool.
I have run EDTA on a large genome split into chunks. One of the chunks ran EDTA_raw.pl without an issue. Now that I am doing the homology-based annotation of TEs, I ran into this:
Warning: TIR/Sola2 not found in the TE_SO database, will use the general term 'repeat_region SO:0000657' to replace it.
(the warning above is repeated many times)
Use of uninitialized value in pattern match (m//) at /EDTA_p/EDTA/util/call_seq_by_list.pl line 88.
Use of uninitialized value $chr_pre in hash element at /EDTA_p/EDTA/util/call_seq_by_list.pl line 90.
Use of uninitialized value $pos in pattern match (m//) at /EDTA_p/EDTA/util/call_seq_by_list.pl line 100.
Use of uninitialized value $pos in concatenation (.) or string at /EDTA_p/EDTA/util/call_seq_by_list.pl line 103.
ERROR: Can not recognize this MSU position in the list!
ERROR: TE annotation stats results not found in genome.2.fasta.mod.EDTA.TE.fa.stat!
Any suggestions on how to overcome this?
Should I do the same as in issue #151?
Yes, it's the same situation.
On Wed, Apr 7, 2021 at 1:03 AM Olivia Mendivil Ramos < @.***> wrote:
should I do the same as per issue #151 https://github.com/oushujun/EDTA/issues/151 ?
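Before applying a workaround, it may help to see which classification labels in the TE library are the ones triggering the "not found in the TE_SO database" warning. The following is only a minimal sketch, not part of EDTA: it assumes the library uses RepeatMasker-style headers (`>name#Class/Superfamily`), and the `known_labels` set is a hypothetical stand-in for whatever labels your EDTA version recognizes.

```python
from collections import Counter


def classification_counts(fasta_lines):
    """Count classification labels found in FASTA headers.

    Assumes headers look like ">TE_00000001#TIR/Sola2" (an assumption
    about the library format, not something EDTA guarantees).
    """
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        words = line[1:].split()
        header = words[0] if words else ""
        if "#" in header:
            label = header.split("#", 1)[1]
        else:
            label = "(no classification)"
        counts[label] += 1
    return counts


def unrecognized(counts, known_labels):
    """Return the labels (with counts) that are absent from known_labels."""
    return {label: n for label, n in counts.items() if label not in known_labels}


# Toy input; in practice pass an open file handle for the library FASTA.
example = [
    ">TE_1#TIR/Sola2", "ACGT",
    ">TE_2#LTR/Copia", "ACGT",
    ">TE_3#TIR/Sola2 extra description", "ACGT",
]
known_labels = {"LTR/Copia", "LTR/Gypsy"}  # hypothetical list of accepted labels

counts = classification_counts(example)
print(unrecognized(counts, known_labels))
```

Knowing how many entries carry the unrecognized label makes it easier to decide whether the fix discussed in issue #151 is worth applying to the whole library.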
OK, more on this warning, which leaves me slightly confused. When I ran the annotation (--anno 1 --step anno), apart from the warning above I get:
Use of uninitialized value $type in concatenation (.) or string at /TREES_2020/EDTA_p/EDTA/util/gff2bed.pl line 84, <GFF> line
This stops me from obtaining a *.TEanno.sum file. Same as #171.
Any suggestions on what to do?
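One way to narrow down an "uninitialized value" message raised while a GFF file is being read is to look for records that are structurally incomplete. The sketch below is an assumption-laden diagnostic, not a description of what gff2bed.pl does at line 84: it simply reports lines in a GFF3 file that have fewer than nine tab-separated columns or an empty type column (column 3).

```python
def find_malformed_gff_lines(lines):
    """Return (line_number, reason) pairs for suspicious GFF3 records.

    Only checks the column count and the type column; it does not
    validate coordinates or attributes.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        fields = line.split("\t")
        if len(fields) < 9:
            problems.append((number, f"expected 9 columns, found {len(fields)}"))
        elif not fields[2].strip():
            problems.append((number, "empty type column"))
    return problems


# Toy input; in practice iterate over the *.TEanno.gff3 file instead.
example_gff = [
    "##gff-version 3",
    "chr1\tEDTA\trepeat_region\t1\t100\t.\t+\t.\tID=TE_1",
    "chr1\tEDTA\t\t200\t300\t.\t+\t.\tID=TE_2",
    "chr1\tEDTA\trepeat_region\t400\t500",
]
for number, reason in find_malformed_gff_lines(example_gff):
    print(f"line {number}: {reason}")
```

If the check reports nothing, the records are at least structurally complete, and the cause is more likely a classification label the script cannot map.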
Can you rerun from the beginning and see if it still has the error?
On Fri, Jun 18, 2021 at 4:45 AM Olivia Mendivil Ramos < @.***> wrote:
Ok, more into this warning that leaves me slightly confused as when I ran the annotation (--anno 1 --step anno )
Apart from the warning above I get: Use of uninitialized value $type in concatenation (.) or string at /TREES_2020/EDTA_p/EDTA/util/gff2bed.pl line 84, line Which stops from obtaining a *.TEanno.sum file Same as #171 https://github.com/oushujun/EDTA/issues/171
Any suggestion to what to do?
@oliviamr can you provide reproducible sample data for me to test with? Thanks! - Shujun
Hi Shujun,
Thank you for following up. Let me put you in context:
Since I am working on a >10 Gb plant genome, I followed your recipe described here: https://github.com/oushujun/EDTA/issues/61#issuecomment-721003071
The final step (--anno 1 --step anno) runs and produces the folder *.fasta.mod.EDTA.anno and the files *.fasta.mod.MAKER.masked and *.fasta.mod.EDTA.TEanno.gff3, but the file *.mod.EDTA.TEanno.sum is empty, and the run throws the error at /TREES_2020/EDTA_p/EDTA/util/gff2bed.pl line 84.
What is the best way to send you sample data, and what is a suitable sample size?
Also a question: is it important that the CDS are high quality (no frameshifts included) for the annotation or elsewhere in the pipeline?
@oliviamr yes, otherwise you may include too many TEs in the CDS and thus remove too many TEs with these CDS sequences.
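For the frameshift concern raised in the question, a quick sanity check on the CDS file can be sketched as follows. This is only an illustration under stated assumptions: it uses the standard genetic code, assumes each CDS starts in frame at its first base, and flags sequences whose length is not a multiple of three or that contain an internal stop codon. It does not detect TE-derived sequences inside the CDS set, which is the problem described in the reply above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def read_fasta(lines):
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def flag_suspect_cds(records):
    """Return {name: [reasons]} for CDS that look frameshifted."""
    flagged = {}
    for name, seq in records:
        seq = seq.upper()
        reasons = []
        if len(seq) % 3 != 0:
            reasons.append("length not a multiple of 3")
        usable = len(seq) - len(seq) % 3
        codons = [seq[i:i + 3] for i in range(0, usable, 3)]
        # The last codon may legitimately be a stop, so it is excluded.
        if any(codon in STOP_CODONS for codon in codons[:-1]):
            reasons.append("internal stop codon")
        if reasons:
            flagged[name] = reasons
    return flagged


# Toy input; in practice pass an open file handle for the CDS FASTA.
example_cds = [
    ">ok", "ATGAAATAA",
    ">stop_inside", "ATGTAAAAATAA",
    ">bad_length", "ATGAA",
]
print(flag_suspect_cds(read_fasta(example_cds)))
```

Sequences flagged this way could be reviewed or dropped before the CDS file is used in the pipeline.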