EDTA
LINE and SINE result files have 0 bp!
Dr. Shujun,
Hi! I installed EDTA v2.2.1 by running the commands "git clone https://github.com/oushujun/EDTA.git" and "mamba env create -f EDTA_2.2.x.yml".
I then tested it with the following command: "perl... /EDTA.pl --genome genome.fa --cds genome.cds.fa --curatedlib rice7.0.0.liban --exclude genome.exclude.bed --overwrite 1 --sensitive 1 --anno 1 --evaluate 1 --threads 10". However, the output log contained the following messages: "Warning: LOC list genome.fa.mod.ltrTE.veryfalse is empty.", "Warning: The SINE result file has 0 bp!", "Warning: The LINE result file has 0 bp!", "Error encountered: [Errno 2] No such file or directory: 'bedtools' mv: cannot stat 'chromosome_density_plots.pdf': No such file or directory", and "cp: cannot stat 'genome.fa.mod.EDTA.TEanno.density_plots.pdf': No such file or directory".
I don't know whether a dependency failed to install or whether the data itself simply contains no new LINEs/SINEs. My log file is below. Could you tell me whether the installation was successful?
#########################################################
Extensive de-novo TE Annotator (EDTA) v2.2.1
Shujun Ou ([email protected])
#########################################################
Parameters: --genome genome.fa --cds genome.cds.fa --curatedlib rice7.0.0.liban --exclude genome.exclude.bed --overwrite 1 --sensitive 1 --anno 1 --evaluate 1 --threads 10
Wed Apr 17 22:57:04 CST 2024 Dependency checking: All passed!
A custom library rice7.0.0.liban is provided via --curatedlib. Please make sure this is a manually curated library but not machine generated.
A CDS file genome.cds.fa is provided via --cds. Please make sure this is the DNA sequence of coding regions only.
A BED file is provided via --exclude. Regions specified by this file will be excluded from TE annotation and masking.
Wed Apr 17 22:57:08 CST 2024 Obtain raw TE libraries using various structure-based programs:
Wed Apr 17 22:57:08 CST 2024 EDTA_raw: Check dependencies, prepare working directories.
Wed Apr 17 22:57:09 CST 2024 Start to find LTR candidates.
Wed Apr 17 22:57:09 CST 2024 Identify LTR retrotransposon candidates from scratch.
Warning: LOC list genome.fa.mod.ltrTE.veryfalse is empty.
Wed Apr 17 22:57:33 CST 2024 Finish finding LTR candidates.
Wed Apr 17 22:57:33 CST 2024 Start to find SINE candidates.
Wed Apr 17 22:58:14 CST 2024 Warning: The SINE result file has 0 bp!
Wed Apr 17 22:58:14 CST 2024 Start to find LINE candidates.
Wed Apr 17 22:58:14 CST 2024 Identify LINE retrotransposon candidates from scratch.
Wed Apr 17 22:59:56 CST 2024 Warning: The LINE result file has 0 bp!
Wed Apr 17 22:59:56 CST 2024 Start to find TIR candidates.
Wed Apr 17 22:59:56 CST 2024 Identify TIR candidates from scratch.
Species: others
Wed Apr 17 23:00:47 CST 2024 Finish finding TIR candidates.
Wed Apr 17 23:00:47 CST 2024 Start to find Helitron candidates.
Wed Apr 17 23:00:47 CST 2024 Identify Helitron candidates from scratch.
Wed Apr 17 23:01:22 CST 2024 Finish finding Helitron candidates.
Wed Apr 17 23:01:22 CST 2024 Execution of EDTA_raw.pl is finished!
Wed Apr 17 23:01:22 CST 2024 Obtain raw TE libraries finished. All intact TEs found by EDTA:
    genome.fa.mod.EDTA.intact.raw.fa
    genome.fa.mod.EDTA.intact.raw.gff3
Wed Apr 17 23:01:22 CST 2024 Perform EDTA advance filtering for raw TE candidates and generate the stage 1 library:
Warning: No sequences were masked
Wed Apr 17 23:01:40 CST 2024 EDTA advance filtering finished.
Wed Apr 17 23:01:40 CST 2024 Perform EDTA final steps to generate a non-redundant comprehensive TE library.
Filter RepeatModeler results that are ignored in the raw step.
Wed Apr 17 23:01:45 CST 2024 Clean up TE-related sequences in the CDS file with TEsorter.
Remove CDS-related sequences in the EDTA library.
Remove CDS-related sequences in intact TEs.
Wed Apr 17 23:01:52 CST 2024 Combine the high-quality TE library rice7.0.0.liban with the EDTA library:
Wed Apr 17 23:01:59 CST 2024 EDTA final stage finished! You may check out:
    The final EDTA TE library: genome.fa.mod.EDTA.TElib.fa
    Family names of intact TEs have been updated by rice7.0.0.liban: genome.fa.mod.EDTA.intact.gff3
    Comparing to the provided library, EDTA found these novel TEs: genome.fa.mod.EDTA.TElib.novel.fa
    The provided library has been incorporated into the final library: genome.fa.mod.EDTA.TElib.fa
Wed Apr 17 23:01:59 CST 2024 Perform post-EDTA analysis for whole-genome annotation:
Wed Apr 17 23:01:59 CST 2024 Homology-based annotation of TEs using genome.fa.mod.EDTA.TElib.fa from scratch.
Error encountered: [Errno 2] No such file or directory: 'bedtools'
mv: cannot stat 'chromosome_density_plots.pdf': No such file or directory
Wed Apr 17 23:02:10 CST 2024 TE annotation using the EDTA library has finished! Check out:
    Whole-genome TE annotation (total TE: 34.61%): genome.fa.mod.EDTA.TEanno.gff3
    Whole-genome TE annotation summary: genome.fa.mod.EDTA.TEanno.sum
    Whole-genome TE divergence plot: genome.fa.mod_divergence_plot.pdf
    Whole-genome TE density plot: genome.fa.mod.EDTA.TEanno.density_plots.pdf
    Low-threshold TE masking for MAKER gene annotation (masked: 17.27%): genome.fa.mod.MAKER.masked
cp: cannot stat 'genome.fa.mod.EDTA.TEanno.density_plots.pdf': No such file or directory
Wed Apr 17 23:02:10 CST 2024 Evaluate the level of inconsistency for whole-genome TE annotation:
Wed Apr 17 23:02:12 CST 2024 Evaluation of TE annotation finished! Check out these files:
Overall: genome.fa.mod.EDTA.TE.fa.stat.all.sum
Nested: genome.fa.mod.EDTA.TE.fa.stat.nested.sum
Non-nested: genome.fa.mod.EDTA.TE.fa.stat.redun.sum
If you want to learn more about the formatting and information of these files, please visit:
https://github.com/oushujun/EDTA/wiki/Making-sense-of-EDTA-usage-and-outputs---Q&A
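For a log like the one above, it can help to separate the warnings from the actual errors before deciding what needs fixing. The following is only a minimal sketch: the `triage` helper and its patterns are hypothetical, chosen to match the messages seen in this particular log, and are not an official list of EDTA diagnostics.

```python
import re

# Hypothetical patterns, based only on the messages in the log above.
# Errors are listed first so they take priority over warnings.
PATTERNS = {
    "error": re.compile(r"Error encountered:|cannot stat"),
    "warning": re.compile(r"\bWarning:"),
}


def triage(log_text):
    """Group log lines into 'error' and 'warning' buckets."""
    found = {kind: [] for kind in PATTERNS}
    for line in log_text.splitlines():
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                found[kind].append(line.strip())
                break  # one bucket per line
    return found


# A few lines copied from the log above.
sample = """\
Warning: The SINE result file has 0 bp!
Warning: The LINE result file has 0 bp!
Error encountered: [Errno 2] No such file or directory: 'bedtools'
mv: cannot stat 'chromosome_density_plots.pdf': No such file or directory
Finish finding TIR candidates.
"""

report = triage(sample)
for kind, lines in report.items():
    print(f"{kind}: {len(lines)}")
    for line in lines:
        print("   ", line)
```

In this log the warnings concern empty SINE/LINE results, while the errors all trace back to the missing 'bedtools' executable and the density plot that was never produced as a result.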
The file "genome.fa.mod.EDTA.TEanno.sum" is shown below. Did the run complete successfully?
$ cat genome.fa.mod.EDTA.TEanno.sum
Repeat Classes
Total Sequences: 1
Total Length: 1000000 bp
Class                  Count    bpMasked    %masked
=====                  =====    ========    =======
LINE                   --       --          --
    unknown            39       13979       1.40%
LTR                    --       --          --
    Copia              11       18647       1.86%
    Gypsy              48       108654      10.87%
    TRIM               1        129         0.01%
    unknown            1        248         0.02%
SINE                   --       --          --
    unknown            11       1775        0.18%
TIR                    --       --          --
    CACTA              23       22722       2.27%
    Mutator            115      47072       4.71%
    PIF_Harbinger      110      28045       2.80%
    PILE               4        1033        0.10%
    POLE               2        506         0.05%
    Tc1_Mariner        124      48718       4.87%
    hAT                35       13953       1.40%
    unknown            9        1433        0.14%
nonTIR                 --       --          --
    helitron           56       39164       3.92%
---------------------------------
total interspersed     589      346078      34.61%
Total                  589      346078      34.61%
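One quick sanity check on a summary like this is to confirm that the per-family rows add up to the reported total. The sketch below does that for the table above. The `check_summary` helper and its row pattern are hypothetical and assume the whitespace-separated layout shown here, not a documented EDTA file format.

```python
import re

# Data rows look like "<name> <count> <bpMasked> <pct>%" (assumed layout).
ROW = re.compile(r"^\s*(\S+)\s+(\d+)\s+(\d+)\s+([\d.]+)%\s*$")

SUMMARY = """\
Total Sequences: 1
Total Length: 1000000 bp
Class Count bpMasked %masked
LINE -- -- --
unknown 39 13979 1.40%
LTR -- -- --
Copia 11 18647 1.86%
Gypsy 48 108654 10.87%
TRIM 1 129 0.01%
unknown 1 248 0.02%
SINE -- -- --
unknown 11 1775 0.18%
TIR -- -- --
CACTA 23 22722 2.27%
Mutator 115 47072 4.71%
PIF_Harbinger 110 28045 2.80%
PILE 4 1033 0.10%
POLE 2 506 0.05%
Tc1_Mariner 124 48718 4.87%
hAT 35 13953 1.40%
unknown 9 1433 0.14%
nonTIR -- -- --
helitron 56 39164 3.92%
total interspersed 589 346078 34.61%
Total 589 346078 34.61%
"""


def check_summary(text):
    """Return ((count, bp, pct) summed from rows, (count, bp, pct) reported)."""
    genome_len = int(re.search(r"Total Length:\s*(\d+)\s*bp", text).group(1))
    count = bp = 0
    reported = None
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # headers, class labels, rules, "total interspersed"
        name, n, masked = m.group(1), int(m.group(2)), int(m.group(3))
        if name == "Total":
            reported = (n, masked, float(m.group(4)))
        else:
            count += n
            bp += masked
    return (count, bp, round(100 * bp / genome_len, 2)), reported


computed, reported = check_summary(SUMMARY)
print("computed:", computed)
print("reported:", reported)
print("consistent:", computed == reported)
```

Here the 15 family rows sum to 589 copies and 346078 bp, which is 34.61% of the 1000000 bp sequence, so the summary is internally consistent even though the density plot step failed.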
I see the same problem. How can I solve it?
mv: cannot stat 'chromosome_density_plots.pdf': No such file or directory
Wed Jul 24 10:53:51 CST 2024 TE annotation using the EDTA library has finished! Check out:
    Whole-genome TE annotation (total TE: 34.21%): genome.fa.mod.EDTA.TEanno.gff3
    Whole-genome TE annotation summary: genome.fa.mod.EDTA.TEanno.sum
    Whole-genome TE divergence plot: genome.fa.mod_divergence_plot.pdf
    Whole-genome TE density plot: genome.fa.mod.EDTA.TEanno.density_plots.pdf
    Low-threshold TE masking for MAKER gene annotation (masked: 16.15%): genome.fa.mod.MAKER.masked
Use of uninitialized value $mod_time in localtime at ../EDTA.pl line 847.
cp: cannot stat 'genome.fa.mod.EDTA.TEanno.density_plots.pdf': No such file or directory
Hi, it is OK for the test data output to contain no SINE/LINE candidates. As for the missing 'bedtools' dependency, we have updated the yml file, so you may try reinstalling EDTA.
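Before and after reinstalling, one way to confirm whether 'bedtools' is visible is to look it up on the PATH from inside the activated EDTA environment. This is a minimal sketch using Python's standard `shutil.which`. The `missing_tools` helper is hypothetical, and the list contains only the one tool named in the error message, not EDTA's full dependency list.

```python
import shutil


def missing_tools(names):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Only 'bedtools' is named in the error above; extend the list as needed.
required = ["bedtools"]
missing = missing_tools(required)

if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All found:", ", ".join(required))
```

If 'bedtools' is reported as not found inside the environment, the density plot step would be expected to fail in the same way as in the logs above.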