CrossmatchSearchEngine::parseOutput issue
Hi Shujun,
Thanks a lot for developing this great tool. I installed EDTA using: mamba env create -f EDTA_2.2.x.yml
EDTA works well on the test data, but on my genome (a plant, genome size ~600 Mb) I encountered warnings/errors such as "CrossmatchSearchEngine::parseOutput: Unable to open results file: " and "SINE/NA not found in the TE_SO database". Please see the detailed information below. All the output files were produced. Do these warnings affect the results, and could you please help me figure this out? Thanks a lot in advance.
Best regards, Chengcheng
my command:
#!/bin/bash
genome=bro.LA105.7gaps.chr.newID.fa
cds=T24.chr.cds.fasta
threads=48
/data3/caicc/Softwares/50/miniconda3/envs/EDTA2/bin/perl /data3/caicc/Softwares/50/EDTA/EDTA-master/EDTA.pl --genome $genome --cds $cds --anno 1 --threads $threads --overwrite 1 --sensitive 1
The log file:
#########################################################
##### Extensive de-novo TE Annotator (EDTA) v2.2.1 #####
##### Shujun Ou ([email protected]) #####
#########################################################
Parameters: --genome bro.LA105.7gaps.chr.newID.fa --cds T24.chr.cds.fasta --anno 1 --threads 48 --overwrite 1 --sensitive 1 --debug 1
Tue Jun 25 21:49:51 CST 2024 Dependency checking:
All passed!
A CDS file T24.chr.cds.fasta is provided via --cds. Please make sure this is the DNA sequence of coding regions only.
Tue Jun 25 21:49:59 CST 2024 Obtain raw TE libraries using various structure-based programs:
Tue Jun 25 21:49:59 CST 2024 EDTA_raw: Check dependencies, prepare working directories.
Tue Jun 25 21:50:02 CST 2024 Start to find LTR candidates.
Tue Jun 25 21:50:02 CST 2024 Identify LTR retrotransposon candidates from scratch.
Tue Jun 25 23:10:51 CST 2024 Finish finding LTR candidates.
Tue Jun 25 23:10:51 CST 2024 Start to find SINE candidates.
Wed Jun 26 00:53:30 CST 2024 Finish finding SINE candidates.
Wed Jun 26 00:53:30 CST 2024 Start to find LINE candidates.
Wed Jun 26 00:53:30 CST 2024 Identify LINE retrotransposon candidates from scratch.
Wed Jun 26 22:22:12 CST 2024 Finish finding LINE candidates.
Wed Jun 26 22:22:12 CST 2024 Start to find TIR candidates.
Wed Jun 26 22:22:12 CST 2024 Identify TIR candidates from scratch.
Species: others
Thu Jun 27 00:52:18 CST 2024 Finish finding TIR candidates.
Thu Jun 27 00:52:18 CST 2024 Start to find Helitron candidates.
Thu Jun 27 00:52:18 CST 2024 Identify Helitron candidates from scratch.
Thu Jun 27 04:49:40 CST 2024 Finish finding Helitron candidates.
Thu Jun 27 04:49:40 CST 2024 Execution of EDTA_raw.pl is finished!
Thu Jun 27 04:49:40 CST 2024 Obtain raw TE libraries finished.
All intact TEs found by EDTA:
bro.LA105.7gaps.chr.newID.fa.mod.EDTA.intact.raw.fa
bro.LA105.7gaps.chr.newID.fa.mod.EDTA.intact.raw.gff3
Thu Jun 27 04:49:40 CST 2024 Perform EDTA advance filtering for raw TE candidates and generate the stage 1 library:
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_3972217.ThuJun270451122024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.raw.fa.HQ_batch-131.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4026357.ThuJun270453562024/bro.LA105.7gaps.chr.newID.fa.mod.TIR.intact.raw.fa_batch-13.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-2.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-8.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-11.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-39.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-93.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-114.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-182.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-223.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-239.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-334.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
Thu Jun 27 05:12:08 CST 2024 EDTA advance filtering finished.
Thu Jun 27 05:12:08 CST 2024 Perform EDTA final steps to generate a non-redundant comprehensive TE library.
Filter RepeatModeler results that are ignored in the raw step.
Thu Jun 27 05:12:48 CST 2024 Clean up TE-related sequences in the CDS file with TEsorter.
Remove CDS-related sequences in the EDTA library.
Remove CDS-related sequences in intact TEs.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
tRNA/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
Thu Jun 27 05:31:35 CST 2024 EDTA final stage finished! You may check out:
The final EDTA TE library: bro.LA105.7gaps.chr.newID.fa.mod.EDTA.TElib.fa
Thu Jun 27 05:31:35 CST 2024 Perform post-EDTA analysis for whole-genome annotation:
Thu Jun 27 05:31:35 CST 2024 Homology-based annotation of TEs using bro.LA105.7gaps.chr.newID.fa.mod.EDTA.TElib.fa from scratch.
Thu Jun 27 06:29:33 CST 2024 TE annotation using the EDTA library has finished! Check out:
Whole-genome TE annotation (total TE: 57.21%): bro.LA105.7gaps.chr.newID.fa.mod.EDTA.TEanno.gff3
Whole-genome TE annotation summary: bro.LA105.7gaps.chr.newID.fa.mod.EDTA.TEanno.sum
Whole-genome TE divergence plot: bro.LA105.7gaps.chr.newID.fa.mod_divergence_plot.pdf
Whole-genome TE density plot: bro.LA105.7gaps.chr.newID.fa.mod.EDTA.TEanno.density_plots.pdf
Low-threshold TE masking for MAKER gene annotation (masked: 27.87%): bro.LA105.7gaps.chr.newID.fa.mod.MAKER.masked
Thu Jun 27 06:29:34 CST 2024 Evaluate the level of inconsistency for whole-genome TE annotation:
Thu Jun 27 06:34:02 CST 2024 Evaluation of TE annotation finished! Check out these files:
Overall: bro.LA105.7gaps.chr.newID.fa.mod.EDTA.TE.fa.stat.all.sum
Nested: bro.LA105.7gaps.chr.newID.fa.mod.EDTA.TE.fa.stat.nested.sum
Non-nested: bro.LA105.7gaps.chr.newID.fa.mod.EDTA.TE.fa.stat.redun.sum
If you want to learn more about the formatting and information of these files, please visit:
https://github.com/oushujun/EDTA/wiki/Making-sense-of-EDTA-usage-and-outputs---Q&A
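For a log this long, it can help to tally the warnings before deciding which ones matter. The following is a minimal bash sketch, not part of EDTA: the function name `tally_edta_warnings` and the log filename in the usage comment are hypothetical, and the patterns are taken only from the two message types that appear in the log above.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with EDTA or RepeatMasker).
# Usage: tally_edta_warnings <logfile>    e.g. tally_edta_warnings edta.log
tally_edta_warnings() {
    local log="$1"

    # Missing RepeatMasker .cat files, grouped by the library being masked.
    # The "_batch-N.cat" suffix is stripped so batches of one library
    # are counted together.
    echo "Missing .cat files per library:"
    grep 'Unable to open results file' "$log" \
        | sed -E 's#.*/([^/ ]+)_batch-[0-9]+\.cat .*#\1#' \
        | sort | uniq -c

    # Classifications that were not found in the TE_SO database.
    # The classification is the first whitespace-separated field.
    echo "Classifications not found in TE_SO:"
    grep 'not found in the TE_SO database' "$log" \
        | awk '{print $1}' | sort | uniq -c
}
```

Each output line is a count followed by the library name or classification, which makes it easy to see whether the missing files cluster in one input library.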
Dear Chengcheng,
Sorry for the long delay. EDTA configures RepeatMasker to use the rmblast engine. I haven't used the CrossmatchSearchEngine before. Are you aware of any special configurations?
Thanks! Shujun
Dear Shujun,
Sorry for the late response. I have not figured it out yet and have been occupied with other things.
Another thing I would like to mention is that different runs on the same genome with the same parameters seem to produce very different outputs, especially for the Copia and Gypsy LTRs. Please see the attached figure. I ran EDTA on my genome five times and obtained different results each time. The Copia and Gypsy ratios vary a lot between some runs. I don't know whether this is caused by the issues above. My EDTA version is v2.2.1.
Best regards, Chengcheng
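One way to put numbers on the run-to-run differences is to sum the annotated base pairs per feature type in each run's GFF3 and compare the tables. The sketch below is a generic bash/awk illustration and not an EDTA utility. It assumes standard GFF3 columns (type in column 3, start and end in columns 4 and 5); the function name `gff3_bp_by_type` and the `run1/` and `run2/` directories are hypothetical. Overlapping or nested features are counted once per feature, so the totals are not a genome-wide masked fraction.

```shell
#!/bin/bash
# Hypothetical helper: total annotated bp per GFF3 feature type (column 3).
# Usage: gff3_bp_by_type <annotation.gff3>
gff3_bp_by_type() {
    awk -F'\t' '
        # Skip comment/header lines and anything without coordinates.
        !/^#/ && NF >= 5 { bp[$3] += $5 - $4 + 1 }
        END { for (t in bp) printf "%s\t%d\n", t, bp[t] }
    ' "$1" | sort
}

# Example comparison of two runs (paths are placeholders):
# diff <(gff3_bp_by_type run1/genome.fa.mod.EDTA.TEanno.gff3) \
#      <(gff3_bp_by_type run2/genome.fa.mod.EDTA.TEanno.gff3)
```

If the Copia and Gypsy rows differ a lot while the other rows stay stable, that points at the LTR step rather than at the whole pipeline.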
Did you see the same issue in each of these five runs? I also noticed that LTR performance is inferior to previous versions in maize, but I am unsure how prevalent this is.
Shujun
Yes, each time the same issue happens.
Best, Chengcheng
Please check your default $ENV and make sure there's no other version of RepeatMasker masking the conda version. The conda version should use rmblastn as the search engine.
Shujun
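A quick way to check for a shadowing installation is to list every `RepeatMasker` executable on `$PATH` in lookup order, with the conda environment activated. This is a minimal bash sketch; the function name `list_on_path` is hypothetical and the code relies only on shell built-ins.

```shell
#!/bin/bash
# Hypothetical helper: print every executable called <name> found on $PATH,
# in the order the shell would search for it.
# Usage: list_on_path RepeatMasker
list_on_path() {
    local name="$1" dir
    local IFS=:                      # split $PATH on colons
    for dir in $PATH; do
        dir=${dir:-.}                # an empty PATH entry means the current directory
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            echo "$dir/$name"
        fi
    done
    return 0
}
```

The first line printed is the copy that actually runs. If it is not inside the conda environment's `bin/` directory, another installation is taking precedence.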
any luck?
Hi Shujun,
I still have the issue (I have made sure that no other version of RepeatMasker is masking the conda version). I also tried the latest version (v2.2.2) that you released 3 weeks ago, but it did not solve the problem. More importantly, different runs on the same genome with the same parameters still produce very different outputs for the Copia and Gypsy LTRs...
Any suggestion will be greatly appreciated.
Best regards, Chengcheng
I suggest trying a different HPC platform, or even your laptop with a small genome. EDTA does not use the CrossmatchSearchEngine, so this indicates that your current HPC is not behaving as expected.
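To see which search engine a given RepeatMasker installation is set to use by default, one option is to read its configuration file directly. The sketch below rests on two assumptions that should be verified against the installed version: that the configuration lives in a file named `RepeatMaskerConfig.pm` inside the `share/RepeatMasker` directory shown in the log, and that it contains a `DEFAULT_SEARCH_ENGINE` entry with a `'value' => ...` line. The function name `show_default_engine` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the 'value' line that follows the
# DEFAULT_SEARCH_ENGINE key in a RepeatMasker config file.
# ASSUMPTION: the file name and key name match your RepeatMasker version.
# Usage: show_default_engine "$CONDA_PREFIX/share/RepeatMasker/RepeatMaskerConfig.pm"
show_default_engine() {
    awk '
        /DEFAULT_SEARCH_ENGINE/ { found = 1 }
        # After the key, print the first line that starts with a quoted
        # "value" followed by "=>", then stop.
        found && /^[[:space:]]*.value.[[:space:]]*=>/ { print; exit }
    ' "$1"
}
```

If nothing is printed, the file layout differs from what is assumed here and the setting has to be located by hand.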
Any luck?
Hi @oushujun ,
Sorry for the long delay. This issue is caused by RepeatMasker (https://github.com/Dfam-consortium/RepeatMasker/issues/271).
Best regards,