CrossmatchSearchEngine::parseOutput issue

Open cai1991 opened this issue 1 year ago • 5 comments

Hi Shujun,

Thanks a lot for developing this great tool. I installed EDTA using: mamba env create -f EDTA_2.2.x.yml

EDTA works well on the test data. But for my genome (a plant, genome size ~600 Mb), I encountered warnings/errors such as "CrossmatchSearchEngine::parseOutput: Unable to open results file: " and "SINE/NA not found in the TE_SO database". Please see the detailed information below. I obtained all the output files. Do these warnings affect the results, and could you please help me figure this out? Thanks a lot in advance.

Best regards, Chengcheng

my command:

#!/bin/bash

genome=bro.LA105.7gaps.chr.newID.fa
cds=T24.chr.cds.fasta
threads=48

/data3/caicc/Softwares/50/miniconda3/envs/EDTA2/bin/perl /data3/caicc/Softwares/50/EDTA/EDTA-master/EDTA.pl --genome $genome --cds $cds --anno 1 --threads $threads --overwrite 1 --sensitive 1

The log file:


#########################################################
##### Extensive de-novo TE Annotator (EDTA) v2.2.1  #####
##### Shujun Ou ([email protected])             #####
#########################################################


Parameters: --genome bro.LA105.7gaps.chr.newID.fa --cds T24.chr.cds.fasta --anno 1 --threads 48 --overwrite 1 --sensitive 1 --debug 1


Tue Jun 25 21:49:51 CST 2024	Dependency checking:
				All passed!

	A CDS file T24.chr.cds.fasta is provided via --cds. Please make sure this is the DNA sequence of coding regions only.

Tue Jun 25 21:49:59 CST 2024	Obtain raw TE libraries using various structure-based programs: 
Tue Jun 25 21:49:59 CST 2024	EDTA_raw: Check dependencies, prepare working directories.

Tue Jun 25 21:50:02 CST 2024	Start to find LTR candidates.

Tue Jun 25 21:50:02 CST 2024	Identify LTR retrotransposon candidates from scratch.

Tue Jun 25 23:10:51 CST 2024	Finish finding LTR candidates.

Tue Jun 25 23:10:51 CST 2024	Start to find SINE candidates.

Wed Jun 26 00:53:30 CST 2024	Finish finding SINE candidates.

Wed Jun 26 00:53:30 CST 2024	Start to find LINE candidates.

Wed Jun 26 00:53:30 CST 2024	Identify LINE retrotransposon candidates from scratch.

Wed Jun 26 22:22:12 CST 2024	Finish finding LINE candidates.

Wed Jun 26 22:22:12 CST 2024	Start to find TIR candidates.

Wed Jun 26 22:22:12 CST 2024	Identify TIR candidates from scratch.

Species: others
Thu Jun 27 00:52:18 CST 2024	Finish finding TIR candidates.

Thu Jun 27 00:52:18 CST 2024	Start to find Helitron candidates.

Thu Jun 27 00:52:18 CST 2024	Identify Helitron candidates from scratch.

Thu Jun 27 04:49:40 CST 2024	Finish finding Helitron candidates.

Thu Jun 27 04:49:40 CST 2024	Execution of EDTA_raw.pl is finished!

Thu Jun 27 04:49:40 CST 2024	Obtain raw TE libraries finished.
				All intact TEs found by EDTA: 
					bro.LA105.7gaps.chr.newID.fa.mod.EDTA.intact.raw.fa 
					bro.LA105.7gaps.chr.newID.fa.mod.EDTA.intact.raw.gff3

Thu Jun 27 04:49:40 CST 2024	Perform EDTA advance filtering for raw TE candidates and generate the stage 1 library: 

CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_3972217.ThuJun270451122024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.raw.fa.HQ_batch-131.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4026357.ThuJun270453562024/bro.LA105.7gaps.chr.newID.fa.mod.TIR.intact.raw.fa_batch-13.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-2.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-8.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-11.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-39.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-93.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-114.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-182.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-223.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-239.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
CrossmatchSearchEngine::parseOutput: Unable to open results file: /data3/caicc/BR/03BolGapFree/01hifiasm/LA105Cell1/05repeats/02EDTA/bro.LA105.7gaps.chr.newID.fa.mod.EDTA.combine/RM_4156448.ThuJun270502322024/bro.LA105.7gaps.chr.newID.fa.mod.LTR.intact.raw.fa_batch-334.cat : No such file or directory at /data3/caicc/Softwares/50/miniconda3/envs/EDTA2/share/RepeatMasker/CrossmatchSearchEngine.pm line 552.
Thu Jun 27 05:12:08 CST 2024	EDTA advance filtering finished.

Thu Jun 27 05:12:08 CST 2024	Perform EDTA final steps to generate a non-redundant comprehensive TE library.

				Filter RepeatModeler results that are ignored in the raw step.

Thu Jun 27 05:12:48 CST 2024	Clean up TE-related sequences in the CDS file with TEsorter.

				Remove CDS-related sequences in the EDTA library.

				Remove CDS-related sequences in intact TEs.

SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
tRNA/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
SINE/NA not found in the TE_SO database, it will not be used to rename sequences in the final annotation.
Thu Jun 27 05:31:35 CST 2024	EDTA final stage finished! You may check out:
				The final EDTA TE library: bro.LA105.7gaps.chr.newID.fa.mod.EDTA.TElib.fa
Thu Jun 27 05:31:35 CST 2024	Perform post-EDTA analysis for whole-genome annotation:

Thu Jun 27 05:31:35 CST 2024	Homology-based annotation of TEs using bro.LA105.7gaps.chr.newID.fa.mod.EDTA.TElib.fa from scratch.

Thu Jun 27 06:29:33 CST 2024	TE annotation using the EDTA library has finished! Check out:
				Whole-genome TE annotation (total TE: 57.21%): bro.LA105.7gaps.chr.newID.fa.mod.EDTA.TEanno.gff3
				Whole-genome TE annotation summary: bro.LA105.7gaps.chr.newID.fa.mod.EDTA.TEanno.sum
				Whole-genome TE divergence plot: bro.LA105.7gaps.chr.newID.fa.mod_divergence_plot.pdf
				Whole-genome TE density plot: bro.LA105.7gaps.chr.newID.fa.mod.EDTA.TEanno.density_plots.pdf
				Low-threshold TE masking for MAKER gene annotation (masked: 27.87%): bro.LA105.7gaps.chr.newID.fa.mod.MAKER.masked

Thu Jun 27 06:29:34 CST 2024	Evaluate the level of inconsistency for whole-genome TE annotation:

Thu Jun 27 06:34:02 CST 2024	Evaluation of TE annotation finished! Check out these files:

				Overall: bro.LA105.7gaps.chr.newID.fa.mod.EDTA.TE.fa.stat.all.sum
				Nested: bro.LA105.7gaps.chr.newID.fa.mod.EDTA.TE.fa.stat.nested.sum
				Non-nested: bro.LA105.7gaps.chr.newID.fa.mod.EDTA.TE.fa.stat.redun.sum

				If you want to learn more about the formatting and information of these files, please visit:
					https://github.com/oushujun/EDTA/wiki/Making-sense-of-EDTA-usage-and-outputs---Q&A
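
The repeated parseOutput messages in a log like the one above can be tallied per RepeatMasker run directory and input library, which makes it easier to see which filtering step lost batch results. This is only a sketch: the function name `summarize_missing_cat` is hypothetical, and it assumes the message layout is exactly the one shown in this log.

```shell
#!/bin/bash
# Count "Unable to open results file" messages in an EDTA log,
# grouped by RepeatMasker run directory (RM_*) and input library.
# Assumes the message format shown in the log above.
summarize_missing_cat() {
    local log=$1
    grep -F 'CrossmatchSearchEngine::parseOutput: Unable to open results file' "$log" \
        | sed -E 's#.*/(RM_[^/]+)/(.*)_batch-[0-9]+\.cat .*#\1 \2#' \
        | sort | uniq -c
}
```

Calling `summarize_missing_cat EDTA.log` (the log file name is an assumption) prints one line per run directory and library with the number of missing `.cat` batch files.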

cai1991 avatar Jul 09 '24 08:07 cai1991

Dear Chengcheng,

Sorry for the long delay. EDTA configures RepeatMasker to use the rmblast engine. I haven't used the CrossmatchSearchEngine before. Are you aware of any special configurations?

Thanks! Shujun

oushujun avatar Oct 08 '24 02:10 oushujun

Dear Shujun,

Sorry for the late response. I have not figured it out yet and was occupied with other things.

Another thing I would like to mention is that different runs on the same genome with the same parameters seem to produce very different outputs, especially for the Copia and Gypsy LTRs. Please see the attached figure. I ran EDTA on my genome five times and obtained different results each time. The Copia and Gypsy ratios vary a lot between some runs. I don't know whether this is caused by the above issues. My EDTA version is v2.2.1.

Best regards, Chengcheng

[Figure: inconsistent results for different runs]
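
One way to put numbers on such run-to-run differences is to sum the annotated length per feature type in each run's whole-genome GFF3. The sketch below is generic GFF3 processing, not an EDTA utility: the function name `gff_bp_by_type` is hypothetical, it assumes the feature type sits in column 3 as in standard GFF3, and it counts overlapping features twice, so the totals are only a rough comparison.

```shell
#!/bin/bash
# Sum annotated base pairs per feature type (GFF3 column 3).
# Overlapping features are counted more than once.
gff_bp_by_type() {
    awk -F'\t' '!/^#/ && NF >= 5 { bp[$3] += $5 - $4 + 1 }
        END { for (t in bp) printf "%s\t%d\n", t, bp[t] }' "$1" | sort
}
```

Running it on the `*.EDTA.TEanno.gff3` file from each run and comparing the outputs with `diff` shows which feature types change most between runs.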

cai1991 avatar Oct 18 '24 05:10 cai1991

Do you have the same issue when running these five times? I also noticed the LTR performance is inferior to the previous versions in maize but unsure how prevalent this is.

Shujun

oushujun avatar Oct 18 '24 14:10 oushujun

Yes, each time the same issue happens.

Best, Chengcheng

cai1991 avatar Oct 18 '24 15:10 cai1991

Please check your default $ENV and make sure there is no other version of RepeatMasker masking the conda version. The conda version should use rmblastn as the search engine.
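
A quick way to run this check is to list every executable of a given name along `$PATH` in lookup order. The helper below is a minimal sketch; the function name `list_on_path` is hypothetical, and it only inspects `$PATH`, not shell aliases or functions.

```shell
#!/bin/bash
# Print every executable called NAME found along $PATH, in lookup order.
# More than one line of output means an earlier copy shadows a later one.
list_on_path() {
    local name=$1 dir
    local IFS=:
    for dir in $PATH; do
        dir=${dir:-.}   # an empty PATH entry means the current directory
        if [ -x "$dir/$name" ] && [ ! -d "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
        fi
    done
    return 0
}
```

With the EDTA environment activated, `list_on_path RepeatMasker` should ideally print a single path inside the conda environment (under `$CONDA_PREFIX/bin`); the same check can be repeated for `rmblastn`.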

Shujun

oushujun avatar Oct 18 '24 16:10 oushujun

any luck?

oushujun avatar Dec 17 '24 17:12 oushujun

Hi Shujun,

I still have the issue (I can confirm that there is no other version of RepeatMasker masking the conda version). I also tried the latest version (v2.2.2) that you released 3 weeks ago, but it did not solve the problem. More importantly, different runs on the same genome with the same parameters still produce very different outputs for the Copia and Gypsy LTRs...

Any suggestion will be greatly appreciated.

Best regards, Chengcheng

cai1991 avatar Dec 19 '24 08:12 cai1991

I suggest trying a different HPC platform, or even your laptop with a small genome. EDTA is not using the CrossmatchSearchEngine, which indicates your current HPC is not acting as expected.

oushujun avatar Dec 19 '24 15:12 oushujun

Any luck?

oushujun avatar Feb 13 '25 21:02 oushujun

Hi @oushujun ,

Sorry for the long delay. This issue is caused by RepeatMasker (https://github.com/Dfam-consortium/RepeatMasker/issues/271).

Best regards,

cai1991 avatar Feb 14 '25 02:02 cai1991