ERROR: Raw TIR results not found in Fhispida.fasta.mod.EDTA.raw/Fhispida.fasta.mod.TIR.raw.fa

Open xuzhougeng opened this issue 1 year ago • 5 comments

Hi Shujun,

I recently ran the EDTA pipeline (v2.0.1) on Ficus hispida using the commands below.

EDTA ran successfully on my Arabidopsis genome, so the pipeline itself is installed correctly.

# download the reference
wget https://download.cncb.ac.cn/gwh/Plants/Ficus_hispida_fh_G_GWHALOG00000000/GWHALOG00000000.genome.fasta.gz
gunzip GWHALOG00000000.genome.fasta.gz
mv GWHALOG00000000.genome.fasta Fhispida.fasta

# run EDTA
perl ~/software/EDTA/EDTA.pl --genome Fhispida.fasta --anno 1 --overwrite 1 --sensitive 0 --evaluate 0 --threads 50

However, EDTA stopped at the EDTA_raw.pl step and only produced the Fhispida.fasta.mod.EDTA.raw directory.

Reading the log, I found many error messages in the TIR-finding step.

Normal message:

Tue Jul 19 20:43:13 CST 2022	Dependency checking:
				All passed!

Tue Jul 19 20:43:17 CST 2022	The longest sequence ID in the genome contains 81 characters, which is longer than the limit (13)
				Trying to reformat seq IDs...
				Attempt 1...
				Attempt 2...
Tue Jul 19 20:43:19 CST 2022	Seq ID conversion successful!

Tue Jul 19 20:43:21 CST 2022	Obtain raw TE libraries using various structure-based programs:
Tue Jul 19 20:43:21 CST 2022	EDTA_raw: Check dependencies, prepare working directories.
Tue Jul 19 20:43:23 CST 2022	Start to find LTR candidates.
Tue Jul 19 20:43:23 CST 2022	Identify LTR retrotransposon candidates from scratch.
Tue Jul 19 21:03:05 CST 2022	Finish finding LTR candidates.
Tue Jul 19 21:03:05 CST 2022	Start to find TIR candidates.
Tue Jul 19 21:03:05 CST 2022	Identify TIR candidates from scratch.

Error message:

Species: others
Traceback (most recent call last):
  File "/home/xzg/software/EDTA/bin/TIR-Learner2.5/Module2/RunGRF.py", line 79, in <module>
    if (len(str(records[0].seq))>int(length)+500):
IndexError: list index out of range
cp: cannot stat 'TIR-Learner/*-p': No such file or directory
cat: '*-+-DTA.fa': No such file or directory
cat: '*-+-DTC.fa': No such file or directory
cat: '*-+-DTH.fa': No such file or directory
cat: '*-+-DTM.fa': No such file or directory
cat: '*-+-DTT.fa': No such file or directory
cat: '*-+-NonTIR.fa': No such file or directory
cat: '*-+-*-+-*.gff3': No such file or directory
rm: cannot remove '*-+-*-+-*.gff3': No such file or directory
...
FileNotFoundError: [Errno 2] No such file or directory: 'TIR-Learner_FinalAnn.gff3'

mv: cannot stat 'TIR-Learner/*FinalAnn*.gff3': No such file or directory
mv: cannot stat 'TIR-Learner/*FinalAnn*.fa': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at ...

Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /home/xzg/software/EDTA/util/rename_tirlearner.pl line 19.
Warning: LOC list Fhispida.fasta.mod.TIR.ext30.list is empty.

Error: Error while loading sequenceCan't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
Warning: The TIR result file has 0 bp!

Normal message:

Tue Jul 19 21:03:35 CST 2022	Start to find Helitron candidates.
Tue Jul 19 21:03:35 CST 2022	Identify Helitron candidates from scratch.
Wed Jul 20 00:05:23 CST 2022	Finish finding Helitron candidates.
Wed Jul 20 00:05:23 CST 2022	Execution of EDTA_raw.pl is finished

ERROR: Raw TIR results not found in Fhispida.fasta.mod.EDTA.raw/Fhispida.fasta.mod.TIR.raw.fa
	If you believe the program is working properly, this may be caused by the lack of intact TIRs in your genome. Consider to use the --force 1 parameter to overwrite this check

xuzhougeng avatar Jul 20 '22 00:07 xuzhougeng

Hello @xuzhougeng,

Thanks for the report. I will check on this genome and let you know.

Best, Shujun

oushujun avatar Jul 21 '22 15:07 oushujun

Hi Shujun, did you find a solution to this? I get the same error with some already published moth (Lepidoptera) assemblies.

niconm89 avatar Aug 19 '22 18:08 niconm89

Hi @niconm89 and @xuzhougeng,

Sorry for the long delay. No, I have not found the cause of the error yet. I can reproduce it with the commands above, but I need to dig further into the individual steps to find out why. My apologies.

Best, Shujun

oushujun avatar Aug 24 '22 16:08 oushujun

Hi Mr. @oushujun, I'm facing the same error with TIR-Learner, in my case with a Drosophila species. I tried reinstalling the conda environment using the updated .yml file provided in the git repository, but I still get the same message. With --force 1 the run completes, but I'm not comfortable using a rice library for a Drosophila species.

Tue Sep  6 16:20:45 -03 2022	Dependency checking:
				All passed!

	A CDS file dkoep-CDSs.fa is provided via --cds. Please make sure this is the DNA sequence of coding regions only.

Tue Sep  6 16:20:51 -03 2022	Obtain raw TE libraries using various structure-based programs: 
Tue Sep  6 16:20:51 -03 2022	EDTA_raw: Check dependencies, prepare working directories.

Tue Sep  6 16:20:53 -03 2022	Start to find LTR candidates.

Tue Sep  6 16:20:53 -03 2022	Identify LTR retrotransposon candidates from scratch.

Tue Sep  6 16:39:19 -03 2022	Finish finding LTR candidates.

Tue Sep  6 16:39:19 -03 2022	Start to find TIR candidates.

Tue Sep  6 16:39:19 -03 2022	Identify TIR candidates from scratch.

Species: others
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/oliveirads/miniconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/oliveirads/miniconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/oliveirads/softwares/EDTA/bin/TIR-Learner2.5/Module3_New/getDataset.py", line 110, in Predict
    model = load_model(path+'/CNN0912.h5')
  File "/home/oliveirads/miniconda3/envs/EDTA/lib/python3.6/site-packages/tensorflow/python/keras/saving/save.py", line 146, in load_model
    return hdf5_format.load_model_from_hdf5(filepath, custom_objects, compile)
  File "/home/oliveirads/miniconda3/envs/EDTA/lib/python3.6/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 210, in load_model_from_hdf5
    model_config = json.loads(model_config.decode('utf-8'))
AttributeError: 'str' object has no attribute 'decode'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/oliveirads/softwares/EDTA/bin/TIR-Learner2.5/Module3_New/getDataset.py", line 139, in <module>
    d = pool.map(Predict,files)
  File "/home/oliveirads/miniconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/oliveirads/miniconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
AttributeError: 'str' object has no attribute 'decode'
cat: '*-+-DTA.fa': No such file or directory
cat: '*-+-DTC.fa': No such file or directory
cat: '*-+-DTH.fa': No such file or directory
cat: '*-+-DTM.fa': No such file or directory
cat: '*-+-DTT.fa': No such file or directory
cat: '*-+-NonTIR.fa': No such file or directory
cat: '*-+-*-+-*.gff3': No such file or directory
rm: cannot remove '*-+-*-+-*.gff3': No such file or directory
Traceback (most recent call last):
  File "/home/oliveirads/softwares/EDTA/bin/TIR-Learner2.5/Module3_New/CombineAll.py", line 75, in <module>
    f_m3=removeDupinSingle("%s.gff3"%(genome_Name+spliter+"Module3"))
  File "/home/oliveirads/softwares/EDTA/bin/TIR-Learner2.5/Module3_New/CombineAll.py", line 57, in removeDupinSingle
    f=pd.read_csv(file,header=None,sep="\t") #shujun
  File "/home/oliveirads/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/oliveirads/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/home/oliveirads/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "/home/oliveirads/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/oliveirads/.local/lib/python3.6/site-packages/pandas/io/parsers.py", line 2010, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 540, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/oliveirads/miniconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/oliveirads/miniconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/oliveirads/softwares/EDTA/bin/TIR-Learner2.5/Module3/GetAllSeq.py", line 32, in GetListFromFile
    f=open(file,"r+")
FileNotFoundError: [Errno 2] No such file or directory: 'TIR-Learner_FinalAnn.gff3'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/oliveirads/softwares/EDTA/bin/TIR-Learner2.5/Module3/GetAllSeq.py", line 63, in <module>
    pool.map(GetListFromFile,fileList) #shujun
  File "/home/oliveirads/miniconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/oliveirads/miniconda3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: 'TIR-Learner_FinalAnn.gff3'
mv: cannot stat 'TIR-Learner/*FinalAnn*.gff3': No such file or directory
mv: cannot stat 'TIR-Learner/*FinalAnn*.fa': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /home/oliveirads/softwares/EDTA/util/rename_tirlearner.pl line 19.
Warning: LOC list Dkoepferae.FlyeRacon3Medaka.fasta.mod.TIR.ext30.list is empty.

Error: Error while loading sequenceCan't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
Warning: The TIR result file has 0 bp!

Tue Sep  6 17:20:58 -03 2022	Start to find Helitron candidates.

Any advice? Thank you!

OliveiraDS-hub avatar Sep 06 '22 20:09 OliveiraDS-hub

Hi @OliveiraDS-hub,

Yours is a different error. If you have not done so already, you may want to reinstall EDTA in a clean environment (i.e., create a new one). A quick search for the error message suggests you may have a conflicting h5py installed: https://stackoverflow.com/questions/53740577/does-any-one-got-attributeerror-str-object-has-no-attribute-decode-whi
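
One way to check for a conflicting install is to see where Python would load each package from. The traceback above shows pandas coming from ~/.local rather than from the EDTA conda environment, so packages in the per-user site directory may be shadowing the environment's own copies. The following is only a minimal sketch, not part of EDTA; the helper names `module_origin` and `in_user_site` are made up for illustration. Run it with the Python interpreter of the EDTA environment.

```python
import importlib.util
import site


def module_origin(name):
    """Return the file a top-level module would be imported from, or None if absent."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def in_user_site(path, user_site):
    """True if `path` lies under the per-user site-packages directory."""
    return path is not None and path.startswith(user_site.rstrip("/") + "/")


if __name__ == "__main__":
    user_site = site.getusersitepackages()
    # Packages that appear in the tracebacks of this thread.
    for name in ("h5py", "pandas", "tensorflow"):
        origin = module_origin(name)
        flag = "USER SITE" if in_user_site(origin, user_site) else "ok"
        print(f"{name}: {origin} [{flag}]")
```

If a package is flagged as coming from the user site, setting the PYTHONNOUSERSITE environment variable before running EDTA makes Python skip that directory, so only the environment's own packages are used.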

Best, Shujun

oushujun avatar Sep 07 '22 22:09 oushujun

Hi @oushujun,

I am also having this error on our de novo primate genome as well as a genome on NCBI (ASM2149847v1). There is more than enough memory on the system, and I tried the solution export PYTHONNOUSERSITE=True from #14.

I'm running the most recent version of EDTA from commit ffc55905638a88ed58d22246812f7ebec6fffc94 on Linux 4.15.0-154-generic in a new mamba environment with only EDTA installed. Here is the mamba env being used: env.yml.removetxt.txt

Command: perl EDTA/EDTA.pl --genome data/midas.fa --overwrite 1 --sensitive 1 --anno 1 --evaluate 1 --threads 64

Log of Error

Thu Jan 12 21:31:14 EST 2023    Identify TIR candidates from scratch.

Species: others
Traceback (most recent call last):
  File "/home/OSUMC.EDU/kana18/workspace/blachly_lab/2022-05-tamarin-assembly/repeatmasking/EDTA/bin/TIR-Learner2.5/Module2/RunGRF.py", line 79, in <module>
    if (len(str(records[0].seq))>int(length)+500):
IndexError: list index out of range
cp: cannot stat 'TIR-Learner/*-p': No such file or directory
cat: '*-+-DTA.fa': No such file or directory
cat: '*-+-DTC.fa': No such file or directory
cat: '*-+-DTH.fa': No such file or directory
cat: '*-+-DTM.fa': No such file or directory
cat: '*-+-DTT.fa': No such file or directory
cat: '*-+-NonTIR.fa': No such file or directory
cat: '*-+-*-+-*.gff3': No such file or directory
rm: cannot remove '*-+-*-+-*.gff3': No such file or directory
Traceback (most recent call last):
  File "/home/OSUMC.EDU/kana18/workspace/blachly_lab/2022-05-tamarin-assembly/repeatmasking/EDTA/bin/TIR-Learner2.5/Module3_New/CombineAll.py", line 75, in <module>
    f_m3=removeDupinSingle("%s.gff3"%(genome_Name+spliter+"Module3"))
  File "/home/OSUMC.EDU/kana18/workspace/blachly_lab/2022-05-tamarin-assembly/repeatmasking/EDTA/bin/TIR-Learner2.5/Module3_New/CombineAll.py", line 57, in removeDupinSingle
    f=pd.read_csv(file,header=None,sep="\t") #shujun
  File "/home/OSUMC.EDU/kana18/mambaforge-pypy3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/OSUMC.EDU/kana18/mambaforge-pypy3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 454, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/home/OSUMC.EDU/kana18/mambaforge-pypy3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 948, in __init__
    self._make_engine(self.engine)
  File "/home/OSUMC.EDU/kana18/mambaforge-pypy3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 1180, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/home/OSUMC.EDU/kana18/mambaforge-pypy3/envs/EDTA/lib/python3.6/site-packages/pandas/io/parsers.py", line 2010, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 540, in pandas._libs.parsers.TextReader.__cinit__
pandas.errors.EmptyDataError: No columns to parse from file
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/OSUMC.EDU/kana18/mambaforge-pypy3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/OSUMC.EDU/kana18/mambaforge-pypy3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 44, in mapstar
    return list(map(*args))
  File "/home/OSUMC.EDU/kana18/workspace/blachly_lab/2022-05-tamarin-assembly/repeatmasking/EDTA/bin/TIR-Learner2.5/Module3/GetAllSeq.py", line 32, in GetListFromFile
    f=open(file,"r+")
FileNotFoundError: [Errno 2] No such file or directory: 'TIR-Learner_FinalAnn.gff3'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/OSUMC.EDU/kana18/workspace/blachly_lab/2022-05-tamarin-assembly/repeatmasking/EDTA/bin/TIR-Learner2.5/Module3/GetAllSeq.py", line 63, in <module>
    pool.map(GetListFromFile,fileList) #shujun
  File "/home/OSUMC.EDU/kana18/mambaforge-pypy3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/OSUMC.EDU/kana18/mambaforge-pypy3/envs/EDTA/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
FileNotFoundError: [Errno 2] No such file or directory: 'TIR-Learner_FinalAnn.gff3'
mv: cannot stat 'TIR-Learner/*FinalAnn*.gff3': No such file or directory
mv: cannot stat 'TIR-Learner/*FinalAnn*.fa': No such file or directory
Can't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.fa: No such file or directory at /home/OSUMC.EDU/kana18/workspace/blachly_lab/2022-05-tamarin-assembly/repeatmasking/EDTA/util/rename_tirlearner.pl line 19.
Warning: LOC list midas.fa.mod.TIR.ext30.list is empty.

Error: Error while loading sequenceCan't open ./TIR-Learner-Result/TIR-Learner_FinalAnn.gff3: No such file or directory.
Warning: The TIR result file has 0 bp!

Kekananen avatar Jan 13 '23 17:01 Kekananen

Hi @oushujun,

Sorry, I have the same issue as Xu. If you have fixed it, please let me know.

Thanks!

Best, Shuo Cao

unavailable-2374 avatar Apr 04 '23 03:04 unavailable-2374

Hi @oushujun

I have used EDTA before without trouble, but now I get the same problem as Xu. I am not sure whether it is caused by a recent update or by something in the TIR-finding step. Is there a way to sort this out?

Best, Awais

awaisfarooq724 avatar Apr 04 '23 17:04 awaisfarooq724

Hello everyone!

Sorry it took me so long to find the bug, but I've found it! It appears that the input genome was not properly prepared before running EDTA: the sequence names were longer than 13 characters, which triggers EDTA to shorten them for you, but it was not doing a perfect job. The new sequence names consisted only of digits, and that causes RunGRF.py to quit. So the simplest solution is to shorten the sequence IDs yourself and make sure they are not purely numeric; then it should work. I will also implement something to avoid this error in the next update. Thank you for using EDTA, and sorry again for the long delay.
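
For anyone who wants to rename the IDs before running EDTA, here is a minimal sketch of one way to do it. It is not part of EDTA: the script, the function names `id_problem` and `renamed_lines`, and the `seq` prefix are all made up for illustration, and the 13-character limit is simply the value reported in the log at the top of this thread. It replaces every FASTA header with a short prefix plus a running number and records the old headers so they can be mapped back later.

```python
import sys

MAX_ID_LEN = 13  # limit reported by the EDTA log in this thread


def id_problem(seq_id, max_len=MAX_ID_LEN):
    """Describe what is wrong with a sequence ID, or return None if it looks fine."""
    if not seq_id:
        return "empty"
    if len(seq_id) > max_len:
        return "too long"
    if seq_id.isdigit():
        return "purely numeric"
    return None


def renamed_lines(lines, mapping, prefix="seq"):
    """Yield FASTA lines with each header replaced by prefix + running number.

    Each (new_id, old_header) pair is appended to the `mapping` list.
    """
    count = 0
    for line in lines:
        if line.startswith(">"):
            count += 1
            new_id = f"{prefix}{count}"
            problem = id_problem(new_id)
            if problem:
                raise ValueError(f"new ID {new_id!r} is {problem}; pick another prefix")
            mapping.append((new_id, line[1:].strip()))
            yield f">{new_id}\n"
        else:
            yield line


# Hypothetical usage: python rename_ids.py in.fasta out.fasta ids.tsv
if __name__ == "__main__" and len(sys.argv) == 4:
    in_path, out_path, map_path = sys.argv[1:4]
    mapping = []
    with open(in_path) as fin, open(out_path, "w") as fout:
        fout.writelines(renamed_lines(fin, mapping))
    with open(map_path, "w") as fmap:
        for new_id, old_header in mapping:
            fmap.write(f"{new_id}\t{old_header}\n")
```

The non-numeric prefix is what keeps the new IDs from being pure numbers, and the mapping file lets you translate the EDTA annotation back to the original sequence names afterwards.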

Best, Shujun

oushujun avatar Apr 17 '23 23:04 oushujun