OrthoFinder
UnicodeDecodeError
Hi! After running OrthoFinder on two .faa files as a test, I received the errors shown below.
I'm not sure what's wrong. I checked a previous thread about a similar error message, but I don't think my input files are zipped. Any help appreciated!
Thanks!
OrthoFinder version 2.5.4 Copyright (C) 2014 David Emms
2022-05-30 18:04:38 : Starting OrthoFinder 2.5.4
64 thread(s) for highly parallel tasks (BLAST searches etc.)
8 thread(s) for OrthoFinder algorithm
Checking required programs are installed
Test can run "mcl -h" - ok
Test can run "fastme -i /pub59/jingk/Rasem/all_faa/test/OrthoFinder/Results_May30_2/WorkingDirectory/SimpleTest.phy -o /pub59/jingk/Rasem/all_faa/test/OrthoFinder/Results_May30_2/WorkingDirectory/SimpleTest.tre" - ok
Dividing up work for BLAST for parallel processing
2022-05-30 18:04:38 : Creating diamond database 1 of 2
2022-05-30 18:04:38 : Creating diamond database 2 of 2
Running diamond all-versus-all
Using 64 thread(s)
2022-05-30 18:04:38 : This may take some time....
2022-05-30 18:04:53 : Done all-versus-all sequence search
Running OrthoFinder algorithm
2022-05-30 18:04:53 : Initial processing of each species
ERROR: Blast0_0.txt is corrupted
ERROR: Error processing files Blast0_*
Malformatted line in /pub59/jingk/Rasem/all_faa/test/OrthoFinder/Results_May30_2/WorkingDirectory/Blast0_0.txt
Offending line was:
Process Process-66:
Traceback (most recent call last):
  File "/pub59/jingk/miniconda3/envs/orthofinder/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/pub59/jingk/miniconda3/envs/orthofinder/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/pub59/jingk/miniconda3/envs/orthofinder/bin/scripts_of/main.py", line 529, in Worker_ProcessBlastHits
    WaterfallMethod.ProcessBlastHits(*args, d_pickle=d_pickle, qDoubleBlast=qDoubleBlast)
  File "/pub59/jingk/miniconda3/envs/orthofinder/bin/scripts_of/main.py", line 516, in ProcessBlastHits
    Bij = blast_file_processor.GetBLAST6Scores(seqsInfo, blastDir_list, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies], qDoubleBlast=qDoubleBlast)
  File "/pub59/jingk/miniconda3/envs/orthofinder/bin/scripts_of/blast_file_processor.py", line 65, in GetBLAST6Scores
    for row in blastreader:
  File "/pub59/jingk/miniconda3/envs/orthofinder/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 3: invalid continuation byte
ERROR: Blast1_0.txt is corrupted
ERROR: Error processing files Blast1_*
Malformatted line in /pub59/jingk/Rasem/all_faa/test/OrthoFinder/Results_May30_2/WorkingDirectory/Blast1_0.txt
Offending line was:
Process Process-67:
Traceback (most recent call last):
  File "/pub59/jingk/miniconda3/envs/orthofinder/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/pub59/jingk/miniconda3/envs/orthofinder/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/pub59/jingk/miniconda3/envs/orthofinder/bin/scripts_of/main.py", line 529, in Worker_ProcessBlastHits
    WaterfallMethod.ProcessBlastHits(*args, d_pickle=d_pickle, qDoubleBlast=qDoubleBlast)
  File "/pub59/jingk/miniconda3/envs/orthofinder/bin/scripts_of/main.py", line 516, in ProcessBlastHits
    Bij = blast_file_processor.GetBLAST6Scores(seqsInfo, blastDir_list, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies], qDoubleBlast=qDoubleBlast)
  File "/pub59/jingk/miniconda3/envs/orthofinder/bin/scripts_of/blast_file_processor.py", line 65, in GetBLAST6Scores
    for row in blastreader:
  File "/pub59/jingk/miniconda3/envs/orthofinder/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc2 in position 2: invalid continuation byte
ERROR: An error occurred, please review the error messages they may contain useful information about the problem.
The top of the BLAST output contains many strange characters. Please see the attached file: Blast1_1.txt.gz
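For anyone who wants to check their own Blast*.txt files for the same symptom, here is a minimal Python sketch. It is not part of OrthoFinder; the function names (first_invalid_utf8, looks_gzipped, inspect_file) are made up for illustration, and it only assumes that the file should be plain UTF-8 text. It reports the first byte in the head of the file that is not valid UTF-8 and whether the file starts with the gzip magic bytes.

```python
import codecs

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def first_invalid_utf8(data: bytes):
    """Return (offset, byte value) of the first byte that breaks UTF-8
    decoding, or None if the data decodes cleanly.

    An incremental decoder with final=False is used so that a multi-byte
    character cut off at the end of the chunk is not reported as an error.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(data, final=False)
    except UnicodeDecodeError as err:
        return err.start, data[err.start]
    return None


def looks_gzipped(data: bytes) -> bool:
    """True if the data starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def inspect_file(path, nbytes=4096):
    """Print a short report on the first nbytes of a file (hypothetical helper)."""
    with open(path, "rb") as handle:
        head = handle.read(nbytes)
    if looks_gzipped(head):
        print(f"{path}: starts with gzip magic bytes, file looks compressed")
    bad = first_invalid_utf8(head)
    if bad is None:
        print(f"{path}: first {len(head)} bytes are valid UTF-8")
    else:
        offset, value = bad
        print(f"{path}: invalid UTF-8 byte 0x{value:02x} at offset {offset}")
```

Calling inspect_file on one of the files in the WorkingDirectory (for example Blast0_0.txt) should show quickly whether the search output itself is binary or compressed rather than tab-separated text.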
Similar error. Mine is "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc6 in position 1: invalid continuation byte" and "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 1: invalid start byte".
Switching from Linux/conda to Windows/Docker solved the problem. Since the software works fine for most users, I suspect that something on our server is causing this.
Hi khooji, I guess this is a problem caused by diamond. Try downgrading your diamond version to 0.9.14.
I have the same problem. How was it resolved?
Downgrading to diamond 0.9.14 solves this problem. Give it a try.
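Before downgrading, it can help to confirm which diamond version the active environment actually picks up. The sketch below is only an illustration: it assumes that a diamond executable is on the PATH and that "diamond version" prints a string containing a dotted version number. The helper names (parse_version, installed_diamond_version) are hypothetical.

```python
import re
import shutil
import subprocess


def parse_version(text):
    """Extract the first X.Y.Z version number from text as a tuple of ints,
    or return None if no such number is found."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def installed_diamond_version(executable="diamond"):
    """Return the version of the diamond found on PATH, or None if the
    executable is missing or its output contains no version number."""
    if shutil.which(executable) is None:
        return None
    result = subprocess.run(
        [executable, "version"], capture_output=True, text=True, check=False
    )
    return parse_version(result.stdout + result.stderr)
```

If installed_diamond_version() returns something other than (0, 9, 14), the environment is not using the version suggested in this thread.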