
UnicodeDecodeError

Status: Open. khoojj opened this issue 2 years ago • 6 comments

Hi! After running OrthoFinder on two .faa files as a test run, I received the errors below.

I'm not sure what's wrong. I checked a previous thread about a similar error message, but I don't think my input files are zipped. Any help is appreciated!

Thanks!

OrthoFinder version 2.5.4 Copyright (C) 2014 David Emms

2022-05-30 18:04:38 : Starting OrthoFinder 2.5.4
64 thread(s) for highly parallel tasks (BLAST searches etc.)
8 thread(s) for OrthoFinder algorithm

Checking required programs are installed

Test can run "mcl -h" - ok
Test can run "fastme -i /pub59/jingk/Rasem/all_faa/test/OrthoFinder/Results_May30_2/WorkingDirectory/SimpleTest.phy -o /pub59/jingk/Rasem/all_faa/test/OrthoFinder/Results_May30_2/WorkingDirectory/SimpleTest.tre" - ok

Dividing up work for BLAST for parallel processing

2022-05-30 18:04:38 : Creating diamond database 1 of 2
2022-05-30 18:04:38 : Creating diamond database 2 of 2

Running diamond all-versus-all

Using 64 thread(s)
2022-05-30 18:04:38 : This may take some time....
2022-05-30 18:04:53 : Done all-versus-all sequence search

Running OrthoFinder algorithm

2022-05-30 18:04:53 : Initial processing of each species
ERROR: Blast0_0.txt is corrupted
ERROR: Error processing files Blast0_*
Malformatted line in /pub59/jingk/Rasem/all_faa/test/OrthoFinder/Results_May30_2/WorkingDirectory/Blast0_0.txt
Offending line was:

Process Process-66:
Traceback (most recent call last):
  File "/pub59/jingk/miniconda3/envs/orthofinder/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/pub59/jingk/miniconda3/envs/orthofinder/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/pub59/jingk/miniconda3/envs/orthofinder/bin/scripts_of/main.py", line 529, in Worker_ProcessBlastHits
    WaterfallMethod.ProcessBlastHits(*args, d_pickle=d_pickle, qDoubleBlast=qDoubleBlast)
  File "/pub59/jingk/miniconda3/envs/orthofinder/bin/scripts_of/main.py", line 516, in ProcessBlastHits
    Bij = blast_file_processor.GetBLAST6Scores(seqsInfo, blastDir_list, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies], qDoubleBlast=qDoubleBlast)
  File "/pub59/jingk/miniconda3/envs/orthofinder/bin/scripts_of/blast_file_processor.py", line 65, in GetBLAST6Scores
    for row in blastreader:
  File "/pub59/jingk/miniconda3/envs/orthofinder/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 3: invalid continuation byte
ERROR: Blast1_0.txt is corrupted
ERROR: Error processing files Blast1_*
Malformatted line in /pub59/jingk/Rasem/all_faa/test/OrthoFinder/Results_May30_2/WorkingDirectory/Blast1_0.txt
Offending line was:

Process Process-67:
Traceback (most recent call last):
  File "/pub59/jingk/miniconda3/envs/orthofinder/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/pub59/jingk/miniconda3/envs/orthofinder/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/pub59/jingk/miniconda3/envs/orthofinder/bin/scripts_of/main.py", line 529, in Worker_ProcessBlastHits
    WaterfallMethod.ProcessBlastHits(*args, d_pickle=d_pickle, qDoubleBlast=qDoubleBlast)
  File "/pub59/jingk/miniconda3/envs/orthofinder/bin/scripts_of/main.py", line 516, in ProcessBlastHits
    Bij = blast_file_processor.GetBLAST6Scores(seqsInfo, blastDir_list, seqsInfo.speciesToUse[iSpecies], seqsInfo.speciesToUse[jSpecies], qDoubleBlast=qDoubleBlast)
  File "/pub59/jingk/miniconda3/envs/orthofinder/bin/scripts_of/blast_file_processor.py", line 65, in GetBLAST6Scores
    for row in blastreader:
  File "/pub59/jingk/miniconda3/envs/orthofinder/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc2 in position 2: invalid continuation byte
ERROR: An error occurred, please review the error messages they may contain useful information about the problem.
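For anyone reading the traceback: it ends inside Python's incremental UTF-8 decoder while `GetBLAST6Scores` loops over `blastreader`, i.e. while raw bytes are still being turned into text, before any BLAST columns are interpreted. The sketch below reproduces the same message under stated assumptions: the bytes are invented for illustration (they are not the contents of Blast0_0.txt), and the file is assumed to be read as UTF-8 text through `csv.reader` with a tab delimiter, which the `for row in blastreader` frame suggests but which is not confirmed here. `read_rows` is a hypothetical helper, not OrthoFinder code.

```python
import csv
import io

# Invented example bytes (NOT taken from the user's Blast0_0.txt): the ASCII ID
# "0_0", then 0xe1, a UTF-8 lead byte that must be followed by a continuation
# byte (0x80-0xbf), but which is followed here by an ASCII tab instead.
raw = b"0_0\xe1\t1_0\t100.0\n"

def read_rows(data: bytes) -> list:
    """Hypothetical helper: read tab-separated rows from strictly decoded UTF-8 text."""
    handle = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline="")
    return list(csv.reader(handle, delimiter="\t"))

try:
    read_rows(raw)
except UnicodeDecodeError as err:
    # Prints: 'utf-8' codec can't decode byte 0xe1 in position 3: invalid continuation byte
    print(err)
```

Because the exception comes from decoding rather than from column parsing, the fix has to happen upstream, in whatever produced those bytes.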

khoojj, May 30 '22

The top of the BLAST output file contains many garbled characters. Please see the attached file: Blast1_1.txt.gz
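Rather than eyeballing the top of the file, one way to tell what a results file actually contains is to inspect its raw bytes. A rough sketch, assuming (a) a gzip stream starts with the magic bytes 0x1f 0x8b, which is standard per RFC 1952, and (b) the search output is meant to be tab-separated text with the 12 default BLAST tabular columns, which is an assumption about OrthoFinder's expected format rather than something confirmed in this thread. The function and script names are hypothetical.

```python
import gzip
import sys

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)
EXPECTED_FIELDS = 12      # assumption: default BLAST/DIAMOND tabular (outfmt 6) columns

def diagnose(data: bytes) -> str:
    """Return a one-line description of what kind of bytes a results file holds."""
    if data.startswith(GZIP_MAGIC):
        try:
            inner = gzip.decompress(data)
        except (OSError, EOFError):
            return "starts with gzip magic bytes but could not be decompressed"
        return "gzip-compressed; decompressed content: " + diagnose(inner)
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as err:
        return (f"not valid UTF-8: byte 0x{data[err.start]:02x} "
                f"at offset {err.start} ({err.reason})")
    if not text.strip():
        return "empty file"
    for lineno, line in enumerate(text.splitlines(), start=1):
        n_fields = len(line.split("\t"))
        if n_fields != EXPECTED_FIELDS:
            return f"UTF-8 text, but line {lineno} has {n_fields} tab-separated fields"
    return f"UTF-8 text with {EXPECTED_FIELDS} tab-separated fields per line"

if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            print(path, "->", diagnose(fh.read()))
```

For example, `python diagnose_results.py Blast0_0.txt Blast1_0.txt` (hypothetical script name) run from the WorkingDirectory prints one verdict per file, which makes it easy to see whether a file is compressed, binary, or just has unexpected columns.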

khoojj, May 31 '22

I get a similar error. Mine are "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc6 in position 1: invalid continuation byte" and "UnicodeDecodeError: 'utf-8' codec can't decode byte 0x86 in position 1: invalid start byte".
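The two wordings in these messages ("invalid start byte" vs "invalid continuation byte") follow from the UTF-8 byte layout in RFC 3629. The helper below is a hypothetical illustration, not part of OrthoFinder, that classifies the offending bytes reported in this thread: 0x86 has the bit pattern 10xxxxxx, so it can only continue a character and Python rejects it as a start byte, while 0xe1, 0xc2 and 0xc6 are lead bytes whose next byte was not a continuation byte.

```python
def classify_utf8_byte(b: int) -> str:
    """Describe the role a single byte can play in UTF-8 (RFC 3629)."""
    if b < 0x80:
        return "ASCII"
    if b < 0xC0:
        return "continuation byte (10xxxxxx); cannot start a character"
    if b in (0xC0, 0xC1) or b >= 0xF5:
        return "never valid in UTF-8"
    if b < 0xE0:
        return "lead byte of a 2-byte sequence"
    if b < 0xF0:
        return "lead byte of a 3-byte sequence"
    return "lead byte of a 4-byte sequence"

# Offending bytes quoted in this issue thread.
for byte in (0xE1, 0xC2, 0xC6, 0x86):
    print(f"0x{byte:02x}: {classify_utf8_byte(byte)}")
# 0xe1: lead byte of a 3-byte sequence
# 0xc2: lead byte of a 2-byte sequence
# 0xc6: lead byte of a 2-byte sequence
# 0x86: continuation byte (10xxxxxx); cannot start a character
```

Either way, bytes like these in what should be plain tab-separated text point to the file content itself being wrong, not to a problem with a particular sequence ID.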

GuoanQi1996, Jun 24 '22

Switching from Linux/conda to Windows/Docker solved the problem. Since the software works fine for most users, I suspect something on our server is causing this problem.

GuoanQi1996, Jun 24 '22

Hi khoojj, I suspect this problem is caused by DIAMOND. Try downgrading DIAMOND to version 0.9.14.
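Before and after a downgrade, it is worth confirming which DIAMOND binary the conda environment actually runs, since a stale binary elsewhere on `PATH` would make the test inconclusive. A small Python check, assuming that `diamond version` prints a line such as `diamond version 0.9.14` (verify this against your own build); both helper names are hypothetical and are not part of OrthoFinder or DIAMOND.

```python
import re
import subprocess

def parse_diamond_version(output: str) -> tuple:
    """Extract (major, minor, patch) from text such as 'diamond version 0.9.14'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version number found in {output!r}")
    return tuple(int(part) for part in match.groups())

def installed_diamond_version(executable: str = "diamond") -> tuple:
    """Run the DIAMOND binary found on PATH and parse its reported version.

    Assumption: `diamond version` prints the version number to stdout.
    """
    result = subprocess.run([executable, "version"],
                            capture_output=True, text=True, check=True)
    return parse_diamond_version(result.stdout)
```

For example, `installed_diamond_version() == (0, 9, 14)` should hold inside the activated environment once the downgrade has taken effect.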

mundoctor, Sep 21 '22

I'm running into the same problem. How was it resolved?

Heater233, Sep 03 '23

DIAMOND 0.9.14 can solve this problem; give it a try.

Bon-jour, Dec 21 '23