specio

UnicodeDecodeError

Open laziszaire opened this issue 5 years ago • 5 comments

When I read a .sp file, a UnicodeDecodeError is raised. I looked it up on Stack Overflow (here), and it says I should change the encoding. Do you know where I can change the encoding? Here is the file:

ds-035-1-1.zip

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-74-2ccbb1759542> in <module>()
      1 import specio
----> 2 specread(allfiles[0])

C:\ProgramData\Anaconda3\lib\site-packages\specio\core\functions.py in specread(uri, format, tol_wavelength, **kwargs)
    244     filenames = _validate_filenames(uri)
    245     spectrum = [_get_reader_get_data(f, format, **kwargs)
--> 246                 for f in filenames]
    247     return (_zip_spectrum(spectrum, tol_wavelength) if len(spectrum) > 1
    248             else spectrum[0])

C:\ProgramData\Anaconda3\lib\site-packages\specio\core\functions.py in <listcomp>(.0)
    244     filenames = _validate_filenames(uri)
    245     spectrum = [_get_reader_get_data(f, format, **kwargs)
--> 246                 for f in filenames]
    247     return (_zip_spectrum(spectrum, tol_wavelength) if len(spectrum) > 1
    248             else spectrum[0])

C:\ProgramData\Anaconda3\lib\site-packages\specio\core\functions.py in _get_reader_get_data(uri, format, **kwargs)
    115 def _get_reader_get_data(uri, format, **kwargs):
    116     """Get the reader and the associated data."""
--> 117     reader = get_reader(uri, format, **kwargs)
    118     with reader:
    119         return reader.get_data(index=None)

C:\ProgramData\Anaconda3\lib\site-packages\specio\core\functions.py in get_reader(uri, format, **kwargs)
    108 
    109     # Return its reader object
--> 110     return format.get_reader(request)
    111 
    112 # Spectra

C:\ProgramData\Anaconda3\lib\site-packages\specio\core\format.py in get_reader(self, request)
    111 
    112         """
--> 113         return self.Reader(self, request)
    114 
    115     def can_read(self, request):

C:\ProgramData\Anaconda3\lib\site-packages\specio\core\format.py in __init__(self, format, request)
    163             self._request = request
    164             # Open the reader
--> 165             self._open(**self.request.kwargs.copy())
    166 
    167         @property

C:\ProgramData\Anaconda3\lib\site-packages\specio\plugins\sp.py in _open(self)
    387         def _open(self):
    388             self._fp = self.request.get_file()
--> 389             self._data = self._read_sp(self._fp)
    390 
    391         def _close(self):

C:\ProgramData\Anaconda3\lib\site-packages\specio\plugins\sp.py in _read_sp(sp_file)
    357 
    358             meta.update(_decode_5104(
--> 359                 content[start_byte:start_byte + block_size]))
    360 
    361             start_byte = NBP[1]

C:\ProgramData\Anaconda3\lib\site-packages\specio\plugins\sp.py in _decode_5104(data)
     69                 '<h', data[start_byte:start_byte + 2])[0]
     70             start_byte += 2
---> 71             text.append(data[start_byte:start_byte + text_size].decode('utf8'))
     72             start_byte += text_size
     73             start_byte += 6

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd6 in position 25: invalid continuation byte

laziszaire, Jul 17 '18 07:07

Workaround: converting the .sp file to .JDX and then reading it with jcamp works.

laziszaire, Jul 23 '18 01:07

We don't currently allow passing an encoding. Do you know which encoding your .sp file uses?

glemaitre, Jul 24 '18 08:07

Thank you very much for this Python library. It is very useful. However, depending on the .sp file, I sometimes also get a Unicode decode error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xef in position 36: invalid continuation byte

Do you have any idea how to solve this problem? The encoding of my .sp file seems to be ANSI. I tried converting it to UTF-8, but that didn't solve the problem.

ccholet, May 04 '21 15:05
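
For anyone trying to work out which encoding their file uses, here is a minimal sketch, not specio code. The traceback shows the reader slicing raw bytes and unpacking them with `struct`, which suggests the .sp file is binary with embedded text fields, so re-saving the whole file as UTF-8 in a text editor is unlikely to help. The helper `decode_text` and the candidate list are assumptions for illustration. "ANSI" on a Western European Windows system usually means cp1252, but that is a guess for any particular file.

```python
def decode_text(raw, encodings=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes `raw`.

    Hypothetical helper for illustration; the candidate list is an assumption.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte value to a code point, so it never raises.
    return raw.decode("latin-1"), "latin-1"


# Made-up bytes mimicking the reported failure: 0xd6 at position 25,
# followed by a byte that is not a valid UTF-8 continuation byte.
sample = b"A" * 25 + b"\xd6l"

try:
    sample.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)

text, used = decode_text(sample)
print(used, repr(text))
```

Note that a decode that succeeds is not necessarily the right one: bytes written in another legacy encoding may also decode under cp1252 without error and simply produce the wrong characters, so it is worth checking the resulting metadata text by eye.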

ccholet or anyone else: did you find a solution yet?

yayatinaresh, Nov 09 '21 08:11

I posted a solution in the other issue about the same problem; see the details there. Basically, you can edit the encoding in the sp.py file in the plugins folder.

Jseph-maker, Sep 28 '22 19:09
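
The other issue is not reproduced in this thread, so the exact change posted there is not known here. As a rough sketch of what editing the encoding could look like, assuming the failing call is the `.decode('utf8')` shown at line 71 of `sp.py` in the traceback: the snippet below reads one length-prefixed text field the way the traceback suggests (a 2-byte little-endian length, then the text bytes) and makes the encoding a parameter. `read_text_field` is a hypothetical helper, not part of specio, and `cp1252` is only an example value.

```python
import struct


def read_text_field(data, start_byte, encoding="utf8", errors="strict"):
    """Read one length-prefixed text field and return (text, next_offset).

    Hypothetical helper, not specio API. The layout mirrors the lines
    visible in the traceback: struct.unpack('<h', ...) followed by a decode.
    """
    # 2-byte little-endian signed integer giving the text length in bytes.
    text_size = struct.unpack("<h", data[start_byte:start_byte + 2])[0]
    start_byte += 2
    raw = data[start_byte:start_byte + text_size]
    # The only change relative to a hard-coded .decode('utf8'):
    # the encoding and the error handling are configurable.
    return raw.decode(encoding, errors), start_byte + text_size


# Made-up field: length 3, then bytes that are not valid UTF-8.
field = struct.pack("<h", 3) + b"\xd6l!"

print(read_text_field(field, 0, encoding="cp1252"))
print(read_text_field(field, 0, errors="replace"))
```

In an installed copy of specio, the equivalent manual edit would be to change the encoding name in that `.decode(...)` call. Using `errors="replace"` keeps UTF-8 and avoids the exception, but undecodable bytes are replaced with U+FFFD, so some metadata text may be lost.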