
'gbk' codec can't decode byte 0x93 in position 978: illegal multibyte sequence, and then "a bytes-like object is required, not 'str'"

Open: Ailing-Zou opened this issue 5 years ago • 2 comments

Hi, when I first ran this code, I got:

  File "D:/transformer/prepro.py", line 37, in
    _prepro = lambda x: [line.strip() for line in open(x, 'r').read().split("\n")
UnicodeDecodeError: 'gbk' codec can't decode byte 0x93 in position 978: illegal multibyte sequence

After I changed that line to

    _prepro = lambda x: [line.strip() for line in open(x, 'rb').read().split("\n") if not line.startswith("<")]

I got a different error: TypeError: a bytes-like object is required, not 'str'.

So how should I open this file? Looking forward to your reply.

Ailing-Zou, Dec 10 '19 12:12
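Why the 'rb' change produced the second error: in binary mode, read() returns bytes, and bytes.split() only accepts a bytes separator, so split("\n") raises the TypeError before the filter ever runs. The sketch below reproduces this on a small hypothetical sample file (sample.txt and its contents are made up for illustration, not the project's data) and shows two ways around it, assuming the real file is UTF-8: stay in binary mode and use bytes literals (b"\n", b"<"), or decode the bytes to str first.

```python
import os
import tempfile

# Hypothetical sample file (not the project's real data), written as UTF-8
# and containing one non-ASCII character (an en dash, U+2013).
sample = "<doc id=1>\nfirst line \u2013 with an en dash\n  second line  \n</doc>\n"
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

# Binary mode: read() returns bytes, not str.
with open(path, "rb") as f:
    raw = f.read()

# Splitting bytes on a str separator reproduces the second error.
try:
    raw.split("\n")
    err_msg = None
except TypeError as e:
    err_msg = str(e)
print("TypeError:", err_msg)

# Option 1: stay in binary mode and use bytes literals throughout (lines stay bytes).
lines_bytes = [line.strip() for line in raw.split(b"\n") if not line.startswith(b"<")]

# Option 2: decode to str first (assumes the file really is UTF-8).
lines_str = [line.strip() for line in raw.decode("utf-8").split("\n") if not line.startswith("<")]

print(lines_bytes)
print(lines_str)
```

Option 2 is usually the more convenient one, since the resulting lines are ordinary str objects, just like the ones the original 'r' version was meant to produce.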

Add encoding='utf-8' to the open() call when you open the file.

lushunn, Sep 21 '20 07:09

> Add encoding='utf-8' to the open() call when you open the file.

Awesome!

KMY-SEU, Oct 11 '20 01:10