RecSysDatasets
I'm having a decoding problem while converting data.
When I convert the Yelp dataset on Windows, I get a decoding error during the conversion.
Traceback (most recent call last):
  File "run.py", line 40, in <module>
    datasets.convert_inter()
  File "D:\学业\研究生\数据集\数据集转换程序\RecSysDatasets-master\conversion_tools\src\extended_dataset.py", line 4581, in convert_inter
    for _ in fin:
UnicodeDecodeError: 'gbk' codec can't decode byte 0x8b in position 1909: illegal multibyte sequence
@ZZZZZZZZeng Hello, thanks for your attention to our repository.
UnicodeDecodeError occurs here because the file's encoding does not match the platform's default encoding. When open() is called without an encoding argument, Python by default uses the platform's locale encoding; on a Chinese-locale Windows system that is gbk. The data file is encoded in utf-8, so reading it as gbk raises this error. The solution is to add encoding='utf-8' to the open() call that creates the file object being read where the error is raised (fin in convert_inter).
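A minimal sketch of that fix, assuming the file is opened with the built-in open(). The helper name read_lines and the temporary demo file are hypothetical and only illustrate the pattern; they are not taken from extended_dataset.py.

```python
import os
import tempfile


def read_lines(path):
    """Read a text file as UTF-8 regardless of the platform's locale encoding.

    Hypothetical helper for illustration; not part of RecSysDatasets.
    """
    # Passing encoding explicitly stops Python from falling back to the
    # locale default (gbk on a Chinese-locale Windows system).
    with open(path, "r", encoding="utf-8") as fin:
        return [line.rstrip("\n") for line in fin]


# Demo: write a small UTF-8 file containing non-ASCII text, then read it back.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".json", delete=False
) as tmp:
    tmp.write('{"name": "Café 餐厅"}\n')
    demo_path = tmp.name

lines = read_lines(demo_path)
os.remove(demo_path)
print(len(lines))
```

In the conversion script, the equivalent change would be adding the encoding='utf-8' argument to the existing open() call, leaving the rest of the loop unchanged.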
Please comment if you have further questions.