RecSysDatasets

I'm having a decoding problem while converting data.

Open · ZZZZZZZZeng opened this issue 2 years ago · 1 comment

When I convert the Yelp dataset on Windows, I get a decoding error during the conversion:

Traceback (most recent call last):
  File "run.py", line 40, in
    datasets.convert_inter()
  File "D:\学业\研究生\数据集\数据集转换程序\RecSysDatasets-master\conversion_tools\src\extended_dataset.py", line 4581, in convert_inter
    for _ in fin:
UnicodeDecodeError: 'gbk' codec can't decode byte 0x8b in position 1909: illegal multibyte sequence

ZZZZZZZZeng · Nov 28 '22 05:11

@ZZZZZZZZeng Hello, thanks for your interest in our repository.

The UnicodeDecodeError occurs because the file's encoding does not match the platform's default encoding. When open() is called without an encoding argument, Python falls back to the locale's preferred encoding, which depends on the platform: on Windows with a Chinese locale it is GBK. The data file is encoded in UTF-8, so reading it as GBK raises this error. The solution is to add encoding='utf-8' to the open() call at the line where the error is reported.
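Below is a minimal sketch of the cause and the fix, assuming the converter reads the file through a plain open() call. The real code in extended_dataset.py is not shown here, so read_lines and the sample file are hypothetical stand-ins. The sketch writes a small UTF-8 file containing a Chinese character, shows that reading it as GBK raises a UnicodeDecodeError, and shows that passing encoding='utf-8' reads it correctly.

```python
import locale
import os
import tempfile


def read_lines(path, encoding):
    """Hypothetical helper: read a text file line by line with an explicit encoding."""
    # An explicit encoding overrides the platform default
    # (locale.getpreferredencoding(False)), which is GBK on
    # Windows with a Chinese locale.
    with open(path, "r", encoding=encoding) as fin:
        return [line.rstrip("\n") for line in fin]


print("platform default:", locale.getpreferredencoding(False))

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample standing in for a line of the Yelp data, saved as UTF-8.
    path = os.path.join(tmp, "sample.json")
    with open(path, "w", encoding="utf-8") as fout:
        fout.write("name: 中\n")

    # Decoding UTF-8 bytes as GBK hits an illegal multibyte sequence.
    try:
        read_lines(path, "gbk")
        gbk_failed = False
    except UnicodeDecodeError as exc:
        gbk_failed = True
        print("gbk:", exc)

    # Decoding with the file's actual encoding works.
    lines = read_lines(path, "utf-8")
    print("utf-8:", lines)
```

In the converter, the change would presumably go on the open(...) call that creates fin. If editing every open() call is inconvenient, running the script in Python's UTF-8 mode (python -X utf8 run.py, or setting the PYTHONUTF8=1 environment variable, available since Python 3.7) makes UTF-8 the default for open().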

Please comment if you have further questions.

Sherry-XLL · Feb 07 '23 12:02