RecSysDatasets
Codec error when converting the MovieLens dataset
I followed the instructions in Readme.md to download and convert the MovieLens dataset, but I got the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2892: invalid continuation byte
I changed the pd.read_csv call in convertion_tools/src/extended_dataset.py (line 52) to include an encoding argument, which fixed the problem:
pd.read_csv(self.item_file, delimiter=self.item_sep, header=None, engine='python', encoding="ISO-8859-1")
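For anyone wondering why the encoding argument helps, here is a minimal sketch of what I think is going on. It assumes the item file contains accented characters stored as ISO-8859-1, where "é" is the single byte 0xe9. The sample row and the `detect_encoding` helper are hypothetical, made up for illustration, and not part of the repository.

```python
# Hypothetical sample row: "é" becomes the single byte 0xe9 in ISO-8859-1.
raw = "1|Les Misérables (1995)".encode("ISO-8859-1")


def detect_encoding(data, candidates=("utf-8", "ISO-8859-1")):
    """Return the first candidate encoding that decodes `data` without error."""
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# In UTF-8, 0xe9 announces a multi-byte sequence, but the next byte ("r")
# is not a continuation byte, so strict UTF-8 decoding fails.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Falling back to ISO-8859-1 recovers the original text.
encoding = detect_encoding(raw)
text = raw.decode(encoding)
print(encoding, "->", text)
```

Note that ISO-8859-1 maps every possible byte to a character, so it never raises a decode error. That is why it sits last in the candidate list, and also why it can silently produce wrong characters if the file is really in some other encoding.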
Hi, @guedes-joaofelipe! Thank you for opening this issue, but we can't reproduce the problem on our side. Could you please check your dataset and your environment again?
I had the same problem.
@EliverQ I had the same problem when converting the Yelp dataset on Windows.
Traceback (most recent call last):
File "run.py", line 40, in