PyTorchNLPBook

Chapter 03 Yelp Dataset has a Typo

Open amancioandre opened this issue 5 years ago • 0 comments

Hi everyone,

Chapter 3 fails to load the Yelp data because the last line of the dataset is malformed: its review text opens a quote that is never closed.

Line 73357:

```
"1","Capital City Transfer han
```
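To locate a line like this in a large CSV, one option is Python's standard `csv` module in strict mode. With `strict=True` it raises `csv.Error` when the file ends inside a quoted field, instead of silently returning the partial row. This is a minimal sketch: the helper name `first_parse_error_line` and the UTF-8 encoding are assumptions, not part of the book's code.

```python
import csv
from typing import Optional


def first_parse_error_line(path: str) -> Optional[int]:
    """Return the physical line number where strict CSV parsing fails, or None.

    Hypothetical helper. With strict=True the csv module raises csv.Error on
    malformed input, including end-of-file inside a quoted field (the case in
    this issue), so reader.line_num is the last line the reader consumed.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, strict=True)
        try:
            for _ in reader:
                pass
        except csv.Error:
            return reader.line_num
    return None


# Usage:
# print(first_parse_error_line(args.raw_train_dataset_csv))
```

Because an unclosed quote makes the parser swallow everything that follows, this reports the line where parsing gave up, which matches the broken record only when that record is at the end of the file, as it is here.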

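Passing `nrows` set to the number of rows minus one, so that pandas stops before the broken last line, fixed it for me:

```python
import pandas as pd
train_reviews = pd.read_csv(args.raw_train_dataset_csv, header=None, names=['rating', 'review'], nrows=73356)  # 73357 lines; skip the last
```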
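Rather than hard-coding 73356, the row count can be computed from the file itself. This is a minimal sketch assuming one review per physical line (true only if no quoted field contains a newline) and that only the final line is broken; `rows_before_last_line` is a hypothetical helper name.

```python
def rows_before_last_line(path: str) -> int:
    """Count the physical lines in `path` and return that count minus one.

    Hypothetical helper: passed as pandas' `nrows`, the result reads every
    line except the last, assuming one CSV record per line.
    """
    with open(path, encoding="utf-8") as f:
        total = sum(1 for _ in f)
    return max(total - 1, 0)


# Usage:
# train_reviews = pd.read_csv(args.raw_train_dataset_csv, header=None,
#                             names=['rating', 'review'],
#                             nrows=rows_before_last_line(args.raw_train_dataset_csv))
```

This keeps the workaround valid if the file is regenerated with a different number of reviews, at the cost of one extra pass over the file.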

Or

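```python
# pandas < 1.3 only: error_bad_lines was deprecated in 1.3 (replaced by on_bad_lines='skip') and removed in 2.0
train_reviews = pd.read_csv(args.raw_train_dataset_csv, header=None, names=['rating', 'review'], error_bad_lines=False)
```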
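The `error_bad_lines=False` call raises a `TypeError` on pandas 2.x, where that keyword no longer exists. The sketch below picks the matching keyword for the installed pandas version. It only handles the rename; whether skipping bad lines helps with this particular truncated row depends on how the parser classifies it, and the report above only confirms that `error_bad_lines=False` worked for the author. The helper name `skip_bad_lines_kwargs` is hypothetical.

```python
def skip_bad_lines_kwargs(pandas_version: str) -> dict:
    """Return the read_csv keyword argument that skips bad lines.

    Hypothetical helper. error_bad_lines=False was deprecated in pandas 1.3
    in favour of on_bad_lines='skip', and removed in pandas 2.0.
    """
    # Compare only (major, minor); suffixes like "0rc1" are in the third part.
    major, minor = (int(part) for part in pandas_version.split(".")[:2])
    if (major, minor) >= (1, 3):
        return {"on_bad_lines": "skip"}
    return {"error_bad_lines": False}


# Usage:
# train_reviews = pd.read_csv(args.raw_train_dataset_csv, header=None,
#                             names=['rating', 'review'],
#                             **skip_bad_lines_kwargs(pd.__version__))
```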

Or simply append the missing closing `"` to the end of that line in the CSV file.
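The manual fix can also be scripted. This is a minimal sketch, assuming quotes inside fields are escaped by doubling (`""`), so a well-formed line always contains an even number of `"` characters and an odd count on the last line means its closing quote is missing. The helper name `close_last_line_quote` is hypothetical, and it writes to a separate output file so the original download is left untouched.

```python
def close_last_line_quote(src: str, dst: str) -> bool:
    """Copy src to dst, closing an unterminated quote on the final line.

    Hypothetical helper. Assumes "" escaping, so an odd number of double
    quotes on the last line means the closing quote is missing.
    Returns True if a quote was appended.
    """
    with open(src, encoding="utf-8", newline="") as f:
        lines = f.readlines()

    fixed = False
    if lines:
        body = lines[-1].rstrip("\r\n")
        ending = lines[-1][len(body):]  # preserve the original line ending
        if body.count('"') % 2 == 1:
            lines[-1] = body + '"' + ending
            fixed = True

    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.writelines(lines)
    return fixed
```

This only makes the row parseable; the review text itself stays cut off at "Capital City Transfer han", so dropping the row (as the `nrows` approach does) may still be preferable for training.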

Still, it would be nice to fix this typo in the dataset itself.

amancioandre · Jun 11 '20 12:06