cnn-dailymail

Fixing utf-8 encoding bug

Open SampannaKahu opened this issue 6 years ago • 0 comments

Hi, while running the original make_datafiles.py on my system, I get the following error:

Making bin file for URLs listed in url_lists/all_test.txt...
Writing story 0 of 11490; 0.00 percent done
Traceback (most recent call last):
  File "make_datafiles.py", line 256, in <module>
    write_to_tar(all_test_urls, os.path.join(finished_files_dir, "test.tar"))
  File "make_datafiles.py", line 185, in write_to_tar
    article_sents, abstract_sents = get_art_abs(story_file)
  File "make_datafiles.py", line 109, in get_art_abs
    lines = read_story_file(story_file)
  File "make_datafiles.py", line 78, in read_story_file
    contents = f.read()
  File "/home/sampanna/.conda/envs/fast_abs_rl/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 858: ordinal not in range(128)

This PR fixes that.
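
For context, here is a minimal sketch of the kind of change that typically resolves this error. It assumes the story files are UTF-8 encoded; byte 0xe2 is the lead byte of three-byte UTF-8 sequences such as curly quotes. The traceback shows Python falling back to the ASCII codec because the file was opened without an explicit encoding. The helper name read_utf8_lines and the demo file below are hypothetical and are not taken from this PR's diff.

```python
import os
import tempfile


def read_utf8_lines(text_file, encoding="utf-8"):
    """Return the stripped lines of a text file.

    Hypothetical helper: passing `encoding` explicitly means the result no
    longer depends on the system locale (which was ASCII in the traceback).
    """
    with open(text_file, "r", encoding=encoding) as f:
        return [line.strip() for line in f]


# Demo with a throwaway file containing curly quotes (U+201C / U+201D),
# whose UTF-8 encoding starts with byte 0xe2.
with tempfile.TemporaryDirectory() as tmp_dir:
    story_path = os.path.join(tmp_dir, "example.story")
    with open(story_path, "wb") as f:
        f.write("He said \u201chello\u201d\n\n@highlight\n".encode("utf-8"))
    demo_lines = read_utf8_lines(story_path)
```

Setting the encoding at the open() call keeps the fix local to the script. An alternative that needs no code change, if it suits the environment, is to run the script under a UTF-8 locale.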

SampannaKahu, Nov 08 '18 03:11