naacl_transfer_learning_tutorial

Bugfix/utils get and tokenize dataset

Open sahilsharma05 opened this issue 4 years ago • 0 comments

Found some issues in the get_and_tokenize_dataset method in utils.py. Code changes:

  • Invalid key "labels" when calling dataset_map['labels']. Replaced by DATASETS_LABELS_URL[dataset_dir].
  • Invalid test.txt and test.labels.txt files for the IMDB dataset (they don't exist on s3). Replaced by valid.txt and valid.labels.txt respectively.
  • Replaced the 'test' key with 'valid' in the global variables DATASETS_URL and DATASETS_LABELS_URL. The training logic reads the dataset through the 'valid' key.
  • Added a test case that checks the get_and_tokenize_dataset fixes for the imdb dataset.
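
To illustrate the key changes above, here is a minimal sketch, not the actual code in utils.py. The URLs are placeholders, the shape of the two globals is an assumption, and resolve_split_files is a hypothetical helper introduced only for this example. It shows labels being looked up in DATASETS_LABELS_URL[dataset_dir] rather than dataset_map['labels'], with 'valid' used as the evaluation split key.

```python
# Hypothetical stand-ins for the globals named in this PR. The URLs are
# placeholders, not the real s3 locations, and the dict layout is assumed.
DATASETS_URL = {
    "imdb": {
        "train": "https://example.com/imdb/train.txt",
        "valid": "https://example.com/imdb/valid.txt",
    },
}
DATASETS_LABELS_URL = {
    "imdb": {
        "train": "https://example.com/imdb/train.labels.txt",
        "valid": "https://example.com/imdb/valid.labels.txt",
    },
}


def resolve_split_files(dataset_dir, with_labels=False):
    """Hypothetical helper: return (text_urls, label_urls), each keyed by split."""
    dataset_map = DATASETS_URL[dataset_dir]
    # Labels live in their own table; dataset_map has no 'labels' key.
    label_map = DATASETS_LABELS_URL[dataset_dir] if with_labels else {}
    return dataset_map, label_map


texts, labels = resolve_split_files("imdb", with_labels=True)
print(sorted(texts))    # split names available for the text files
print(labels["valid"])  # label file used for the 'valid' split
```

With this layout, a lookup such as texts['test'] raises KeyError, which is the kind of mismatch the added test case is meant to catch.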

sahilsharma05, Sep 06 '19 08:09