naacl_transfer_learning_tutorial
Bugfix/utils get and tokenize dataset
Found several issues in the get_and_tokenize_dataset function in utils.py. Code changes:
- Fixed an invalid key lookup: dataset_map['labels'] fails because "labels" is not a key of dataset_map -> replaced by DATASETS_LABELS_URL[dataset_dir].
- The IMDB dataset pointed to test.txt and test.labels.txt, which do not exist on S3. Replaced them with valid.txt and valid.labels.txt respectively.
- Replaced the 'test' key with 'valid' in the global variables DATASETS_URL and DATASETS_LABELS_URL, since the training logic reads the evaluation split through the 'valid' key.
- Added a test case that checks the get_and_tokenize_dataset fixes for the IMDB dataset.
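
For reviewers, the lookup change can be sketched roughly as follows. This is a minimal illustration, not the actual diff: the base URL is a placeholder, resolve_dataset_urls is a hypothetical helper that does not exist in utils.py, and only the names DATASETS_URL, DATASETS_LABELS_URL, dataset_dir and the 'train'/'valid' keys come from the description above.

```python
# Placeholder base URL (assumption); the real S3 locations are not reproduced here.
BASE = "https://example.invalid/aclImdb/"

# Both mappings use 'train' and 'valid' keys, matching what the training logic reads.
DATASETS_URL = {
    "imdb": {"train": BASE + "train.txt", "valid": BASE + "valid.txt"},
}
DATASETS_LABELS_URL = {
    "imdb": {"train": BASE + "train.labels.txt", "valid": BASE + "valid.labels.txt"},
}


def resolve_dataset_urls(dataset_dir):
    """Hypothetical helper: return (text_urls, label_urls) for a known dataset name."""
    text_urls = DATASETS_URL[dataset_dir]
    # Labels live in their own mapping; text_urls has no 'labels' key,
    # so text_urls['labels'] would raise KeyError.
    label_urls = DATASETS_LABELS_URL.get(dataset_dir)
    return text_urls, label_urls


text_urls, label_urls = resolve_dataset_urls("imdb")
assert set(text_urls) == {"train", "valid"}
assert set(label_urls) == {"train", "valid"}
```

Keeping the same split names in both mappings means text files and label files can be paired by key without any special-casing for IMDB.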