seq2seq
torchtext Multi30k
When I create the data splits with `train, val, test = Multi30k.splits(exts=('.de', '.en'), fields=(DE, EN))`, I get the following error message:
//anaconda/lib/python3.5/site-packages/torchtext/datasets/translation.py in __init__(self, path, exts, fields, **kwargs)
     31
     32         examples = []
---> 33         with open(src_path) as src_file, open(trg_path) as trg_file:
     34             for src_line, trg_line in zip(src_file, trg_file):
     35                 src_line, trg_line = src_line.strip(), trg_line.strip()
FileNotFoundError: [Errno 2] No such file or directory: '.data/val.de'
Do you have any idea what causes this? Thank you in advance.
Multi30k.splits has been updated, but your installed version of torchtext is old. Replace the method with:
`
# Relies on `os` being imported at the top of translation.py.
def splits(cls, exts, fields, root='.data',
           train='train', validation='val', test='test2016', **kwargs):
    """Create dataset objects for splits of the Multi30k dataset.

    Arguments:
        root: Root dataset storage directory. Default is '.data'.
        exts: A tuple containing the extension to path for each language.
        fields: A tuple containing the fields that will be used for data
            in each language.
        train: The prefix of the train data. Default: 'train'.
        validation: The prefix of the validation data. Default: 'val'.
        test: The prefix of the test data. Default: 'test2016'.
        Remaining keyword arguments: Passed to the splits method of
            Dataset.
    """
    if 'path' not in kwargs:
        # No explicit path: use root/<dataset name> if it already exists.
        expected_folder = os.path.join(root, cls.name)
        path = expected_folder if os.path.exists(expected_folder) else None
    else:
        # An explicit path wins; remove it so it is not passed on twice.
        path = kwargs['path']
        del kwargs['path']
    return super(Multi30k, cls).splits(
        exts, fields, path, root, train, validation, test, **kwargs)
`
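To see what the path handling in that method does without touching torchtext, here is a minimal standalone sketch of the same branch logic. The helper name `resolve_dataset_path` is hypothetical and not part of torchtext, and `'multi30k'` only stands in for `cls.name`; what the base class does when it receives `None` is not shown in this thread.

```python
import os
import tempfile


def resolve_dataset_path(root, name, **kwargs):
    """Hypothetical helper mirroring the path branch of the patched splits().

    Returns (path, remaining_kwargs). path is None when the caller gave no
    explicit 'path' and root/name does not exist on disk.
    """
    if 'path' not in kwargs:
        expected_folder = os.path.join(root, name)
        path = expected_folder if os.path.exists(expected_folder) else None
    else:
        # An explicit path wins; pop it so it is not forwarded a second time.
        path = kwargs.pop('path')
    return path, kwargs


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as root:
        # Folder missing: the resolved path is None.
        print(resolve_dataset_path(root, 'multi30k'))
        # Folder present: the resolved path is root/multi30k.
        os.makedirs(os.path.join(root, 'multi30k'))
        print(resolve_dataset_path(root, 'multi30k'))
        # Explicit path: returned as-is and removed from the kwargs.
        print(resolve_dataset_path(root, 'multi30k', path='/my/data'))
```

If you keep the data somewhere other than the default root, the last case suggests that passing `path=...` to `Multi30k.splits` should skip the folder check entirely, assuming your installed version contains this patch.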
Yep, I should update this repo.