
torchtext Multi30k

Open NiceMartin opened this issue 5 years ago • 2 comments

When I create the datasets with `train, val, test = Multi30k.splits(exts=('.de', '.en'), fields=(DE, EN))`, I get the following error message:


    //anaconda/lib/python3.5/site-packages/torchtext/datasets/translation.py in __init__(self, path, exts, fields, **kwargs)
         31
         32         examples = []
    ---> 33         with open(src_path) as src_file, open(trg_path) as trg_file:
         34             for src_line, trg_line in zip(src_file, trg_file):
         35                 src_line, trg_line = src_line.strip(), trg_line.strip()

FileNotFoundError: [Errno 2] No such file or directory: '.data/val.de'

Do you have any idea what causes this? Thank you in advance.

NiceMartin, Sep 30 '18 03:09

`Multi30k.splits` has been updated, but your installed version is old. Replace it with:

```python
@classmethod
def splits(cls, exts, fields, root='.data',
           train='train', validation='val', test='test2016', **kwargs):
    """Create dataset objects for splits of the Multi30k dataset.

    Arguments:
        root: Root dataset storage directory. Default is '.data'.
        exts: A tuple containing the extension to path for each language.
        fields: A tuple containing the fields that will be used for data
            in each language.
        train: The prefix of the train data. Default: 'train'.
        validation: The prefix of the validation data. Default: 'val'.
        test: The prefix of the test data. Default: 'test2016'.
        Remaining keyword arguments: Passed to the splits method of
            Dataset.
    """
    # Relies on `import os` at the top of translation.py.
    if 'path' not in kwargs:
        # Reuse the dataset folder under `root` if it is already there.
        expected_folder = os.path.join(root, cls.name)
        path = expected_folder if os.path.exists(expected_folder) else None
    else:
        # An explicit `path` keyword takes precedence; remove it from
        # kwargs so it is not passed twice.
        path = kwargs['path']
        del kwargs['path']

    return super(Multi30k, cls).splits(
        exts, fields, path, root, train, validation, test, **kwargs)
```
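The `path` handling in that method can be tried on its own, without torchtext installed. The following is only a minimal sketch: `resolve_dataset_path` is a hypothetical standalone helper, and its `name` parameter stands in for `cls.name`; neither is part of torchtext.

```python
import os
import tempfile


def resolve_dataset_path(root, name, **kwargs):
    """Mirror the `path` branch of the method above (hypothetical helper).

    Returns the resolved path (or None) together with the remaining kwargs.
    """
    if 'path' not in kwargs:
        expected_folder = os.path.join(root, name)
        # None means no existing dataset folder was found under `root`;
        # the caller decides what to do next.
        path = expected_folder if os.path.exists(expected_folder) else None
    else:
        # An explicit path wins and is removed from the remaining kwargs.
        path = kwargs.pop('path')
    return path, kwargs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(resolve_dataset_path(root, 'multi30k'))        # folder absent
        os.makedirs(os.path.join(root, 'multi30k'))
        print(resolve_dataset_path(root, 'multi30k'))        # folder present
        print(resolve_dataset_path(root, 'multi30k', path='custom'))
```

The sketch returns the leftover keyword arguments as well, so it is easy to see that `path` is consumed before anything is forwarded.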

1024er, Oct 11 '18 02:10

Yep, I should update this repo.

keon, Oct 12 '18 00:10
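For anyone hitting the same `FileNotFoundError`, it can help to check which split files are actually on disk before patching anything. This is a minimal sketch, assuming the files are named `<prefix><ext>` inside the data directory, as the error message (`.data/val.de`) suggests. `missing_split_files` is a hypothetical helper, not part of torchtext, and its default prefixes are taken from the `splits` signature posted in this thread.

```python
import os


def missing_split_files(path, exts, prefixes=('train', 'val', 'test2016')):
    """Return the expected `<prefix><ext>` files that do not exist in `path`."""
    expected = [os.path.join(path, prefix + ext)
                for prefix in prefixes
                for ext in exts]
    return [p for p in expected if not os.path.isfile(p)]


if __name__ == "__main__":
    # '.data' is the directory named in the error message; adjust as needed.
    for p in missing_split_files('.data', ('.de', '.en')):
        print('missing:', p)
```

If the list comes back non-empty, the prefixes passed to `splits` and the files in the data directory do not line up.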