pytorch-seq2seq
torchtext recent version (0.12.0) doesn't support Field, BucketIterator
The recent version of torchtext (0.12.0) doesn't support Field, BucketIterator, etc. What are the equivalent modules for pre-processing datasets like Multi30k, IWSLT2016, IWSLT2017, etc.? Thanks.
Using torchtext version 0.11 solves the problem for me.
conda install pytorch torchtext=0.11 cudatoolkit=11.3 -c pytorch
Torchtext >= 0.12 removed Field and the legacy modules. You can try this instead:
from torch.utils.data import DataLoader
from torch.nn.utils.rnn import pad_sequence
from collections import Counter
from torchtext.datasets import Multi30k
from torchtext.vocab import vocab
from torchtext.data import get_tokenizer
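To show roughly how those pieces replace Field and BucketIterator, here is a minimal sketch of the vocabulary + padding + DataLoader pattern. It is not the code from the comment above: to stay runnable without downloading Multi30k, it assumes a toy in-memory corpus, uses str.split in place of get_tokenizer, and a plain dict in place of torchtext.vocab.vocab. The names pairs, build_vocab, encode and collate_batch are made up for illustration.

```python
from collections import Counter

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Toy parallel corpus standing in for Multi30k (hypothetical data).
pairs = [
    ("ein hund läuft", "a dog runs"),
    ("eine kleine katze schläft", "a small cat sleeps"),
    ("ein hund schläft", "a dog sleeps"),
]

SPECIALS = ["<unk>", "<pad>", "<bos>", "<eos>"]


def build_vocab(sentences, min_freq=1):
    """Map each token to an integer id, with the special tokens first."""
    counter = Counter(tok for s in sentences for tok in s.split())
    itos = SPECIALS + sorted(t for t, c in counter.items() if c >= min_freq)
    return {tok: i for i, tok in enumerate(itos)}


src_vocab = build_vocab(src for src, _ in pairs)
tgt_vocab = build_vocab(tgt for _, tgt in pairs)
PAD_IDX = SPECIALS.index("<pad>")  # same id in both vocabularies


def encode(sentence, vocab):
    """Turn a sentence into a tensor of ids wrapped in <bos> ... <eos>."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in sentence.split()]
    return torch.tensor([vocab["<bos>"]] + ids + [vocab["<eos>"]], dtype=torch.long)


def collate_batch(batch):
    """Pad every sequence in the batch to the length of the longest one."""
    src = [encode(s, src_vocab) for s, _ in batch]
    tgt = [encode(t, tgt_vocab) for _, t in batch]
    # pad_sequence defaults to batch_first=False, so tensors are (seq_len, batch).
    return (pad_sequence(src, padding_value=PAD_IDX),
            pad_sequence(tgt, padding_value=PAD_IDX))


loader = DataLoader(pairs, batch_size=2, shuffle=False, collate_fn=collate_batch)
src_batch, tgt_batch = next(iter(loader))
```

Doing the numericalization inside collate_fn keeps the dataset itself as raw text, which is roughly the job Field used to do. Note that this does not bucket sentences by length the way BucketIterator did; it only pads within each batch.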
@Jiazxu What should I do for a custom dataset stored as a CSV file? How do I load it and then perform a train/validation split?
It can be done with the pandas library. First, turn the .csv file into a torch.utils.data.Dataset class. The code looks like this (details depend on your data content):
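import pandas as pd
import torch
from torch.utils.data import DataLoader, Dataset

class DataSet_csv(Dataset):
    def __init__(self, data_dir):
        # Load the CSV; this assumes every column is numeric
        data = pd.read_csv(data_dir)
        self.data = torch.tensor(data.values, dtype=torch.float32)
        # Placeholder: the label is a copy of the data; replace with your own label column
        self.label = self.data.clone()

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.label[idx]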
Then you can put the DataSet_csv into the DataLoader.
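For the train/validation split part of the question, one option is torch.utils.data.random_split. The sketch below is only an illustration under assumptions: the CSV is built in memory, the column names f1, f2 and label are hypothetical, and a TensorDataset is used so the snippet runs on its own. Any Dataset with __len__ and __getitem__ should work the same way.

```python
import io

import pandas as pd
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# In-memory CSV standing in for a real file (hypothetical columns f1, f2, label).
csv_text = "f1,f2,label\n" + "\n".join(f"{i},{i * 2},{i % 2}" for i in range(10))
frame = pd.read_csv(io.StringIO(csv_text))

# Split the columns into features and labels.
features = torch.tensor(frame[["f1", "f2"]].values, dtype=torch.float32)
labels = torch.tensor(frame["label"].values, dtype=torch.long)
dataset = TensorDataset(features, labels)

# 80/20 split; the seeded generator makes the split reproducible.
n_val = len(dataset) // 5
n_train = len(dataset) - n_val
train_set, val_set = random_split(
    dataset, [n_train, n_val], generator=torch.Generator().manual_seed(0)
)

# Shuffle only the training data.
train_loader = DataLoader(train_set, batch_size=4, shuffle=True)
val_loader = DataLoader(val_set, batch_size=4)
```

random_split only shuffles indices, so the underlying data is not copied. If you need a stratified split by label, you would have to build the index lists yourself and wrap them in torch.utils.data.Subset.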