How do I load data from a CSV file?
I have a dataset containing text and labels separated by tabs. How can I load this dataset using torchtext?
It's probably similar to the text classification datasets here.
@mttk Do you know if the current library supports loading CSV files?
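# Legacy API: works as-is in older torchtext; in 0.9-0.11 use `from torchtext.legacy import data`
from torchtext import data
TEXT = data.Field()
LABEL = data.LabelField()
# the column order in each file must match the order of `fields`
fields = [('text', TEXT), ('label', LABEL)]
train_data, test_data = data.TabularDataset.splits(
    path = 'data',
    train = 'train.csv',
    test = 'test.csv',
    format = 'tsv',  # 'tsv' for tabs, 'csv' for commas
    fields = fields,
    # skip_header = True,  # uncomment if the files start with a header row
)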
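For readers who want to see roughly what `TabularDataset` with `format = 'tsv'` does, here is a framework-independent sketch using only Python's standard library. It assumes each line holds the text and the label separated by a single tab, with no header row, and it tokenizes on whitespace. All function names here are hypothetical helpers, not torchtext API, and the indexing choices (specials first, then tokens by frequency) are one reasonable convention, not necessarily torchtext's exact behaviour.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Example = Tuple[List[str], str]

def read_tsv_examples(lines: Iterable[str]) -> List[Example]:
    """Parse 'text<TAB>label' lines into (tokens, label) pairs."""
    examples = []
    # QUOTE_NONE keeps quote characters inside the text as ordinary characters
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        text, label = row[0], row[1]
        examples.append((text.split(), label))  # whitespace tokenization
    return examples

def build_vocab(examples: List[Example], specials=("<unk>", "<pad>")) -> Dict[str, int]:
    """Index special tokens first, then tokens by descending frequency."""
    counts = Counter(tok for tokens, _ in examples for tok in tokens)
    itos = list(specials) + [tok for tok, _ in counts.most_common() if tok not in specials]
    return {tok: i for i, tok in enumerate(itos)}

def build_label_vocab(examples: List[Example]) -> Dict[str, int]:
    """Index labels in order of first appearance (no <unk> entry)."""
    labels: Dict[str, int] = {}
    for _, label in examples:
        labels.setdefault(label, len(labels))
    return labels

def numericalize(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Map tokens to indices, falling back to <unk> for unseen tokens."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in tokens]

# Usage on an in-memory stand-in for a real train.tsv file.
examples = read_tsv_examples(io.StringIO("a great movie\tpos\na dull movie\tneg\n"))
vocab = build_vocab(examples)
labels = build_label_vocab(examples)
print(numericalize(["a", "bad", "movie"], vocab))  # [2, 0, 3]
print(labels)  # {'pos': 0, 'neg': 1}
```

Keeping the label vocabulary separate and without an `<unk>` entry mirrors the idea behind using a dedicated label field: every label in the data should be known.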
@zhangguanheng66 Could we add the example @bentrevett posted to an examples/usage section? torchtext doesn't really have any examples for external datasets. Adding a few examples for datasets that are not built into torchtext would help new users understand how to use torchtext better.
We plan to eventually retire the Field class as legacy code. For now, however, we could land an OSS PR with this example to cover the use case above. @M-e-r-c-u-r-y
How can I load the AG_NEWS or DBpedia datasets from local CSV files using 'text_classification.DATASETS' instead of downloading them from Google Drive?
If you have the paths of the train/test files for AG_NEWS and DBpedia, you could save them as a list and start from here. That said, why not just call the AG_NEWS and DBpedia APIs to load the datasets?
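As a rough sketch of "save the paths and start from there", the snippet below reads already-downloaded AG_NEWS or DBpedia-style CSV files from a local directory instead of fetching them from Google Drive. It assumes each row is a 1-based class index followed by one or more text columns (e.g. title and description) with no header row. The file names and helper names are hypothetical, and the result is plain Python lists rather than torchtext dataset objects.

```python
import csv
import os
import tempfile
from typing import Dict, Iterator, List, Tuple

def iter_labeled_rows(path: str) -> Iterator[Tuple[int, str]]:
    """Yield (zero-based label, text) pairs from a label-first CSV file."""
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            label = int(row[0]) - 1   # assumes 1-based labels on disk
            text = " ".join(row[1:])  # join the remaining text columns
            yield label, text

def load_local_splits(data_dir: str, files: Dict[str, str]) -> Dict[str, List[Tuple[int, str]]]:
    """Load each split in `files` (split name -> file name) from data_dir."""
    return {split: list(iter_labeled_rows(os.path.join(data_dir, name)))
            for split, name in files.items()}

# Usage with a tiny throwaway file standing in for a real download.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "train.csv"), "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(["3", "Stocks rally", "Markets closed higher."])
    splits = load_local_splits(tmp, {"train": "train.csv"})
print(splits["train"])  # [(2, 'Stocks rally Markets closed higher.')]
```

Using `csv.reader` rather than splitting on commas matters here because the text columns in such files are quoted and may themselves contain commas.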
Thanks. Every time I run the project I have to use a VPN to fetch the data from Google Drive, which is a bit of a hassle, so I'd like to download the CSV files and keep them in a local directory for convenience. I'll try your method. Best wishes.
Is there a way to load datasets from CSV files in torchtext 0.12? It seems the legacy module has been removed as well.
@y12uc231 In torchtext 0.12 we migrated our datasets on top of torchdata. You can look at the dataset implementations, which offer plenty of examples of how to work with CSV files, or refer to the torchdata documentation for more information on usage and on the functionality available in datapipes.
The datapipe for reading data from CSV files is here:
from torchdata.datapipes.iter import IterableWrapper, FileOpener

dp = IterableWrapper(["my_csv_file.csv"])  # iterable of file paths
dp = FileOpener(dp, mode='b')              # yields (path, stream) pairs
dp = dp.parse_csv()                        # yields each row as a list of strings
for sample in dp:
    print(sample)
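The datapipe chain above is lazy: nothing is read until the loop iterates. As a dependency-free illustration of that composition (not torchdata itself), the sketch below chains three generator functions that play the roles of IterableWrapper, FileOpener and parse_csv; the helper names are hypothetical. For a tab-separated file like the one in the original question, torchdata's `parse_csv` is documented as passing extra format parameters through to Python's `csv.reader` and as accepting `skip_lines`, so something like `dp.parse_csv(delimiter='\t', skip_lines=1)` should work, but check this against the torchdata version you have installed.

```python
import csv
import os
import tempfile
from typing import IO, Iterable, Iterator, List, Tuple

def wrap(paths: Iterable[str]) -> Iterator[str]:
    """Plays the role of IterableWrapper: yields file paths one at a time."""
    yield from paths

def open_files(paths: Iterable[str]) -> Iterator[Tuple[str, IO[str]]]:
    """Plays the role of FileOpener: yields (path, stream) pairs lazily.

    Opens in text mode for simplicity; the torchdata example opens in binary
    mode and lets parse_csv decode the bytes.
    """
    for path in paths:
        with open(path, newline="", encoding="utf-8") as stream:
            yield path, stream

def parse_csv(files: Iterable[Tuple[str, IO[str]]], delimiter: str = ",",
              skip_lines: int = 0) -> Iterator[List[str]]:
    """Plays the role of parse_csv: yields each row as a list of strings."""
    for _path, stream in files:
        for _ in range(skip_lines):
            next(stream, None)  # drop raw header lines before parsing
        yield from csv.reader(stream, delimiter=delimiter)

# Usage: a tab-separated file with a header line, as in the original question.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_tsv_file.tsv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write("text\tlabel\na great movie\tpos\n")
    # list() drives the whole pipeline while the file still exists
    rows = list(parse_csv(open_files(wrap([path])), delimiter="\t", skip_lines=1))
print(rows)  # [['a great movie', 'pos']]
```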
Thanks! This works!
This may be a slightly different question, but do you know how to load GloVe embeddings for my dataset's vocabulary? The Vocab class used to have "load_vectors", which doesn't seem to exist in the latest versions of torchtext.
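One version-independent way to attach pretrained GloVe vectors to your own vocabulary is to read the GloVe text file yourself and build an embedding matrix row by row. The sketch below assumes the standard GloVe text format (a word followed by its space-separated float components on each line, as in files like `glove.6B.100d.txt`) and a vocabulary given as a token-to-index dict with indices 0 to len(vocab) - 1. The function name is hypothetical, and giving out-of-GloVe tokens zero vectors is just one choice (random initialization is also common).

```python
import io
from typing import Dict, List, TextIO

def load_glove_for_vocab(stream: TextIO, vocab: Dict[str, int], dim: int) -> List[List[float]]:
    """Build a len(vocab) x dim matrix; row i is the vector for the token with index i."""
    matrix = [[0.0] * dim for _ in range(len(vocab))]  # zeros for tokens missing from GloVe
    for line in stream:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if word in vocab and len(values) == dim:
            matrix[vocab[word]] = [float(v) for v in values]
    return matrix

# Usage with an in-memory stand-in for a real GloVe file
# (for a real file: open("glove.6B.100d.txt", encoding="utf-8") and dim=100).
glove_text = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
vocab = {"<unk>": 0, "<pad>": 1, "the": 2, "cat": 3, "dog": 4}
matrix = load_glove_for_vocab(glove_text, vocab, dim=3)
print(matrix[3])  # [0.4, 0.5, 0.6]
print(matrix[4])  # [0.0, 0.0, 0.0]
```

The resulting nested list can then be converted with `torch.tensor(matrix)` and passed to `torch.nn.Embedding.from_pretrained` to initialize an embedding layer.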