Semi-supervised-learning
Example for Custom Dataset Usage in NLP
🚀 Feature
How can I use a custom NLP dataset to try these algorithms? I only saw an example for a custom CV dataset. The main part I am interested in is the train_transform step for a custom NLP dataset.
Hi, we will add a demonstration for custom NLP data. Currently, however, only CV datasets are supported.
For now, the easiest way is to write your own custom dataset for NLP data and make the output of its __getitem__ function a dict of this form:
{'idx': idx, 'text': some raw text, 'text_s': some raw text}
Note that text_s is obtained by back-translation: the text is first translated into another language with the WMT-19 translation models in fairseq and then translated back.
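As a rough illustration of the format above, here is a minimal sketch of such a dataset. The class name CustomTextDataset, its constructor arguments, and the 'label' key are hypothetical and not part of the library. It is written as a plain Python class with __len__ and __getitem__; in practice you would probably subclass torch.utils.data.Dataset. It also assumes the back-translated texts were computed beforehand.

```python
class CustomTextDataset:
    """Map-style dataset that returns one dict per sample.

    Hypothetical helper, not part of the library: only the keys
    'idx', 'text', and 'text_s' come from the format described above.
    """

    def __init__(self, texts, strong_texts=None, labels=None):
        self.texts = list(texts)
        # Back-translated versions of the texts, assumed to be precomputed.
        self.strong_texts = list(strong_texts) if strong_texts is not None else None
        # Assumption: labeled samples carry their target under a 'label' key.
        self.labels = list(labels) if labels is not None else None

        if self.strong_texts is not None and len(self.strong_texts) != len(self.texts):
            raise ValueError("strong_texts must have the same length as texts")
        if self.labels is not None and len(self.labels) != len(self.texts):
            raise ValueError("labels must have the same length as texts")

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        # Raw text is returned as-is; no tokenization happens here.
        item = {'idx': idx, 'text': self.texts[idx]}
        if self.strong_texts is not None:
            item['text_s'] = self.strong_texts[idx]
        if self.labels is not None:
            item['label'] = self.labels[idx]
        return item


unlabeled = CustomTextDataset(
    texts=["the movie was great"],
    strong_texts=["the film was excellent"],
)
print(unlabeled[0])
```

The sample is kept as raw text, matching the dict format above; how tokenization is applied afterwards depends on the rest of your pipeline.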
Hi, thanks for replying.
OK, I can get the data into this format. How do I run an algorithm on data in this format?
You can use the dataset class we used for NLP (https://github.com/microsoft/Semi-supervised-learning/blob/main/semilearn/datasets/nlp_datasets/datasetbase.py) as a reference for your dataset.
To run the algorithms on a custom dataset, you can refer to this notebook (https://github.com/microsoft/Semi-supervised-learning/blob/main/notebooks/Custom_Dataset.ipynb). You only need to change the data-creation part and set the net argument in the config to one of the NLP models we support. I think everything else would stay the same.
Let me know if you have further questions.