What is currently the ideal, effective torchtext pipeline for almost any NLP task?
Description: Hey there, I've been using the legacy version of torchtext for quite some time, as it provides easy ways to load custom datasets and custom pretrained word embeddings locally, and I can seamlessly use it for seq2seq, text classification, POS tagging, language modeling, etc. Most importantly, I could use BucketIterator to sort samples by length and group batches of similar lengths, thereby minimizing padding. I've read that torchdata has these functionalities implemented, but I couldn't find any tangible resources.
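For context on what I'm trying to reproduce: legacy BucketIterator roughly shuffles the data, sorts a pool of samples by length, and cuts batches from the sorted pool so each batch contains similar lengths. A minimal pure-Python sketch of that idea (the function name `bucket_batches` and the `pool_factor` parameter are my own, not a torchtext API) looks like this:

```python
import random

def bucket_batches(lengths, batch_size, pool_factor=100, seed=0):
    """Yield lists of sample indices grouped by similar length.

    lengths:     sequence length of each sample in the dataset.
    pool_factor: how many batches' worth of samples to sort at once;
                 larger pools give tighter length grouping, less randomness.
    """
    rng = random.Random(seed)
    indices = list(range(len(lengths)))
    rng.shuffle(indices)  # keep epoch-level randomness
    pool_size = batch_size * pool_factor
    batches = []
    for start in range(0, len(indices), pool_size):
        pool = indices[start:start + pool_size]
        pool.sort(key=lambda i: lengths[i])  # similar lengths become adjacent
        batches.extend(pool[j:j + batch_size]
                       for j in range(0, len(pool), batch_size))
    rng.shuffle(batches)  # shuffle batch order, not batch contents
    return batches
```

In practice I'd wrap something like this in a `torch.utils.data.Sampler` (yielding one batch of indices per iteration, passed as `batch_sampler=` to `DataLoader`) and pad each batch with `torch.nn.utils.rnn.pad_sequence` in the `collate_fn`, so the padding per batch stays small.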
I have 3 requirements:
- loading any custom dataset locally.
- loading any custom pre-trained embeddings locally (fastText, GloVe)
- being able to sort and batch samples by length so padding is minimized
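For the second requirement, the local embedding files themselves are easy to parse without any library: GloVe text files contain one token followed by its float components per line, and fastText `.vec` files add a single "count dim" header line. A minimal sketch (function name `load_vectors` is mine, not a torchtext API) of loading them into a dict, optionally restricted to a vocabulary:

```python
import io

def load_vectors(path_or_file, vocab=None):
    """Parse a whitespace-delimited embedding file (GloVe / fastText .vec
    style) into a dict mapping token -> list of floats.

    vocab: optional set of tokens; if given, skip out-of-vocabulary rows
           to save memory.
    """
    vectors = {}
    f = (path_or_file if hasattr(path_or_file, "read")
         else open(path_or_file, encoding="utf-8"))
    with f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) == 2:  # fastText .vec header: "word_count dim"
                continue
            token, values = parts[0], parts[1:]
            if vocab is None or token in vocab:
                vectors[token] = [float(v) for v in values]
    return vectors
```

From there, building the embedding matrix is just stacking the rows in vocab-index order (with random or zero vectors for OOV tokens) and handing the tensor to `nn.Embedding.from_pretrained`.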