
What is currently the ideal, effective torchtext pipeline for almost any NLP task?

Open StephennFernandes opened this issue 3 years ago • 0 comments

Searching for the ideal torchtext pipeline

Description Hey there, I've been using the legacy version of torchtext for quite some time, as it provides easier ways to load a custom dataset and custom pre-trained word embeddings locally, and I can seamlessly use it for seq2seq, text classification, POS tagging, language modeling, etc. Most importantly, I could use BucketIterator to sort samples by length and group batches of similar lengths, thus minimizing padding. I've read that torchdata has these functionalities implemented, but I couldn't find any tangible resources.
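For reference, torchdata exposes a `bucketbatch` operation on its IterDataPipes that is meant to cover this use case, but the underlying idea is simple enough to sketch in plain Python. The snippet below is a hypothetical re-implementation of what BucketIterator roughly did (shuffle globally, sort a pool of examples by length, slice batches from the sorted pool, then shuffle batch order); the function names and `pool_factor` parameter are my own, not torchtext API:

```python
import random

def bucket_batches(examples, batch_size, pool_factor=100, key=len):
    """Group examples of similar length into batches to minimize padding.

    Hypothetical sketch of BucketIterator-style bucketing, not the actual
    torchtext source: shuffle all examples, sort each pool of
    batch_size * pool_factor examples by length, cut batches from the
    sorted pool, then shuffle the order of the batches.
    """
    examples = list(examples)
    random.shuffle(examples)                 # global shuffle first
    pool_size = batch_size * pool_factor
    batches = []
    for i in range(0, len(examples), pool_size):
        pool = sorted(examples[i:i + pool_size], key=key)
        for j in range(0, len(pool), batch_size):
            batches.append(pool[j:j + batch_size])
    random.shuffle(batches)                  # shuffle batch order
    return batches

def pad_batch(batch, pad_token="<pad>"):
    """Pad every sequence in a batch to the batch-local maximum length."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_token] * (max_len - len(seq)) for seq in batch]

# toy usage: sequences of varying length end up batched with near-neighbors
data = [["tok"] * n for n in [5, 2, 9, 3, 8, 2, 7, 4]]
batches = bucket_batches(data, batch_size=2)
padded = [pad_batch(b) for b in batches]
```

Because padding is computed per batch rather than per dataset, batches of similar-length sequences waste very few pad tokens.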

I have 3 requirements:

  1. loading any custom dataset locally.
  2. loading any custom pre-trained embedding locally (fastText, GloVe)
  3. being able to implement sort and batch by length to get minimum padding
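For requirement 2, `torchtext.vocab.Vectors` can point at a local file, but the file format itself is plain text and easy to handle directly. The sketch below (my own helper names, not a torchtext API) parses a GloVe/fastText-style text file, one token per line followed by its vector components, and aligns the vectors to a vocabulary list:

```python
import os
import tempfile

def load_vectors(path):
    """Parse a GloVe/fastText-style embedding file: each line is a token
    followed by whitespace-separated float components. A sketch; the real
    torchtext.vocab.Vectors class also handles caching and unknown-token
    initialization."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            # fastText .vec files begin with a "count dim" header; skip it
            if len(parts) == 2 and parts[0].isdigit():
                continue
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def build_embedding_matrix(vocab, vectors, dim):
    """Align vectors to a vocab list; out-of-vocabulary tokens get zeros."""
    return [vectors.get(tok, [0.0] * dim) for tok in vocab]

# tiny demo: write a 2-word, 2-dimensional embedding file and load it back
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("2 2\nhello 0.1 0.2\nworld 0.3 0.4\n")
    demo_path = f.name
vectors = load_vectors(demo_path)
matrix = build_embedding_matrix(["world", "oov"], vectors, dim=2)
os.unlink(demo_path)
```

The resulting matrix can be passed to `torch.nn.Embedding.from_pretrained` (after converting to a tensor) to initialize an embedding layer from the local file.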

StephennFernandes avatar Apr 07 '22 13:04 StephennFernandes