MM PyTorch API: TabularTransform for input tabular sequences

Open sararb opened this issue 2 years ago • 1 comments

Fixes # (issue)

Goals :soccer:

  • Add support for padding, transforming, and masking sequential input data in the MM PyTorch backend
  • The implemented transform classes should:
      - Support multiple targets
      - Be usable for training, evaluation, and inference

Implementation Details :construction:

  • [x] Implement TabularBatchPadding to pad a group of sequential inputs
  • [x] Implement TabularPredictNext for generating targets of causal next item prediction
  • [ ] Implement TabularPredictLast for generating targets of last item prediction
  • [ ] Implement TabularPredictRandom for generating targets by picking one random item to predict and truncating the sequence so that the random item is at the last position.
  • [x] Implement TabularMaskRandom for masked language modeling training (MLM) strategy
  • [x] Implement TabularMaskLast for masking last item in the sequence, generally used to evaluate models trained with MLM.
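
To make the intended behaviour of the padding and target-generation transforms concrete, here is a minimal sketch on plain Python lists. It is not the implementation from this issue: the function names (`pad_batch`, `predict_next`, `predict_last`, `predict_random`), the padding id `0`, and right-side padding are all assumptions made for illustration, and the real classes would operate on batched tensors.

```python
import random
from typing import Dict, List, Tuple

PAD = 0  # assumed padding id; real item ids are assumed to be > 0


def pad_batch(
    batch: Dict[str, List[List[int]]], max_len: int, pad_value: int = PAD
) -> Dict[str, List[List[int]]]:
    """Truncate, then right-pad, every sequence of every feature to max_len."""
    padded = {}
    for name, seqs in batch.items():
        rows = []
        for seq in seqs:
            kept = seq[:max_len]
            rows.append(kept + [pad_value] * (max_len - len(kept)))
        padded[name] = rows
    return padded


def predict_next(seq: List[int]) -> Tuple[List[int], List[int]]:
    """Causal next-item prediction: the target at step t is the item at t + 1."""
    return seq[:-1], seq[1:]


def predict_last(seq: List[int]) -> Tuple[List[int], int]:
    """Last-item prediction: the final item is the single target."""
    return seq[:-1], seq[-1]


def predict_random(seq: List[int], rng: random.Random) -> Tuple[List[int], int]:
    """Pick one random target (never the first item) and truncate before it."""
    pos = rng.randrange(1, len(seq))  # requires len(seq) >= 2
    return seq[:pos], seq[pos]


# The target transforms are applied to unpadded sequences here.
print(pad_batch({"item_id": [[5, 8, 2], [7, 3]]}, max_len=4))
# → {'item_id': [[5, 8, 2, 0], [7, 3, 0, 0]]}
print(predict_next([5, 8, 2]))  # → ([5, 8], [8, 2])
print(predict_last([5, 8, 2]))  # → ([5, 8], 2)
```

In this sketch `predict_random` takes an explicit `random.Random` instance so that the chosen position can be made reproducible in tests.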

Testing Details :mag:

  • Defined tests for padding and the different sequence transformations
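
As an illustration of the kind of checks such tests might contain, the sketch below tests two stand-in masking functions. `mask_last` and `mask_random` are hypothetical list-based helpers, not the `TabularMaskLast` and `TabularMaskRandom` classes themselves. The mask id `-1` and padding id `0` are assumptions; the label value `-100` is chosen because it is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
import random
from typing import List, Tuple

PAD, MASK, IGNORE = 0, -1, -100  # assumed ids; -100 marks positions excluded from the loss


def mask_last(seq: List[int]) -> Tuple[List[int], List[int]]:
    """Mask the last non-padding item; assumes at least one real item."""
    last = max(i for i, item in enumerate(seq) if item != PAD)
    inputs, labels = list(seq), [IGNORE] * len(seq)
    inputs[last], labels[last] = MASK, seq[last]
    return inputs, labels


def mask_random(
    seq: List[int], prob: float, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Mask each non-padding item with probability prob, and at least one item."""
    candidates = [i for i, item in enumerate(seq) if item != PAD]
    chosen = [i for i in candidates if rng.random() < prob]
    if not chosen:  # guarantee one training target per sequence
        chosen = [rng.choice(candidates)]
    inputs, labels = list(seq), [IGNORE] * len(seq)
    for i in chosen:
        inputs[i], labels[i] = MASK, seq[i]
    return inputs, labels


def test_mask_last() -> None:
    inputs, labels = mask_last([5, 8, 2, PAD])
    assert inputs == [5, 8, MASK, PAD]
    assert labels == [IGNORE, IGNORE, 2, IGNORE]


def test_mask_random() -> None:
    seq = [5, 8, 2, PAD]
    inputs, labels = mask_random(seq, prob=0.3, rng=random.Random(0))
    masked = [i for i, item in enumerate(inputs) if item == MASK]
    assert masked, "at least one position must be masked"
    assert inputs[3] == PAD and labels[3] == IGNORE  # padding is never masked
    for i, item in enumerate(seq):
        if i in masked:
            assert labels[i] == item
        else:
            assert inputs[i] == item and labels[i] == IGNORE


test_mask_last()
test_mask_random()
```

The random-masking test checks properties (at least one masked item, padding untouched, labels matching the original items) rather than a fixed output, so it does not depend on which positions a given seed happens to select.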

sararb · Jun 27 '23 18:06