[Task] Port the transformer blocks from Transformers4Rec (PyTorch) to Merlin Models (TensorFlow)
Problem:
The stable API in Transformers4Rec is based on PyTorch and includes all the components used to run the session-based research experiments reported in the T4Rec paper. Merlin Models, on the other hand, does not support a stable PyTorch API yet.
Goal:
- The goal of this work is to port all T4Rec blocks needed for defining transformer-based recommendation models (PyTorch implementation) into Merlin Models (TensorFlow implementation).
- This work is not about improving the current T4Rec API or adding new blocks beyond the existing ones.
Constraints:
- The PyTorch T4Rec API inherits from the HuggingFace Trainer class to support optimized training techniques such as fp16, multi-GPU training, and early stopping. We need to provide clear guidance on how to set up these techniques with the Keras fit method (see the training sketch after the Starting Point list).
- The current T4Rec implementation uses the schema class from the old merlin_standardlib; the migration to merlin-core should happen before porting to Merlin Models, to make sure all blocks work correctly with the new Schema class (a minimal example follows this list).
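For reference, below is a minimal sketch of how the ported blocks would be expected to consume the merlin-core Schema. It assumes merlin-core is installed and the data is in Parquet; the dataset path is hypothetical.

```python
from merlin.io import Dataset
from merlin.schema import Tags

# Hypothetical path, for illustration only
train = Dataset("/workspace/data/train/*.parquet")
schema = train.schema

# The ported blocks would select columns via tags on the merlin-core Schema,
# e.g. the item-id and categorical features of the session sequences.
item_id_schema = schema.select_by_tag(Tags.ITEM_ID)
categorical_schema = schema.select_by_tag(Tags.CATEGORICAL)
print(categorical_schema.column_names)
```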
Starting Point:
- [ ] Implement MaskingBlock: Causal LM, Masked LM, Permutation LM, and Replacement Token Detection (a minimal causal-LM sketch is given below this list)
- [ ] Port the Transformer-block class related to HuggingFace architectures adapted for next-item prediction (Link to HF layers)
- [ ] Implement the Transformer block based on the configs defined in the HuggingFace transformers library. Note: This is an example of how Transformer-based architectures are implemented in HF (as Keras layers); see also the config-driven sketch below.
- [ ] Support setting the masking task within model.compile()
- [ ] Create an example and/or guidelines for training transformer-based models with the techniques supported by the PyTorch Trainer class: early stopping, fp16, LR scheduler, model checkpoints (see the Keras training sketch below)
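A minimal sketch of what a causal-LM masking block could look like in Keras. The class and argument names are illustrative assumptions, not the final Merlin Models API.

```python
import tensorflow as tf


class CausalLanguageModelingMask(tf.keras.layers.Layer):
    """Illustrative causal-LM masking block (names are assumptions, not the final API).

    Shifts the item-id sequence so that position t is trained to predict the
    item at position t + 1, and returns a padding mask over the targets.
    """

    def __init__(self, padding_idx: int = 0, **kwargs):
        super().__init__(**kwargs)
        self.padding_idx = padding_idx

    def call(self, item_ids: tf.Tensor):
        # item_ids: (batch, seq_len) integer ids, padded with `padding_idx`.
        batch_size = tf.shape(item_ids)[0]
        pad = tf.fill([batch_size, 1], tf.cast(self.padding_idx, item_ids.dtype))
        # Targets are the inputs shifted one step to the left.
        targets = tf.concat([item_ids[:, 1:], pad], axis=1)
        mask = tf.not_equal(targets, self.padding_idx)
        return {"targets": targets, "mask": mask}
```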
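A sketch of a config-driven Transformer block wrapping a HuggingFace TF architecture. XLNet is used here only as an example; the wrapper class name and constructor arguments are assumptions, not the final API.

```python
import tensorflow as tf
from transformers import TFXLNetModel, XLNetConfig


class TransformerBlock(tf.keras.layers.Layer):
    """Illustrative wrapper around a HuggingFace TF architecture built from its config."""

    def __init__(self, d_model: int = 64, n_head: int = 4, n_layer: int = 2, **kwargs):
        super().__init__(**kwargs)
        # Config-driven construction, mirroring how T4Rec instantiates the
        # PyTorch architectures from HF config classes.
        config = XLNetConfig(d_model=d_model, n_head=n_head, n_layer=n_layer)
        self.transformer = TFXLNetModel(config)

    def call(self, inputs_embeds: tf.Tensor, training: bool = False):
        # inputs_embeds: (batch, seq_len, d_model) dense embeddings of the
        # interaction sequence, produced by the input/embedding blocks.
        outputs = self.transformer(inputs_embeds=inputs_embeds, training=training)
        return outputs.last_hidden_state
```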
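As a starting point for the guidelines, the HF Trainer features map onto standard Keras/TensorFlow mechanisms roughly as sketched below; `build_model`, `train_ds`, and `valid_ds` are hypothetical placeholders.

```python
import tensorflow as tf

# fp16: Keras mixed precision replaces the HF Trainer's fp16 flag.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Multi-GPU: a distribution strategy replaces the Trainer's built-in support.
strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    model = build_model()  # hypothetical helper that returns a Keras model
    # LR scheduler: pass a schedule object to the optimizer.
    lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-3, decay_steps=1000, decay_rate=0.96
    )
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule))

callbacks = [
    # Early stopping and model checkpoints are Keras callbacks.
    tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=3, restore_best_weights=True
    ),
    tf.keras.callbacks.ModelCheckpoint("checkpoints/epoch-{epoch:02d}", save_best_only=True),
]

model.fit(train_ds, validation_data=valid_ds, epochs=10, callbacks=callbacks)
```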
@sararb, could you please help define this ticket?
I moved the task of creating the example to a separate ticket: NVIDIA-Merlin/models#791 (expected in 22.11).