[RMP] Tensorflow support for session based recommendations integration in Merlin

Open viswa-nvidia opened this issue 2 years ago • 6 comments

Problem:

Session-based and sequential models are an active research area for providing personalized recommendations. The Transformers4Rec (T4Rec) library was built to support the definition of such architectures, and the experiments in the T4Rec paper showed the effectiveness of Transformers in modeling the short sequences observed in session-based tasks. The T4Rec library was also used to win various RecSys challenges. We also observe growing interest and active engagement from customers in using Transformers4Rec.

T4Rec was not actively updated for several months as the team shifted its focus to developing the Merlin Models (MM) library. MM does not currently support sequential and session-based recsys architectures. MM should support these architectures and provide all necessary support to our users so that they can build such effective models.

Goal:

  • Port the Transformers4Rec TF API to MM.
  • The main blocks are: Masking, Transformer, RNN, and NextItemPredictionTask
  • Provide an example that demonstrates < @jsohn-nvidia to clarify>

Constraints:

  • The T4Rec TF API is not as stable and complete as the torch API. The main missing points are:
      - Two of the 4 masking classes are missing: PLM and RTD
      - Support for the training techniques embedded in the HuggingFace Trainer class: multi-GPU, early stopping, checkpoint saving...
      - Experiments with real-world datasets (like the ones conducted in the T4Rec paper with the PyTorch API)

Starting Point:

Training

Proposed API:

inputs = InputBlock(
  schema=con_schema + seq_schema,
  post=BroadcastToSequence(con_schema, seq_schema)
)
model = RetrievalModel(
  XLNetEncoder(inputs, n_head=4, n_layer=2),
  CategoricalOutput(inputs.select_by_tag(Tags.ITEM_ID))
)

topk = TopKEncoder(model)
topk.evaluate(...)

loader = mm.Loader(
  dataset, 
  batch_size=1000,
  transforms=PredictMasked(seq_schema, target=Tags.ITEM_ID)
)

model.fit(loader)
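
As a rough illustration of the `post=BroadcastToSequence(con_schema, seq_schema)` step above, here is a minimal pure-Python sketch of what broadcasting contextual features to the sequence could look like. It assumes that contextual features hold one value per session and are repeated once per sequence position; the function `broadcast_to_sequence` and the feature names are hypothetical and are not part of the Merlin Models API.

```python
from typing import Dict, List


def broadcast_to_sequence(
    context: Dict[str, int],
    sequence: Dict[str, List[int]],
) -> Dict[str, List[int]]:
    """Repeat each per-session (contextual) value once per sequence position."""
    lengths = {len(values) for values in sequence.values()}
    if len(lengths) != 1:
        raise ValueError("all sequential features must share one length")
    seq_len = lengths.pop()

    # Copy the sequential features unchanged.
    out = {name: list(values) for name, values in sequence.items()}
    # Tile every contextual value so it lines up with the sequence.
    for name, value in context.items():
        out[name] = [value] * seq_len
    return out


# Hypothetical session: one contextual feature, two sequential features.
session = broadcast_to_sequence(
    context={"user_country": 7},
    sequence={"item_id": [12, 5, 31], "category": [2, 2, 9]},
)
print(session)
```

After this step every feature has the same length, so the features can be combined position by position before they reach a sequence encoder.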

Inputs

Add support for sequential-inputs & shared embeddings @edknv, @oliverholworthy & @marcromeyn

  • [x] NVIDIA-Merlin/models#674
  • [x] NVIDIA-Merlin/models#694
  • [x] NVIDIA-Merlin/models#697
  • [ ] NVIDIA-Merlin/models#696

Masking

Add training strategies for sequence models @gabrielspmoreira

  • [x] NVIDIA-Merlin/models#713
  • [ ] NVIDIA-Merlin/models#691
  • [ ] NVIDIA-Merlin/models#692
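
To make the idea of a masking-based training strategy concrete, here is a minimal sketch of random item masking, where some positions of a session are hidden and become prediction targets. It is only an illustration under stated assumptions: `mask_items`, `MASK_ID`, and the fallback of masking the last item are hypothetical choices and not the implementation tracked in the tickets above.

```python
import random
from typing import List, Optional, Tuple

MASK_ID = 0  # hypothetical id reserved for the mask token


def mask_items(
    item_ids: List[int],
    mask_prob: float = 0.2,
    seed: Optional[int] = None,
) -> Tuple[List[int], List[Optional[int]]]:
    """Return (inputs, targets): masked positions hold MASK_ID in inputs
    and the original item id in targets; other targets are None."""
    rng = random.Random(seed)
    positions = [i for i in range(len(item_ids)) if rng.random() < mask_prob]
    # Assumption: if nothing was selected, mask the last item so that
    # every non-empty session yields at least one target.
    if not positions and item_ids:
        positions = [len(item_ids) - 1]

    inputs = list(item_ids)
    targets: List[Optional[int]] = [None] * len(item_ids)
    for pos in positions:
        targets[pos] = item_ids[pos]
        inputs[pos] = MASK_ID
    return inputs, targets


inputs, targets = mask_items([12, 5, 31], mask_prob=0.0)
print(inputs, targets)
```

With `mask_prob=0.0` only the last item is masked, which reduces to plain next-item prediction on the final position.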

RetrievalModel

Make RetrievalModel more generic to session-based use-cases + allow encoders being served @marcromeyn

  • [x] NVIDIA-Merlin/models#698
  • [ ] NVIDIA-Merlin/models#729
  • [ ] NVIDIA-Merlin/models#730

Outputs

Improve model-outputs to handle session-based recsys @marcromeyn & @sararb

  • [ ] NVIDIA-Merlin/models#715
  • [ ] NVIDIA-Merlin/models#718
  • [ ] NVIDIA-Merlin/models#731

Port Sequence Architectures

Add sequence-encoding blocks like transformers @sararb

  • [ ] NVIDIA-Merlin/models#719
  • [ ] NVIDIA-Merlin/models#732
  • [ ] NVIDIA-Merlin/models#733
  • [ ] NVIDIA-Merlin/models#721

C. Support of advanced sequential tasks and the definition of examples (22.10)

The scope is defined in this ticket: NVIDIA-Merlin/Merlin#472. The main objective is to support advanced session-based tasks and create examples of common session-based and sequential architectures.

  • [ ] https://github.com/NVIDIA-Merlin/Merlin/issues/472

Inference support

Save schema on model save
  • A session-based model can be used as a candidate generation model. For that we need the following options:

  • [ ] NVIDIA-Merlin/models#736 Note: Make sure we can export the encoder block (Transformer or RNN) together with a SequenceSummary post layer as the query tower.

  • [ ] NVIDIA-Merlin/models#735

  • Constraint: Provide information about input list features to Merlin System NVIDIA-Merlin/Merlin#489

Systems

  • [ ] NVIDIA-Merlin/systems#202

Examples

  • [ ] NVIDIA-Merlin/models#734

viswa-nvidia avatar Jul 05 '22 15:07 viswa-nvidia

@sararb @gabrielspmoreira can you flesh this out as best as possible in @marcromeyn's absence.

EvenOldridge avatar Aug 03 '22 16:08 EvenOldridge

@gabrielspmoreira what does the architecture look like for the system for session based? Are we planning to use session generation to feed into a candidate generation stage?

EvenOldridge avatar Aug 03 '22 16:08 EvenOldridge

@EvenOldridge the session-based recommendation works as a next-item prediction task. It can be seen as a retrieval model, where the query tower uses a sequential model (e.g. RNN, Transformer) and outputs a query representation/vector. During inference, such vectors can be used to retrieve similar items from an ANN index the same way a retrieval model does. So we believe we wouldn't need anything special in Merlin Systems related to the output of a session-based recommendation model. The main difference is in the input, as such sequential models expect list features as input. NVTabular already supports processing and storing such list features, as you know, but there might be some challenges on the Systems side in building the Triton ensemble with list features support.
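
The retrieval view described above can be sketched in a few lines of pure Python. This is a minimal illustration, not the Merlin implementation: mean pooling stands in for whatever summary the query tower applies, exhaustive dot-product scoring stands in for an ANN index, and all names (`mean_pool`, `top_k_items`, the toy embeddings) are hypothetical.

```python
from typing import Dict, List, Sequence


def mean_pool(hidden_states: Sequence[Sequence[float]]) -> List[float]:
    """Summarize per-position encoder outputs into one query vector."""
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(step[d] for step in hidden_states) / n for d in range(dim)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def top_k_items(
    query: Sequence[float],
    item_embeddings: Dict[int, Sequence[float]],
    k: int,
) -> List[int]:
    """Score every item against the query and keep the k best.
    A real system would replace this exhaustive scan with an ANN lookup."""
    ranked = sorted(
        item_embeddings.items(),
        key=lambda pair: dot(query, pair[1]),
        reverse=True,
    )
    return [item_id for item_id, _ in ranked[:k]]


# Toy encoder outputs for a session of three interactions (2-d vectors).
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = mean_pool(hidden_states)

# Toy item embedding table keyed by item id.
items = {101: [1.0, 1.0], 102: [1.0, -1.0], 103: [0.5, 0.0]}
recommended = top_k_items(query, items, k=2)
print(recommended)
```

Only the input side differs from a standard retrieval model here: the query is built from a list of interactions rather than from scalar user features.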

gabrielspmoreira avatar Aug 03 '22 21:08 gabrielspmoreira

@karlhigley, is the systems section here up to date? Please review.

viswa-nvidia avatar Aug 25 '22 16:08 viswa-nvidia

It is up to date with the current state of our knowledge of the work

karlhigley avatar Aug 25 '22 16:08 karlhigley

@marcromeyn, in one of the meetings I made a note that this task depends on some tasks covered in the RMP479-EMBEDDINGS initiative ([RMP] Enable users to pass embedding tables directly into the input block in order to more easily support new functionality (non-trainable embeddings, different dimensions, model parallel, etc)). Is this correct? If so, which tasks are these? @EvenOldridge for vis.

viswa-nvidia avatar Aug 29 '22 19:08 viswa-nvidia

@rnyak, please link the Systems multi-hot related development.

viswa-nvidia avatar Sep 26 '22 17:09 viswa-nvidia

@oliverholworthy, please add the input/output schema related tickets to this ticket.

viswa-nvidia avatar Nov 15 '22 17:11 viswa-nvidia

@viswa-nvidia For the saving method:

This is the parent issue for that:

  • https://github.com/NVIDIA-Merlin/models/issues/669

We have implemented the save method to save the input schema, but the output schema is currently missing.

oliverholworthy avatar Nov 15 '22 17:11 oliverholworthy

Also identified an error in Merlin Models that may impact the ability to serve Transformer-based models. It affects saving a model after loading it.

https://github.com/NVIDIA-Merlin/models/issues/878

oliverholworthy avatar Nov 15 '22 17:11 oliverholworthy

@karlhigley, please add the Triton-related PR (serving signatures haven't matched up with what the model expects) to the ticket.

viswa-nvidia avatar Jan 10 '23 17:01 viswa-nvidia

@rnyak to follow up with @radekosmulski for blocker ( 23.04 )

viswa-nvidia avatar Apr 11 '23 16:04 viswa-nvidia

It was pointed out to me that we should be consistent about when we consider something done or not, so I'm going to reopen this and move it to 22.05. @bbozkaya, you've had the only remaining ticket (review the API) on your todo list for the past two weeks with no progress. Is this something you're able to take on so that we can close the ticket? If not, let us know and we can reassign.

EvenOldridge avatar Apr 26 '23 02:04 EvenOldridge