models issues

[BUG] Dataloader doesn't release memory and memory growth

3

### Bug description I run a training script and reinitialize nvtabular dataloaders. After each initialization, the available GPU memory decreases (measured with `fmem = pynvml_mem_size(kind="free", index=0)`). That is unexpected. The available...

bschifferer

bug

question

P0

[WIP] Add an example notebook explaining hyperparameter optimization using Optuna and showing retrieval on H&M data

28

We have had a couple of discussions recently on exploring using H&M data. Also, some time ago we discussed creating an example around using HPO (it can be a nice...

radekosmulski

documentation

examples

[WIP] Add initial draft of example notebook using horovod

2

A draft PR that shows the workflow. Depends on #783. Currently uses a workaround that re-partitions the dataset, i.e., `ddf = train.to_ddf().repartition(npartitions=hvd.size())`. After some preprocessing with nvtabular, the training code...

edknv
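
The repartitioning workaround quoted above sets the number of partitions equal to the number of Horovod workers (`hvd.size()`). The following is a minimal pure-Python sketch of why that helps, assuming partitions are handed to workers round-robin. The helper `assign_partitions` and that assignment scheme are hypothetical illustrations, not part of NVTabular, Dask, or Horovod.

```python
def assign_partitions(npartitions, world_size):
    """Map each worker rank to the partition indices it would read.

    Round-robin assignment is an assumption made for illustration only.
    """
    return {
        rank: list(range(rank, npartitions, world_size))
        for rank in range(world_size)
    }


# Five partitions across four workers: rank 0 reads two partitions,
# so workers end up with different amounts of data.
uneven = assign_partitions(npartitions=5, world_size=4)

# After repartitioning to world_size partitions, each rank reads exactly one.
even = assign_partitions(npartitions=4, world_size=4)

print(uneven)
print(even)
```

When the partition count matches the worker count, every rank gets the same number of partitions, though the partitions themselves may still differ in row count.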

[Task] Add RNN-based model apis for session-based recommendation

1

### Description Currently there is no out-of-the-box API for creating LSTM, BiLSTM and GRU architectures for a session-based (or sequential) task. - For LSTM, I assume we'd use...

rnyak

enhancement

area/session-based

Reproduce selected results from Transformers4Rec paper with Merlin Models API

1

### Description For Transformers4Rec, we have created a training/eval [script](https://github.com/NVIDIA-Merlin/Transformers4Rec/tree/main/examples/t4rec_paper_experiments) for reproducing the [paper experiments](https://dl.acm.org/doi/10.1145/3460231.3474255), that takes a set of hparams as command line arguments and a preprocessed dataset. This...

gabrielspmoreira

Update the retrieval integration tests on CI to use the refactored MF and TwoTower

1

viswa-nvidia

[FEA] Mixed Precision Support for Merlin Models TensorFlow

1

# 🚀 Feature request

bschifferer

enhancement

P2

Reduce running time of unit tests

1

### Goals :soccer: - Improve iteration speed by reducing the running time of unit tests ### Implementation Details :construction: ### Testing Details :mag:

oliverholworthy

[Task] Add LSTM and BiLSTM unit tests for session-based recommendation

### Description Currently we do not have any unit tests for using LSTM or BiLSTM for session-based recommendation tasks. It'd be useful to add one. For LSTM we'd use `tf.keras.layers.LSTM`...

rnyak

[BUG] Data parallel training freezes due to different number of batches

7

### Bug description In data parallel training, we start multiple workers with different initialization of the dataloader and train with horovod. After each batch update, the parameters are synced. Merlin...

bschifferer

bug

P0
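
The issue title attributes the freeze to workers producing different numbers of batches. Below is a small pure-Python sketch of that mismatch, assuming each worker reads its own shard and parameters are synchronized after every batch. The shard sizes and the helper names `batches_per_worker` and `common_step_count` are hypothetical, and truncating to the smallest count is only one possible mitigation, not necessarily the fix adopted in Merlin.

```python
import math


def batches_per_worker(rows_per_worker, batch_size):
    """Number of batches each worker produces from its shard."""
    return [math.ceil(rows / batch_size) for rows in rows_per_worker]


def common_step_count(rows_per_worker, batch_size):
    """Largest number of steps that every worker can complete."""
    return min(batches_per_worker(rows_per_worker, batch_size))


# Hypothetical shard sizes for four workers.
rows = [1000, 1000, 850, 1030]

counts = batches_per_worker(rows, batch_size=100)
steps = common_step_count(rows, batch_size=100)

# A worker with more batches than another would wait at a synchronization
# step that the finished worker never joins.
print(counts, steps)
```

Capping every worker at `steps` batches keeps the synchronization calls matched, at the cost of skipping a few trailing batches on the larger shards.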
