
Add Merlin Models example notebooks as integration tests with real data

Open karlhigley opened this issue 2 years ago • 8 comments

  • [ ] Add integration tests to merlin models notebooks using Testbook
  • [ ] Define a process where outputs from the notebooks are consistent and can be captured in ASV
  • [ ] Add notebook tests to CI and ensure that CI is working correctly
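For the second checklist item, one way to capture notebook outputs for consistency checks is to read the executed `.ipynb` file (which is plain JSON) and extract the text outputs of its code cells, so they can be compared against stored baselines before feeding numbers into ASV. A minimal sketch, assuming the notebook has already been executed (e.g. by Testbook or `nbconvert`); the helper name and fixture are hypothetical:

```python
import json


def extract_text_outputs(nb_json):
    """Collect stream and text/plain outputs from the code cells of a notebook dict."""
    outputs = []
    for cell in nb_json.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        for out in cell.get("outputs", []):
            if out.get("output_type") == "stream":
                outputs.append("".join(out.get("text", [])))
            elif "data" in out and "text/plain" in out["data"]:
                outputs.append("".join(out["data"]["text/plain"]))
    return outputs


# Minimal executed-notebook fixture in the shape nbformat produces:
nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Example"]},
        {
            "cell_type": "code",
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["auc: 0.71\n"]}
            ],
        },
    ]
}

print(extract_text_outputs(nb))  # ['auc: 0.71\n']
```

Comparing these strings (or metrics parsed out of them) against a checked-in baseline is one way to make the notebook outputs "consistent and capturable" without coupling the benchmark harness to the notebook internals.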

karlhigley avatar Apr 14 '22 21:04 karlhigley

On real data, not synthetic

EvenOldridge avatar Apr 26 '22 18:04 EvenOldridge

We have three notebooks that use synthetic data as an example: 03-Exploring-different-models.ipynb, 04-Exporting-ranking-models.ipynb, and 05-Retrieval-Model.ipynb

Do we want to update them to use real data in CI?

bschifferer avatar May 03 '22 10:05 bschifferer

@karlhigley @benfred and @jperez999, as a follow-up question to Benedikt's: how will CI work with the real Ali-CCP data? We need to sign up to download that dataset. How will the dataset be downloaded?

rnyak avatar May 03 '22 12:05 rnyak

I have no idea (nor do I particularly care) whether we run integration tests with real or synthetic data. I've just written down what people kept telling me was supposed to happen, but if we need to revise the scope or the issues (or both), we should totally do that. Basically, I'm trying to write down what I keep hearing in order to make sure conversations like this one happen, so that we can clarify what's actually workable and do that instead. 😃

karlhigley avatar May 03 '22 16:05 karlhigley

@jperez999 (+ not sure who else can help).

For the movielens-1m notebooks, the tests expect the following data structure on the CI machine:

/raid/data/movielens/
├── ml-1m
│   ├── README
│   ├── movies.dat
│   ├── ratings.dat
│   └── users.dat
└── ml-1m.zip

For aliccp-raw, the tests expect the following data structure on the CI machine:

/raid/data/aliccp/raw/
├── train
│   ├── train_0.parquet
│   ├── train_1.parquet
│   ├── train_2.parquet
│   ├── train_3.parquet
│   ├── train_4.parquet
│   ├── train_5.parquet
│   ├── train_6.parquet
│   └── train_7.parquet
└── valid
    ├── test_0.parquet
    ├── test_1.parquet
    ├── test_2.parquet
    ├── test_3.parquet
    ├── test_4.parquet
    ├── test_5.parquet
    ├── test_6.parquet
    └── test_7.parquet
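A small preflight check in the test suite could fail fast with a clear message when the CI machine is missing any of these files, instead of letting a notebook die mid-execution. A minimal sketch using only the standard library; the `EXPECTED` table mirrors the two layouts above, and the helper name and `data_root` parameter are assumptions:

```python
from pathlib import Path

# Expected layouts relative to a data root such as /raid/data (an assumption).
EXPECTED = {
    "movielens/ml-1m": ["README", "movies.dat", "ratings.dat", "users.dat"],
    "aliccp/raw/train": [f"train_{i}.parquet" for i in range(8)],
    "aliccp/raw/valid": [f"test_{i}.parquet" for i in range(8)],
}


def missing_files(data_root):
    """Return the relative paths from EXPECTED that do not exist under data_root."""
    root = Path(data_root)
    return [
        str(Path(subdir) / name)
        for subdir, names in EXPECTED.items()
        for name in names
        if not (root / subdir / name).exists()
    ]
```

A test session could then call `missing_files("/raid/data")` at collection time and skip (or fail) the real-data notebooks with the list of missing paths in the message.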

bschifferer avatar May 05 '22 11:05 bschifferer

@jperez999 have you had a chance to add the dataset to the CI environment? Is there something we can help with?

bschifferer avatar May 12 '22 09:05 bschifferer

What do you all think of breaking the full-dataset runs out from the notebook examples? They could be jobs scheduled to run daily on the latest docker images and latest main branches of each library. Some reasons it could be useful are:

  • We can run the Merlin notebook integration tests (on sampled data) faster, and therefore can run them in PRs for each of the other libraries to make sure a change to merlin-models or merlin-systems doesn't break the demo notebooks.
  • By running the full-dataset jobs daily, we'll still get quick-ish feedback (via pagerduty and/or slack) if code changes introduce an issue at scale.
  • By separating integration test jobs from the example notebooks, we can write more specific jobs to test edge cases and more advanced functionality that doesn't make sense in the examples.
  • Once separated from CI, there's no reason to try to get the test jobs to finish in a reasonable amount of time (except for cost, of course), so we can really scale up the amount of data used to test future features like https://github.com/NVIDIA-Merlin/Merlin/issues/107
  • Regularly scheduled jobs are a more realistic implementation of a Merlin-based workflow than notebooks.
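The scheduled-job idea above could look something like the following GitHub Actions workflow. This is a hypothetical sketch, not an existing workflow: the cron time, runner labels, container image tag, and test paths are all assumptions to be adapted to the actual CI setup.

```yaml
# Hypothetical nightly full-dataset run on the latest Merlin image.
name: nightly-full-dataset-tests
on:
  schedule:
    - cron: "0 6 * * *"  # once a day
jobs:
  integration:
    runs-on: [self-hosted, gpu]
    container:
      image: nvcr.io/nvidia/merlin/merlin-tensorflow:latest
    steps:
      - uses: actions/checkout@v3
      - run: pip install --upgrade testbook pytest
      - run: pytest tests/integration --durations=0
```

Decoupling this from per-PR CI is what makes the trade-off work: PRs get the fast sampled-data notebook tests, while scale problems surface within a day via the scheduled run's alerting.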

nv-alaiacano avatar Aug 17 '22 14:08 nv-alaiacano

Yeah, I think that makes sense as the next incremental step from where we are now

karlhigley avatar Aug 17 '22 14:08 karlhigley