Scheduling Notebooks
Problem
- How can I quickly go from experimentation (`.ipynb`) to production (typically `.py`)?
Current Solution
The prevailing method of "productionizing" notebooks is:
- Convert the notebook to a Python script (see the sketch below)
- Clean up the script, write some tests, get a code review done
- Set up a cloud machine and install all the libraries (or dockerize the script)
- Run the script manually or set up a crontab
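For reference, step 1 above is typically a one-liner with nbconvert's Python API (a minimal sketch; the notebook filename is hypothetical):

```python
# Convert a notebook to a plain Python script with nbconvert.
# "analysis.ipynb" is a hypothetical filename.
from nbconvert import PythonExporter

exporter = PythonExporter()
source, _resources = exporter.from_filename("analysis.ipynb")

with open("analysis.py", "w") as f:
    f.write(source)
```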
Is this really efficient?
Challenging the Status Quo
What if we run notebooks directly for our production workflows? Here are some benefits:
- Rich output for each execution (the notebook itself!)
- Quickly go from experimentation to production; no time spent extracting code from `.ipynb`
- Failed workflows are easy to debug (thanks to the rich notebook output)
Why do we really need to convert notebooks to Python scripts? Here are a few common objections (I'd love to learn more in comments):
- Code review - We can review notebooks directly with ReviewNB & nbdime (`.py` is not necessary).
- Testing - We can directly write tests for notebook code with Treon and a few other tools (`.py` is not necessary here either).
- Code reuse - This is a legit reason. You should definitely convert notebook code into libraries whenever possible. It makes reuse super easy and keeps the notebook readable. But we don't need to convert the entire notebook into a script, do we? The final execution can easily be running a notebook that imports the libraries we created (see the sketch below).
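To make the code-reuse point concrete, here is a minimal sketch of what such a "thin" production notebook cell could look like. The package `mylib` and its helpers are hypothetical names, not part of any real library; the `assert` also illustrates how Treon-style testing works, since Treon re-executes the notebook and fails on any cell that raises.

```python
# Hypothetical production notebook cell: the real logic lives in a library,
# the notebook only orchestrates it. `mylib`, `load_dataset`, and
# `train_model` are assumed names for illustration.
from mylib.training import load_dataset, train_model

data = load_dataset("daily-extract.parquet")
model, metrics = train_model(data, n_estimators=100)

# A plain assert doubles as a test: Treon re-runs the notebook and
# reports any cell that raises.
assert metrics["auc"] > 0.8, f"Model quality regressed: {metrics['auc']:.3f}"
```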
Proposed Solution
- You select a notebook from a GitHub repo and set a schedule for it to run (once/daily/weekly etc.).
- You select the instance type (memory, vCPU) for execution.
- You can specify different parameters for each run via Papermill (see the sketch after this list).
- ReviewNB executes this notebook on your specified schedule & preserves the result of each run (as an executed notebook).
- ReviewNB supports notebook workflows (parallel executions for different parameters, the result of one notebook feeding into the next, etc.).
- For the environment, we use stable versions of commonly used DS libraries. Users can specify their own environment as well (via a Dockerfile).
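For context, the Papermill-based parameterization and chaining mentioned above would look roughly like this (a minimal sketch; notebook names and parameter values are hypothetical, not part of the proposal):

```python
# Parameterized notebook execution with Papermill; notebook names and
# parameter values are hypothetical.
import papermill as pm

# Run the same notebook once per region, each with its own parameters.
# Each executed copy is preserved as the rich output of that run.
for region in ["us-east-1", "eu-west-1"]:
    pm.execute_notebook(
        "etl.ipynb",
        f"runs/etl-{region}.ipynb",
        parameters={"run_date": "2024-01-01", "region": region},
    )

# A downstream notebook that consumes the ETL outputs: the result of
# one notebook feeding into the next.
pm.execute_notebook(
    "report.ipynb",
    "runs/report-2024-01-01.ipynb",
    parameters={"run_date": "2024-01-01"},
)
```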
FAQ
- Can we run notebooks on our own hardware? Absolutely. You can self-host ReviewNB & hook it up to your own AWS/GCP account to execute notebooks on your own machines.
- How will I specify sensitive data (e.g. DB credentials) required for execution? ReviewNB provides a prompt to set any sensitive data as environment variables that are available to the notebook at runtime (see the sketch below).
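For illustration, reading such a credential inside the notebook is just a standard environment-variable lookup (the variable name `DB_PASSWORD` is hypothetical):

```python
# Read a credential injected as an environment variable at runtime.
# "DB_PASSWORD" is a hypothetical variable name.
import os

db_password = os.environ["DB_PASSWORD"]  # raises KeyError if not set
# ...use db_password to open the DB connection; never hard-code secrets.
```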
Feel free to upvote/downvote the issue to indicate whether you think this is a useful feature or not. I also welcome additional questions/comments/discussion on the issue.
This would be an amazing feature. We use notebooks in more than one capacity in our organization:
Data Science:
- Model creation - probably won't be run on the cloud; they get translated to pure Python first
- Validation - we need to run these every time we update our models
DevOps:
One-off scripts (for DB migrations, backfilling, or any emergency ops) get written as notebooks into a /playbooks directory; they are reviewed on GitHub and then run locally right now. It would be very valuable to run these from a preset environment.
For any of these use cases, the permission and security model would dictate if we could use it as a part of our workflow.
Thank you @srossross
> For any of these use cases, the permission and security model would dictate if we could use it as a part of our workflow.
I'm thinking of relying on GitHub permissions. E.g. All users who have read access on a private GitHub repository can also see all periodic jobs for that repository. All users who can write to that repository can also edit/create jobs for that repository. Would this work or do you need a separate permission system for jobs?
> Validation - we need to run these every time we update our models

How are you running these currently (manually or via automated jobs)? And where are you running them (locally or in the cloud)?
> Model creation - probably won't be run on the cloud; they get translated to pure Python first
Just curious, why not run these as notebooks as well? Are they not suitable for the notebook format?
I think fast.ai's nbdev template solves these problems.