pyjanitor

[DOC] Notebook cell numbers rendered in the docs don't enumerate correctly

Open loganthomas opened this issue 5 years ago • 5 comments

Brief Description of Fix

In some of the notebook examples, the execution counts that indicate the order in which cells were run are not consecutive. In the screenshot below, the first cell shows [1] (as expected), but the next cells show [64] and [65]. This may confuse new users who don't typically work in notebooks.

[Screenshot: Screen Shot 2019-07-13 at 10 29 41]

Currently, the docs display notebooks whose cells may have been run out of order or with gaps in the numbering.

I would like to propose a change so that the docs display notebooks whose cells are numbered consecutively, in the order they appear (i.e., [1], [2], [3], ...).

The simple (but tedious) solution would be to find the notebooks with these issues, re-run them, and then push the changes.

A better (and recommended) approach would be to create a script that automatically runs each example notebook from end to end before it is used in the docs, to ensure the cells are run in order.
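A minimal sketch of what such a script might look like, assuming the `jupyter nbconvert` command-line tool is installed and the notebooks sit in a single directory. The function names `build_command` and `run_all` are hypothetical, not something that exists in the repo:

```python
import subprocess
from pathlib import Path


def build_command(notebook):
    """Return the nbconvert command that re-executes one notebook in place."""
    return [
        "jupyter",
        "nbconvert",
        "--to",
        "notebook",
        "--execute",  # run every cell top to bottom in a fresh kernel
        "--inplace",  # overwrite the notebook with the executed version
        str(notebook),
    ]


def run_all(notebook_dir):
    """Re-execute every .ipynb file directly inside notebook_dir."""
    for notebook in sorted(Path(notebook_dir).glob("*.ipynb")):
        # check=True raises CalledProcessError if a notebook fails to run.
        subprocess.run(build_command(notebook), check=True)
```

Because each notebook is executed from a fresh kernel, the saved execution counts should come out as [1], [2], [3], and so on. Calling `run_all` on the notebook directory before the docs build would be one way to wire it in.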

Relevant Context

loganthomas avatar Jul 13 '19 15:07 loganthomas

Great idea! I agree with the better approach (script to run each notebook before uploading to docs)! Would you like to be assigned to this? :D

sallyhong avatar Jul 13 '19 16:07 sallyhong

Is this something that should be added to the Makefile and run every time, or only when someone knowingly updates a notebook?

Maybe there is something that can be added to the PR pipeline that ensures cells are sequentially numbered, and requests that you run the script if they're not.

hectormz avatar Jul 17 '19 17:07 hectormz

I wrote a little script that scans through the notebooks and raises exceptions if any of the code cells haven't been executed (likely empty cell at end of notebook) or if the cells have not been executed sequentially:

import json
from pathlib import Path

import numpy as np


def check_cell_num(file_name):
    """Raise if any code cell is unexecuted or the cells did not run consecutively."""
    with open(file_name, "r", encoding="utf8") as f:
        notebook = json.load(f)
    # Collect the execution count of every code cell, in document order.
    execution_counts = np.array(
        [
            cell["execution_count"]
            for cell in notebook["cells"]
            if cell["cell_type"] == "code"
        ],
    )
    # An unexecuted cell has an execution count of None (null in the JSON).
    if None in execution_counts:
        raise Exception(f"{file_name} contains unexecuted code cells.")
    # Consecutive counts must each differ by exactly 1.
    if not np.all(np.diff(execution_counts) == 1):
        raise Exception(f"Out of order cells in {file_name}")


notebook_dir = Path("examples/notebooks/")

for file in notebook_dir.glob("*.ipynb"):
    check_cell_num(file)

Perhaps this could be included in the Makefile or similar, and if it fails, the notebook is re-run with papermill, nbconvert, or a similar tool. Or it simply fails and directs the user to run papermill.
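The two options could be sketched roughly as follows, using only the standard library. The names `first_problem` and `check_or_rerun` are hypothetical, and `rerun` stands for whatever callable would re-execute a notebook (a papermill or nbconvert wrapper, for instance); it is left to the caller here rather than assumed:

```python
import json


def first_problem(path):
    """Return a description of the first numbering problem, or None if clean."""
    with open(path, "r", encoding="utf8") as f:
        notebook = json.load(f)
    counts = [
        cell.get("execution_count")
        for cell in notebook["cells"]
        if cell["cell_type"] == "code"
    ]
    if None in counts:
        return "contains unexecuted code cells"
    # Each count must be exactly one more than the previous one.
    if any(b - a != 1 for a, b in zip(counts, counts[1:])):
        return "has out-of-order cells"
    return None


def check_or_rerun(path, rerun=None):
    """Return "ok" if clean; otherwise call rerun(path), or fail if none given."""
    problem = first_problem(path)
    if problem is None:
        return "ok"
    if rerun is None:
        # Option 2: fail and tell the user what to do.
        raise RuntimeError(f"{path} {problem}; re-run it with papermill or nbconvert.")
    # Option 1: hand the notebook to the supplied re-execution callable.
    rerun(path)
    return "rerun"
```

Passing no `rerun` gives the "simply fails" behaviour, which is probably the safer default in CI since it never rewrites files behind the contributor's back.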

hectormz avatar Nov 03 '19 00:11 hectormz

@ericmjl I’d like to revisit this issue. I’m thinking I can take the code @hectormz provided and add a few things that I had in mind.

Just wondering, will this be like scripts/check-autodoc.py? Essentially, we’d call it during the test pipeline? I’m also wondering if there is a way to get it into a pre-commit hook so the error is thrown prior to a pipeline build. Curious to hear what you think here.
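For the pre-commit idea, here is a rough sketch of how the entry point might look. pre-commit passes the staged file names to the hook as command-line arguments and treats a non-zero exit code as a failure. The names `counts_are_sequential` and `main` are hypothetical, and this variant assumes the numbering should start at [1], which is stricter than only requiring consecutive counts:

```python
import json


def counts_are_sequential(notebook):
    """True if every code cell ran and the counts read 1, 2, 3, ... in order."""
    counts = [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code"
    ]
    # Unexecuted cells have a count of None, so they fail this comparison too.
    return counts == list(range(1, len(counts) + 1))


def main(argv):
    """Check each notebook path in argv; return 1 if any fails, else 0."""
    failed = []
    for path in argv:
        with open(path, "r", encoding="utf8") as f:
            if not counts_are_sequential(json.load(f)):
                failed.append(path)
    for path in failed:
        print(f"{path}: cells are unexecuted or out of order; re-run the notebook.")
    return 1 if failed else 0


# A hook script would finish with: raise SystemExit(main(sys.argv[1:]))
```

The same `main` could be called from the test pipeline as well, so the check would only need to live in one place.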

I’m happy to take this one, just didn’t know where this file/function should “live”.

loganthomas avatar Sep 27 '20 15:09 loganthomas

I forgot about this! @loganthomas let me know if you have any questions about what I offered up, and I can jog my memory too.

Haven't thought about it in a while, so I'm not sure where it should live, etc...

hectormz avatar Sep 27 '20 15:09 hectormz