New feature: Inspecting notebook with different tags (besides parameters)

Open DustinKLo opened this issue 5 years ago • 8 comments

I like the inspect_notebook function, but for our use case we would like to inspect notebook cells that are tagged with something other than parameters (see the inspect_notebook documentation).

We could add another argument to inspect_notebook (for example, tag) that defaults to parameters: https://github.com/nteract/papermill/blob/main/papermill/inspection.py#L102-L123

def inspect_notebook(notebook_path, tag="parameters", parameters=None):
...

_infer_parameters would also take a tag argument (default parameters) and pass it to the lookup that currently hard-codes the tag:

parameter_cell_idx = find_first_tagged_cell_index(nb, "parameters")

(https://github.com/nteract/papermill/blob/main/papermill/inspection.py#L37) find_first_tagged_cell_index would then return the first cell it finds with the specified tag.
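To make the proposal concrete, here is a minimal sketch of the tag lookup with a default. It assumes the notebook is a plain dict shaped like the .ipynb JSON; find_first_tagged_cell_index below is a stand-in written for this example (returning -1 when nothing matches is an assumption here), and find_inspection_cell is a hypothetical wrapper, not part of papermill.

```python
def find_first_tagged_cell_index(nb, tag):
    """Stand-in helper for this sketch: index of the first cell carrying `tag`, or -1."""
    for idx, cell in enumerate(nb["cells"]):
        if tag in cell.get("metadata", {}).get("tags", []):
            return idx
    return -1


def find_inspection_cell(nb, tag="parameters"):
    """Hypothetical wrapper: the default keeps today's behaviour for existing callers."""
    return find_first_tagged_cell_index(nb, tag)


# A tiny in-memory notebook; "config" is a made-up tag for illustration.
nb = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": []}, "source": "import os"},
        {"cell_type": "code", "metadata": {"tags": ["parameters"]}, "source": "alpha = 0.1"},
        {"cell_type": "code", "metadata": {"tags": ["config"]}, "source": "queue = 'gpu'"},
    ]
}

default_idx = find_inspection_cell(nb)                  # looks for "parameters"
config_idx = find_inspection_cell(nb, tag="config")     # looks for the custom tag
missing_idx = find_inspection_cell(nb, tag="absent")    # no cell has this tag
```

Callers that never pass tag keep getting the parameters cell, which is the backwards-compatibility point of the proposal.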

What do you think? I can go ahead and make a PR if this would be used in the future.

DustinKLo avatar Nov 03 '20 21:11 DustinKLo

My biggest reservation about this is the fact that a lot of UIs (thinking about nteract and Jupyter Notebook) special case their client-side experience for working with parameterized cells by checking for the "parameters" tag. Allowing arbitrary labels here will break that convention for front-end clients.

I'll let other folks chime in on other motivations for this.

captainsafia avatar Nov 05 '20 00:11 captainsafia

@captainsafia thanks for the feedback 👍

Inspection will still default to cells tagged with parameters; we just want the option to inspect other cells as well. The functionality is already there and would only need a small tweak to this line:

parameter_cell_idx = find_first_tagged_cell_index(nb, "parameters")

It stays backwards compatible because the value defaults to parameters, and ideally it won't affect other parts of the library that use the find_first_tagged_cell_index function.

let me know what you and your team think, thanks 👍

DustinKLo avatar Nov 07 '20 03:11 DustinKLo

@willingc @rgbkrk Thoughts on this?

captainsafia avatar Nov 09 '20 18:11 captainsafia

@DustinKLo walk me through what the next feature is you would need if we inspected cells that are tagged something other than parameters. What would you want to do with that cell or notebook?

rgbkrk avatar Nov 09 '20 20:11 rgbkrk

Historically, we've tried to keep Papermill's scope conservative and focused on the execution of parameterized notebooks. Like @rgbkrk, I would be interested in @DustinKLo's use cases.

I'm a bit hesitant to open up inspect to things beyond parameters. I could see how inspecting cells for "tags" may be useful. It's not clear from the messages above if the expectation would be execution based on tags too. Use cases would make it a bit easier to determine whether to add the functionality in papermill or somewhere else. :sunny:

willingc avatar Nov 09 '20 21:11 willingc

thanks for your input 👍

We simply want to inspect cells that are tagged with something other than parameters. We use the inspect_notebook function to extract extra variables to build the proper configuration files in our science data system. It would be nice to keep those variables in separate cells to make things a little cleaner and more organized.

no other use cases besides inspecting cells 👍
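As a rough illustration of that workflow, the sketch below pulls simple literal assignments out of the first cell carrying a given tag and serializes them as a config. It is not how papermill parses cells: extract_tagged_variables, the job_config tag, and the variable names are all hypothetical, and the notebook is assumed to be a plain dict shaped like the .ipynb JSON.

```python
import ast
import json


def extract_tagged_variables(nb, tag):
    """Return {name: value} for `name = <literal>` lines in the first code cell tagged `tag`."""
    for cell in nb["cells"]:
        tags = cell.get("metadata", {}).get("tags", [])
        if cell.get("cell_type") == "code" and tag in tags:
            source = cell["source"]
            if isinstance(source, list):  # .ipynb files may store source as a list of lines
                source = "".join(source)
            break
    else:
        return {}  # no cell carries the tag

    variables = {}
    for node in ast.parse(source).body:
        is_simple_assign = (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        )
        if not is_simple_assign:
            continue
        try:
            variables[node.targets[0].id] = ast.literal_eval(node.value)
        except (ValueError, TypeError):
            continue  # right-hand side is not a plain literal (e.g. a function call)
    return variables


nb = {
    "cells": [
        {"cell_type": "code", "metadata": {"tags": ["parameters"]}, "source": "alpha = 0.1\n"},
        {
            "cell_type": "code",
            "metadata": {"tags": ["job_config"]},
            "source": ["queue = 'gpu'\n", "memory_gb = 16\n", "command = build_command()\n"],
        },
    ]
}

job_config = extract_tagged_variables(nb, "job_config")
config_text = json.dumps(job_config, indent=2, sort_keys=True)
```

The cell is only parsed, never executed, which matches the setup where the notebooks themselves run elsewhere.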

DustinKLo avatar Nov 09 '20 22:11 DustinKLo

The use case itself sounds like you want to inspect cells to create configuration files -- is that the best way to put it? These are used by a separate system? Does the notebook itself still need to run or are you just using papermill to do inspection?

rgbkrk avatar Nov 09 '20 23:11 rgbkrk

@rgbkrk yup, you got it 👍 We're using papermill for inspection to build the jobs; the actual execution of the notebooks is done separately (in a Docker container).

DustinKLo avatar Nov 10 '20 02:11 DustinKLo