
Adding Zeppelin support

Open cmenguy opened this issue 5 years ago • 2 comments

Hi, we are considering adopting Papermill for parameterizing and running our notebooks, but the main thing stopping us is the lack of support for Zeppelin. Our notebooks are a mix of Jupyter and Zeppelin, and having the ability to run both with the same library would be invaluable.

I was wondering whether this has been discussed before, and whether it would be a good fit for Papermill.

If there is interest, I would be happy to try contributing it.

cmenguy, Sep 14 '18 01:09

From what I understand, this is slightly tricky because of the way Zeppelin thinks about a notebook file. Under the hood there is a note.json. I was not able to track down a spec for that file, so we may have no guarantees about what we will find in it.

Because it doesn't seem to have a standard, versioned spec that we can adhere to, it can be tricky to parameterise. It would likely require creating a library like nbformat for Zeppelin notebooks that would plug into what we currently do with nbformat to parameterise Jupyter notebooks.
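To make the "nbformat-like library" idea concrete, here is a minimal sketch of a converter. Since there is no spec, it assumes a note.json layout with a top-level `paragraphs` list, where each paragraph keeps its code in a `text` field whose first line may be an interpreter directive such as `%python` on its own line. The function name `zeppelin_to_nbformat` and the `zeppelin` metadata key are hypothetical, and the output is a plain dict shaped like an nbformat v4 notebook rather than a validated `nbformat` object.

```python
import json


def zeppelin_to_nbformat(note_json: str) -> dict:
    """Convert a Zeppelin note.json string into an nbformat-v4-like dict.

    Assumptions (no published spec): the note has a top-level "paragraphs"
    list, each paragraph stores its code in "text", and an optional
    interpreter directive such as "%python" sits alone on the first line.
    """
    note = json.loads(note_json)
    cells = []
    for para in note.get("paragraphs", []):
        text = para.get("text") or ""
        interpreter = None
        first, _, rest = text.partition("\n")
        if first.startswith("%"):
            # Keep the directive in metadata so it can be restored on write-back
            interpreter = first[1:].strip()
            text = rest
        cells.append({
            "cell_type": "code",
            "execution_count": None,
            "metadata": {
                "zeppelin": {
                    "id": para.get("id"),
                    "interpreter": interpreter,
                    "title": para.get("title"),
                },
                "tags": [],
            },
            "outputs": [],
            "source": text,
        })
    return {
        "nbformat": 4,
        "nbformat_minor": 2,
        "metadata": {"zeppelin": {"name": note.get("name"), "id": note.get("id")}},
        "cells": cells,
    }
```

Keeping the Zeppelin-specific fields under a dedicated metadata key means the rest of a Jupyter-oriented pipeline can ignore them, while a reverse converter could still rebuild the original paragraphs.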

Additionally, I'm not sure how Zeppelin handles metadata, so while it might be possible to apply tags to cells, we may need a different convention for labeling the cells that hold parameters.
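One possible convention, sketched below purely as an illustration: treat a paragraph as the parameters paragraph if its `title` field is "parameters", or if the first line of its code (after any `%interpreter` directive) is the comment `# parameters`. Both the `title` field and the comment marker are assumptions about what a Zeppelin-side convention could look like; `find_parameters_paragraph` is a hypothetical helper, not part of papermill.

```python
def find_parameters_paragraph(note: dict):
    """Return the index of the paragraph labeled as holding parameters, or None.

    Hypothetical convention: a paragraph qualifies if its "title" is
    "parameters" (case-insensitive) or if the first code line after any
    %interpreter directive is the comment "# parameters".
    """
    for i, para in enumerate(note.get("paragraphs", [])):
        title = (para.get("title") or "").strip().lower()
        if title == "parameters":
            return i
        lines = (para.get("text") or "").splitlines()
        # Skip an interpreter directive such as "%python" on the first line
        if lines and lines[0].lstrip().startswith("%"):
            lines = lines[1:]
        if lines and lines[0].strip().lower() == "# parameters":
            return i
    return None
```

A comment marker has the advantage of surviving any export that drops paragraph titles, at the cost of being language-specific (`#` works for Python but not for every interpreter).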

mpacer, Sep 14 '18 18:09

Hi @cmenguy, and welcome! :tada:

I think there's definitely room in papermill for processing Zeppelin notebooks. As M mentioned, Zeppelin uses a different format than Jupyter, so supporting it would require abstraction upgrades to a few components.

The first abstraction that needs adjusting is the notebook format handling. We'd need something to load the note.json into nbformat or an nbformat-like object for processing. Parameterization would then need to apply to both notebook formats in a similar manner, or, if an nbformat-like in-memory representation is ruled out, parameterization would need to become more abstract. Either way, this might require moving parameterization to a more plug-and-play pattern, like the one we use for other components of papermill.
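If Zeppelin notes are loaded into an nbformat-like dict, a single injection routine can serve both formats. The sketch below mirrors papermill's documented behavior for Jupyter (insert a new cell tagged `injected-parameters` after the cell tagged `parameters`, or at the top if there is none), but it is a simplified stand-in: `python_assignments` and `inject_parameters` are hypothetical names, and only Python-style `name = repr(value)` assignments are produced, whereas papermill's real translators cover more languages and value types.

```python
import copy


def python_assignments(parameters: dict) -> str:
    """Render parameters as Python assignments (simplified translator stand-in)."""
    lines = ["# Parameters"]
    for name, value in parameters.items():
        lines.append(f"{name} = {value!r}")
    return "\n".join(lines) + "\n"


def inject_parameters(nb: dict, parameters: dict) -> dict:
    """Return a copy of an nbformat-like dict with an injected-parameters cell."""
    nb = copy.deepcopy(nb)
    cells = nb["cells"]
    # Drop a cell injected by a previous run so repeated runs don't stack them
    cells[:] = [c for c in cells
                if "injected-parameters" not in c["metadata"].get("tags", [])]
    new_cell = {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {"tags": ["injected-parameters"]},
        "outputs": [],
        "source": python_assignments(parameters),
    }
    # Insert right after the first "parameters"-tagged cell, else at the top
    index = next((i + 1 for i, c in enumerate(cells)
                  if "parameters" in c["metadata"].get("tags", [])), 0)
    cells.insert(index, new_cell)
    return nb
```

Because the routine only touches `cells`, `metadata.tags`, and `source`, it does not care whether the dict came from a `.ipynb` file or from a converted note.json.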

Then we'd want to extend https://github.com/nteract/papermill/pull/204 with an --engine=zeppelin option that wraps a Zeppelin executor. This adds a Java dependency for this particular engine, but that's OK: the engine can simply raise an exception if a JRE isn't available.
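A rough sketch of the engine idea, including the "raise if no JRE" check, is below. It deliberately does not use papermill's own engine classes: `Engine`, `ZeppelinEngine`, `ENGINES`, `get_engine`, and the `run_cell` callback (standing in for whatever would actually talk to Zeppelin) are all hypothetical. Detecting Java via `shutil.which("java")` is also an assumption; a real engine might consult `JAVA_HOME` instead.

```python
import shutil


class JavaNotFoundError(RuntimeError):
    """Raised when the hypothetical Zeppelin engine cannot find a Java runtime."""


def require_java(which=shutil.which) -> str:
    """Return the path to `java`, or raise if no JRE is on PATH.

    `which` is injectable so the check can be exercised without a real JRE.
    """
    java = which("java")
    if java is None:
        raise JavaNotFoundError("the zeppelin engine requires a Java runtime (JRE) on PATH")
    return java


class Engine:
    """Minimal stand-in for an engine interface (not papermill's actual base class)."""

    @classmethod
    def execute_notebook(cls, nb, run_cell, **kwargs):
        raise NotImplementedError


class ZeppelinEngine(Engine):
    @classmethod
    def execute_notebook(cls, nb, run_cell, which=shutil.which, **kwargs):
        # Fail fast, before any cell runs, if Java is missing
        require_java(which)
        for cell in nb["cells"]:
            if cell["cell_type"] == "code":
                # run_cell is a hypothetical callback that would talk to Zeppelin
                cell["outputs"] = run_cell(cell["source"])
        return nb


# Maps an --engine value to its implementation
ENGINES = {"zeppelin": ZeppelinEngine}


def get_engine(name: str):
    try:
        return ENGINES[name]
    except KeyError:
        raise ValueError(f"unknown engine: {name!r}") from None
```

Checking for Java inside the engine, rather than at import time, keeps the dependency optional: users who never pass `--engine=zeppelin` are unaffected.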

And finally, we'd need to figure out how to handle the iorw patterns for a non-Jupyter document. This one would require a little more thought, but I don't see any reason we couldn't solve it too.
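One way to think about the iorw question: papermill's I/O layer roughly picks a storage handler from the path and moves text around, so a non-Jupyter document mainly adds a second, separate step that decides which format the text is. The sketch keeps those two concerns apart. It is not papermill's iorw API; `LocalHandler`, `HANDLERS`, `detect_format`, `load_document`, and `save_document` are hypothetical, and detecting Zeppelin by a top-level `paragraphs` key is a heuristic given the lack of a spec.

```python
import json
from pathlib import Path


class LocalHandler:
    """Moves raw text to and from local files; knows nothing about notebook formats."""

    @staticmethod
    def read(path: str) -> str:
        return Path(path).read_text(encoding="utf-8")

    @staticmethod
    def write(text: str, path: str) -> None:
        Path(path).write_text(text, encoding="utf-8")


# Path prefix -> handler; remote schemes (e.g. "s3://") would be registered here too.
HANDLERS = {"": LocalHandler}


def get_handler(path: str):
    # Longest matching prefix wins; "" always matches, so local files are the fallback
    matches = [prefix for prefix in HANDLERS if path.startswith(prefix)]
    return HANDLERS[max(matches, key=len)]


def detect_format(doc: dict) -> str:
    """Heuristic format check based on top-level keys (Zeppelin has no published spec)."""
    if "nbformat" in doc and "cells" in doc:
        return "jupyter"
    if "paragraphs" in doc:
        return "zeppelin"
    raise ValueError("unrecognized notebook document")


def load_document(path: str):
    doc = json.loads(get_handler(path).read(path))
    return detect_format(doc), doc


def save_document(doc: dict, path: str) -> None:
    detect_format(doc)  # refuse to write something that is neither format
    get_handler(path).write(json.dumps(doc, indent=2), path)
```

Separating storage from format means new backends do not need to know about Zeppelin, and a Zeppelin reader does not need to know about S3 or GCS.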

MSeal, Sep 14 '18 18:09