papermill
papermill copied to clipboard
Question: Documentation for supported data types of parameters? (E.g. What is proper way to use tuples with Papermill?)
MWE:
input_notebook.ipynb has two cells, a parameters cell:
numbers = None
and a code execution cell
small, big = numbers
papermill_notebook.ipynb has one cell with the code:
import papermill as pm
pm.execute_notebook(
'input_notebook.ipynb',
'output_notebook.ipynb',
parameters = {'numbers':(-1,1)}
)
Tuples are a basic, immutable datatype. The parameter is a tuple whose entries are also immutable and even more basic.
However, attempting to run papermill_notebook.ipynb leads to:
PapermillExecutionError:
---------------------------------------------------------------------------
Exception encountered at "In [2]":
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-4b9474dadad0> in <module>
----> 1 small, big = numbers
ValueError: too many values to unpack (expected 2)
This is because papermill converted the length two tuple, (-1,1), to the length 7 string "(-1,1)".
This definitely wasn't the behavior I was expecting, which seems to suggest I don't sufficiently understand the "philosophy" of Papermill. Question/Request: Would it be possible to point to a relevant portion of the documentation explaining to users which data types they should and should not expect to be able to pass as parameters?
I haven't actually looked at the papermill source code in detail yet, but I'm guessing it has something to do with JSON serialization, and the fact that there is no notion of 'tuple' in JSON?
(E.g. when I tried to pass a lambda as a parameter once I got a '... not JSON serializable error', so maybe papermill is converting to JSON to avoid security risks associated with pickle or whatever.)
For example the above example works using parameters = {'numbers':[-1,1]} (i.e. a list) instead, presumably because lists correspond to the JSON array objects so are serializable without first converting to string?
Thanks for submitting an issue @krinsman. I think your assessment is likely correct. Labeling issue as needs: testing to confirm.
Apologies for not seeing this issue when it was posted btw.
And yes your evaluation of the issue is correct, because we convert all fields to JSON before passing them in a language agnostic manner to the kernel rather than using python language / version specific transport methods (e.g. pickle). That being said I could see an argument for tuples to be automatically converted to JSON lists for this purpose. The conversion from immutable to mutable structure is somewhat irrelevant with the process transfer process anyway and a string conversion is certainly unexpected here. @willingc thoughts on that?
@MSeal As long as we comment and doc, I see nothing wrong with doing the conversion.
No worries, sorry for complaining about this back then. I've gotten used to the interface since then, and Papermill has proven extremely useful for my work -- I really appreciate it!
The error did throw me for a surprise and I was concerned that another person in my position, but who didn't have experience with JSON before, might have been much more at a loss for what to do.
No opinions about whether trying to pass a tuple or not should throw an error or be converted to list, but yes I do agree that the conversion to string was unexpected. It also led to downstream errors which were a little tricky to debug.
I'll mark the issue as an open approved pattern to change. I think it's well described here now and it is unintuitive to new users so making the proposed change would improve interaction patterns.