papermill icon indicating copy to clipboard operation
papermill copied to clipboard

Question: Documentation for supported data types of parameters? (E.g. What is proper way to use tuples with Papermill?)

Open krinsman opened this issue 4 years ago • 5 comments

MWE:

input_notebook.ipynb has two cells, a parameters cell:

 numbers = None

and a code execution cell

 small, big = numbers

papermill_notebook.ipynb has one cell with the code:

import papermill as pm

pm.execute_notebook(
   'input_notebook.ipynb',
   'output_notebook.ipynb',
   parameters = {'numbers':(-1,1)}
)

Tuples are a basic, immutable datatype. The parameter is a tuple whose entries are also immutable and even more basic.

However, attempting to run papermill_notebook.ipynb leads to:

PapermillExecutionError: 
---------------------------------------------------------------------------
Exception encountered at "In [2]":
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-4b9474dadad0> in <module>
----> 1 small, big = numbers

ValueError: too many values to unpack (expected 2)

This is because papermill converted the length two tuple, (-1,1), to the length 7 string "(-1,1)".

This definitely wasn't the behavior I was expecting, which seems to suggest I don't sufficiently understand the "philosophy" of Papermill. Question/Request: Would it be possible to point to a relevant portion of the documentation explaining to users which data types they should and should not expect to be able to pass as parameters?

I haven't actually looked at the papermill source code in detail yet, but I'm guessing it has something to do with JSON serialization, and the fact that there is no notion of 'tuple' in JSON?

(E.g. when I tried to pass a lambda as a parameter once I got a '... not JSON serializable error', so maybe papermill is converting to JSON to avoid security risks associated with pickle or whatever.)

For example the above example works using parameters = {'numbers':[-1,1]} (i.e. a list) instead, presumably because lists correspond to the JSON array objects so are serializable without first converting to string?

krinsman avatar Apr 06 '21 14:04 krinsman

Thanks for submitting an issue @krinsman. I think your assessment is likely correct. Labeling issue as needs: testing to confirm.

willingc avatar Jun 06 '21 16:06 willingc

Apologies for not seeing this issue when it was posted btw.

And yes your evaluation of the issue is correct, because we convert all fields to JSON before passing them in a language agnostic manner to the kernel rather than using python language / version specific transport methods (e.g. pickle). That being said I could see an argument for tuples to be automatically converted to JSON lists for this purpose. The conversion from immutable to mutable structure is somewhat irrelevant with the process transfer process anyway and a string conversion is certainly unexpected here. @willingc thoughts on that?

MSeal avatar Jun 06 '21 18:06 MSeal

@MSeal As long as we comment and doc, I see nothing wrong with doing the conversion.

willingc avatar Jun 06 '21 18:06 willingc

No worries, sorry for complaining about this back then. I've gotten used to the interface since then, and Papermill has proven extremely useful for my work -- I really appreciate it!

The error did throw me for a surprise and I was concerned that another person in my position, but who didn't have experience with JSON before, might have been much more at a loss for what to do.

No opinions about whether trying to pass a tuple or not should throw an error or be converted to list, but yes I do agree that the conversion to string was unexpected. It also led to downstream errors which were a little tricky to debug.

krinsman avatar Jun 07 '21 22:06 krinsman

I'll mark the issue as an open approved pattern to change. I think it's well described here now and it is unintuitive to new users so making the proposed change would improve interaction patterns.

MSeal avatar Jun 16 '21 06:06 MSeal