papermill
nbformat/nbformat_minor not extracted correctly with the HTTP handler
🐛 Bug
I'm building a connector between Jupyter (using papermill) and another product named "Cortex" from the Strangee project, and I ran into an issue during development. I'm testing the HTTP handler by executing a notebook located on a JupyterHub instance that has a "demo" user for whom a "cortex_job" server is configured.
import papermill as pm

pm.execute_notebook(
    "http://192.168.1.117:8000/user/demo/cortex_job/api/contents/notebook1.ipynb?token=SECRET",
    "http://192.168.1.117:8000/user/demo/cortex_job/api/contents/Folder1/notebook2.ipynb?token=SECRET",
    parameters=dict(var1="toto"),
)
The notebook is retrieved successfully, but loading it then fails with this error:
---------------------------------------------------------------------------
ValidationError Traceback (most recent call last)
Cell In[1], line 3
1 import papermill as pm
----> 3 pm.execute_notebook(
4 "http://192.168.1.117:8000/user/demo/cortex_job/api/contents/notebook1.ipynb?token=SECRET",
5 "http://192.168.1.117:8000/user/demo/cortex_job/api/contents/Folder1/notebook2.ipynb?token=SECRET",
6 parameters = dict(var1 = "toto")
7 )
File /usr/local/lib/python3.10/dist-packages/papermill/execute.py:89, in execute_notebook(input_path, output_path, parameters, engine_name, request_save_on_cell_execute, prepare_only, kernel_name, language, progress_bar, log_output, stdout_file, stderr_file, start_timeout, report_mode, cwd, **engine_kwargs)
86 if cwd is not None:
87 logger.info("Working directory: {}".format(get_pretty_path(cwd)))
---> 89 nb = load_notebook_node(input_path)
91 # Parameterize the Notebook.
92 if parameters:
File /usr/local/lib/python3.10/dist-packages/papermill/iorw.py:512, in load_notebook_node(notebook_path)
502 def load_notebook_node(notebook_path):
503 """Returns a notebook object with papermill metadata loaded from the specified path.
504
505 Args:
(...)
510
511 """
--> 512 nb = nbformat.reads(papermill_io.read(notebook_path), as_version=4)
513 nb_upgraded = nbformat.v4.upgrade(nb)
514 if nb_upgraded is not None:
File /usr/local/lib/python3.10/dist-packages/nbformat/__init__.py:91, in reads(s, as_version, capture_validation_error, **kwargs)
89 nb = reader.reads(s, **kwargs)
90 if as_version is not NO_CONVERT:
---> 91 nb = convert(nb, as_version)
92 try:
93 validate(nb)
File /usr/local/lib/python3.10/dist-packages/nbformat/converter.py:62, in convert(nb, to_version)
60 except AttributeError as e:
61 msg = f"Notebook could not be converted from version {version} to version {step_version} because it's missing a key: {e}"
---> 62 raise ValidationError(msg) from None
64 # Recursively convert until target version is reached.
65 return convert(converted, to_version)
ValidationError: Notebook could not be converted from version 1 to version 2 because it's missing a key: cells
Looking at the code, the HTTP handler simply returns the entire response body:
Which gives:
{
"name":"notebook1.ipynb",
"path":"notebook1.ipynb",
"last_modified":"2023-07-12T11:43:37.265003Z",
"created":"2023-07-12T11:43:37.265003Z",
"content":{
"cells":[
{
"cell_type":"markdown",
"id":"e0882b67",
"metadata":{
},
"source":"# My title\n\n## My subtitle\n\nHello world!"
},
{
"cell_type":"code",
"execution_count":1,
"id":"e92789a6",
"metadata":{
"tags":[
"parameters"
],
"trusted":true
},
"outputs":[
],
"source":"var1 = 3\nvar2 = 5"
},
{
"cell_type":"code",
"execution_count":2,
"id":"d49d5a2b",
"metadata":{
"trusted":true
},
"outputs":[
{
"name":"stdout",
"output_type":"stream",
"text":"var1 is 3, var2 is 5\n"
}
],
"source":"print(\"var1 is {0}, var2 is {1}\".format(var1,var2))"
}
],
"metadata":{
"celltoolbar":"Tags",
"kernelspec":{
"display_name":"Python 3 (ipykernel)",
"language":"python",
"name":"python3"
},
"language_info":{
"codemirror_mode":{
"name":"ipython",
"version":3
},
"file_extension":".py",
"mimetype":"text/x-python",
"name":"python",
"nbconvert_exporter":"python",
"pygments_lexer":"ipython3",
"version":"3.10.6"
}
},
"nbformat":4,
"nbformat_minor":5
},
"format":"json",
"mimetype":"None",
"size":1188,
"writable":true,
"type":"notebook"
}
As you can see, nbformat is set to 4 in the response, but papermill concluded that it was 1 (the default value).
This assumption comes from here (in the nbformat library, which reads the notebook):
As you can see, the version is taken from the root node "nbformat" instead of "content.nbformat", which is what causes the issue.
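To make the mismatch concrete, here is a minimal sketch that does not use nbformat itself. `guess_version` is a hypothetical helper that mimics the behaviour described above: it looks for the version keys at the top level of the document and falls back to 1 when they are missing. The `envelope` dict is a trimmed-down stand-in for the JupyterHub response shown earlier.

```python
# Trimmed-down stand-in for the Contents API response shown above.
envelope = {
    "name": "notebook1.ipynb",
    "type": "notebook",
    "content": {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5},
}


def guess_version(doc):
    """Hypothetical helper: read the version keys from the top level of
    the document, falling back to (1, 0) when they are absent."""
    return doc.get("nbformat", 1), doc.get("nbformat_minor", 0)


print(guess_version(envelope))             # (1, 0): keys are nested, so the default wins
print(guess_version(envelope["content"]))  # (4, 5): the real notebook version
```

With the wrapped response the lookup never reaches the nested keys, so the notebook is treated as version 1 and the conversion fails on the missing top-level "cells" key.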
Do you know whether this is a bug on your side or perhaps in the nbformat library? I tested it with a LocalHandler and it works fine, since the output is:
{
"cells": [
{
"cell_type": "markdown",
"id": "e0882b67",
"metadata": {},
"source": [
"# My title\n",
"\n",
"## My subtitle\n",
"\n",
"Hello world!"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "e92789a6",
"metadata": {
"tags": [
"parameters"
]
},
"outputs": [],
"source": [
"var1 = 3\n",
"var2 = 5"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d49d5a2b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"var1 is 3, var2 is 5\n"
]
}
],
"source": [
"print(\"var1 is {0}, var2 is {1}\".format(var1,var2))"
]
}
],
"metadata": {
"celltoolbar": "Tags",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
A solution could be to load the JSON response in the HTTP handler and return only its "content" node.
Thank you
Fix working on my side:
papermill/iorw.py
class HttpHandler(object):
    @classmethod
    def read(cls, path):
        return json.dumps(requests.get(path, headers={'Accept': 'application/json'}).json()["content"])

    @classmethod
    def listdir(cls, path):
        raise PapermillException('listdir is not supported by HttpHandler')

    @classmethod
    def write(cls, buf, path):
        payload = {"type": "notebook", "format": "json", "path": path}
        payload["content"] = json.loads(buf)
        result = requests.put(path, json=payload)
        result.raise_for_status()

    @classmethod
    def pretty_path(cls, path):
        return path
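One caveat with the fix above: indexing `["content"]` unconditionally would presumably break URLs that serve a bare `.ipynb` file rather than a Contents API response. A more defensive variant could unwrap only when the body looks like a wrapper. The sketch below is not papermill code; `unwrap_notebook` is a hypothetical helper, and the heuristic (no top-level "nbformat" key, plus a dict under "content") is an assumption drawn from the two JSON shapes shown in this issue.

```python
import json


def unwrap_notebook(body):
    """Hypothetical helper: return notebook JSON text from either a
    Contents API wrapper or a bare notebook document."""
    doc = json.loads(body)
    if isinstance(doc, dict) and "nbformat" not in doc:
        content = doc.get("content")
        if isinstance(content, dict):
            # Looks like a Contents API wrapper: keep only the notebook itself.
            return json.dumps(content)
    # Already a bare notebook (or something unrecognised): pass it through.
    return body


bare = json.dumps({"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5})
wrapped = json.dumps({"name": "notebook1.ipynb", "type": "notebook",
                      "content": json.loads(bare)})

# Both shapes end up as the same notebook document.
print(json.loads(unwrap_notebook(wrapped)) == json.loads(unwrap_notebook(bare)))
```

In `HttpHandler.read`, this would amount to passing the response text through `unwrap_notebook` before returning it, so that both plain HTTP notebooks and Contents API endpoints load the same way.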