nbclient
Using nbclient to talk to jupyter lab running remotely
Hi, I have a use case where a JupyterLab server is running on an EC2 instance and I want to run a .ipynb file against a kernel inside that JupyterLab. Can I use nbclient to achieve that? I have used nbclient to talk to Enterprise Gateway and run notebooks, but when I try the same approach with a standalone JupyterLab server, it doesn't work.
Hi @amit-chandak-unskript, I think you will need to pass a kernel manager to nbclient, with a custom kernel provisioner that lets it talk to the kernel remotely. If you don't want to control the life cycle of this kernel, but only execute code, I think you just need to get the kernel's connection info. The ZMQ sockets are TCP sockets that can be accessed remotely, so this should work. But I don't know if this has been done before; maybe @kevin-bates has ideas?
Thanks @davidbrochart, hello @amit-chandak-unskript.
Hmm. One of the primary differences between running a kernel via the jupyter-server (Lab) REST API and the gateway REST API is that jupyter-server is session-centric, while the gateway is more kernel-centric: everything starts from a session in jupyter-server, whereas in the gateway it doesn't. What happens when you use the GatewayKernelManager with nbclient and point it at your jupyter-server instance?
Might you be able to deploy (and expose) a gateway instance beside your lab instance in EC2? They could both share the same kernel specifications, but have their own managed space (i.e., Lab won't see the Gateway's kernels and vice versa).
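To make the session-centric versus kernel-centric distinction concrete, here is a rough sketch of the two request shapes. It only builds the requests and does not send them. The helper names and the example.invalid URL are made up for illustration, and the payloads follow my understanding of the Jupyter Server REST API, so check them against your server version.

```python
import json
from urllib.request import Request


def start_kernel_request(base_url: str, kernel_name: str = "python3") -> Request:
    """Kernel-centric style: POST /api/kernels with just a kernel name."""
    body = json.dumps({"name": kernel_name}).encode()
    return Request(
        f"{base_url.rstrip('/')}/api/kernels",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_session_request(
    base_url: str, notebook_path: str, kernel_name: str = "python3"
) -> Request:
    """Session-centric style: POST /api/sessions ties a kernel to a path."""
    body = json.dumps(
        {
            "path": notebook_path,
            "type": "notebook",
            "kernel": {"name": kernel_name},
        }
    ).encode()
    return Request(
        f"{base_url.rstrip('/')}/api/sessions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical base URL, used only to show the resulting endpoints.
kernel_req = start_kernel_request("https://example.invalid/")
session_req = start_session_request("https://example.invalid", "demo.ipynb")
print(kernel_req.get_method(), kernel_req.full_url)
print(session_req.get_method(), session_req.full_url)
```

A client written for the gateway mostly uses the first shape, which is one reason it may not map cleanly onto a server where kernels are normally reached through sessions.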
Thanks @kevin-bates @davidbrochart,
File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/site-packages/jupyter_server/gateway/gateway_client.py", line 404, in gateway_request
response = await client.fetch(endpoint, **kwargs)
tornado.httpclient.HTTPClientError: HTTP 403: Forbidden
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "temp.py", line 84, in <module>
main()
File "temp.py", line 79, in main
run_notebook(eg_url, bucket, input_key, output_key)
File "temp.py", line 44, in run_notebook
resp = client.execute(kernel_name="python3")
File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/site-packages/nbclient/util.py", line 78, in wrapped
return just_run(coro(*args, **kwargs))
File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/site-packages/nbclient/util.py", line 57, in just_run
return loop.run_until_complete(coro)
File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/asyncio/base_events.py", line 587, in run_until_complete
return future.result()
File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/site-packages/nbclient/client.py", line 542, in async_execute
async with self.async_setup_kernel(**kwargs):
File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/contextlib.py", line 170, in __aenter__
return await self.gen.__anext__()
File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/site-packages/nbclient/client.py", line 500, in async_setup_kernel
await self.async_start_new_kernel(**kwargs)
File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/site-packages/nbclient/client.py", line 412, in async_start_new_kernel
await ensure_async(self.km.start_kernel(extra_arguments=self.extra_arguments, **kwargs))
File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/site-packages/nbclient/util.py", line 89, in ensure_async
result = await obj
File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/site-packages/jupyter_server/gateway/managers.py", line 438, in start_kernel
response = await gateway_request(self.kernels_url, method="POST", body=json_body)
File "/Users/amit/miniconda3/envs/connectors/lib/python3.7/site-packages/jupyter_server/gateway/gateway_client.py", line 425, in gateway_request
) from e
tornado.web.HTTPError: HTTP 403: Forbidden (Error attempting to connect to Gateway server url
I see the above error when I use the JupyterLab server URL. Do I need to add some path to the base URL?
Here is my code snippet:
import os
import logging
import argparse

import nbformat
import boto3
from nbclient import NotebookClient
from nbclient.exceptions import CellExecutionError
from jupyter_server.gateway.managers import GatewayKernelManager

logger = logging.getLogger()
logger.setLevel(logging.INFO)


# The function runs the notebook in S3 against the specified EG.
# It expects the following arguments:
# eg_url - This is the url of the Enterprise gateway.
# bucket - Name of the bucket where the notebook is kept.
# input_key - S3 notebook to run.
# output_key - S3 notebook to store the output of the run.
def run_notebook(eg_url, bucket, input_key, output_key) -> bool:
    os.environ["JUPYTER_GATEWAY_URL"] = eg_url
    s3 = boto3.client('s3')
    try:
        getObject = s3.get_object(Bucket=bucket, Key=input_key)
    except Exception:
        logger.error(f'GetObject {input_key} failed')
        return False
    fp = getObject['Body']
    try:
        run_notebook = nbformat.read(fp, as_version=4)
    except Exception:
        logger.error('Wrong notebook format')
        return False
    client = NotebookClient(nb=run_notebook, kernel_manager_class=GatewayKernelManager)
    try:
        resp = client.execute(kernel_name="python3")
    except CellExecutionError:
        pass
    # Reads the output file contents.
    bodyStr = nbformat.writes(run_notebook)
    # Upload the output nb file to s3.
    try:
        s3.put_object(Body=str.encode(bodyStr), Bucket=bucket, Key=output_key)
    except Exception:
        logger.error(f'S3 upload {output_key} failed')
        return False
    return True


def main():
    parser = argparse.ArgumentParser(description='Execute runbook nb file.')
    parser.add_argument('bucket', metavar='bucket', type=str,
                        help='AWS S3 bucket name where nb files are stored')
    parser.add_argument('input_key', metavar='input_key', type=str,
                        help='S3 key to runbook nb file')
    parser.add_argument('output_key', metavar='output_key', type=str,
                        help='S3 key to where output nb file will be stored')
    parser.add_argument('eg_url', metavar='gateway_url', type=str,
                        help='Url to Jupyterlab gateway')
    args = parser.parse_args()
    bucket = args.bucket
    input_key = args.input_key
    output_key = args.output_key
    eg_url = args.eg_url
    run_notebook(eg_url, bucket, input_key, output_key)
    return


if __name__ == '__main__':
    main()
The above script works fine with Enterprise Gateway. I am just trying the same script with the JupyterLab URL passed as the eg_url argument.
It's not a reachability issue: I confirmed that the JupyterLab server URL works if I do the following:
curl -XGET https://<jupyterlab base url>/api/kernelspecs
{"default": "python3", "kernelspecs": {"python3": {"name": "python3", "spec": {"argv": ["/opt/conda/bin/python", "-m", "ipykernel_launcher", "-f", "{connection_file}"], "env": {}, "display_name": "python3", "language": "python", "interrupt_mode": "signal", "metadata": {"debugger": true}}, "resources": {"logo-32x32": "/d3c71c1b-d075-4b5c-a6dd-9f417f483c3e/kernelspecs/python3/logo-32x32.png", "logo-64x64": "/d3c71c1b-d075-4b5c-a6dd-9f417f483c3e/kernelspecs/python3/logo-64x64.png"}}}}(con
@kevin-bates I like your idea of having an Enterprise Gateway beside the JupyterLab server with the same kernelspec and using that for nbclient. But I would like to avoid maintaining two instances. So, nbclient was never meant to be used with a remote JupyterLab server, was it?
So, nbclient was never meant to be used with a remote JupyterLab server, was it?
No, it wasn't. Nbclient doesn't speak HTTP, and jupyter-server is an HTTP server. The only network communication nbclient does is over the ZMQ TCP sockets used to connect to a (possibly remote) kernel. If it had access to the kernel's connection info, that would be possible.
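As a rough illustration of what "the kernel's connection info" contains, the sketch below turns a connection-info dictionary into the per-channel socket addresses. The field names follow the usual Jupyter kernel connection file layout; the IP, ports, and key are made-up values, and channel_endpoints is a hypothetical helper, not part of nbclient or jupyter_client.

```python
import json

# The five channels described in a kernel connection file.
CHANNELS = ("shell", "iopub", "stdin", "control", "hb")


def channel_endpoints(connection_info: dict) -> dict:
    """Map each channel name to its transport://ip:port address."""
    transport = connection_info.get("transport", "tcp")
    ip = connection_info["ip"]
    return {
        channel: f"{transport}://{ip}:{connection_info[channel + '_port']}"
        for channel in CHANNELS
    }


# Made-up connection info, shaped like the JSON in a kernel-<id>.json file.
example_info = json.loads(
    """
    {
      "transport": "tcp",
      "ip": "10.0.0.5",
      "shell_port": 53794,
      "iopub_port": 53795,
      "stdin_port": 53796,
      "control_port": 53797,
      "hb_port": 53798,
      "key": "not-a-real-key",
      "signature_scheme": "hmac-sha256"
    }
    """
)

for name, address in channel_endpoints(example_info).items():
    print(name, address)
```

Note that kernels are usually bound to 127.0.0.1 by default, so these ports are only reachable from another machine if the kernel listens on a routable address or the ports are tunneled. A jupyter_client kernel client can load such a dictionary with load_connection_info, but whether that is enough for your setup is something to verify.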
Thanks @davidbrochart, one more question: is it possible to make nbclient use a custom kernel to run? That is, does it take a kernel spec to run the notebook?
Yes, you will need to pass your own kernel manager (km). You can get one like this:
from jupyter_client.manager import KernelManager

km = KernelManager(kernel_name="python3")
km.start_kernel()
# pass km to nbclient, e.g. NotebookClient(nb, km=km).execute()
km.shutdown_kernel()
Right, but, as David points out, this won't get you to the remote server. You'd essentially be rewriting GatewayKernelManager (and GatewayKernelClient). Since you received a 403, you might need to explore adding the token to your headers via JUPYTER_GATEWAY_AUTH_TOKEN. I suspect there will be other issues as well - probably in the session management aspect of things.
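For the token suggestion, a minimal sketch could look like the following. It assumes the gateway client reads JUPYTER_GATEWAY_AUTH_TOKEN from the environment and forwards it as an "Authorization: token ..." header, which is the header form Jupyter Server accepts for token authentication. The authorized_request helper, the placeholder token, and the example.invalid URL are made up for illustration, and nothing is sent over the network.

```python
import os
from urllib.request import Request


def authorized_request(url: str, token: str) -> Request:
    """Build (but do not send) a request carrying the server token."""
    return Request(url, headers={"Authorization": f"token {token}"})


# Option 1: expose the token through the environment so the gateway client
# can pick it up. Presumably this must be set before the
# GatewayKernelManager is created, like JUPYTER_GATEWAY_URL in the script.
os.environ.setdefault("JUPYTER_GATEWAY_AUTH_TOKEN", "replace-with-your-token")

# Option 2: build the same header by hand to check the token on its own.
req = authorized_request(
    "https://example.invalid/api/kernelspecs",  # hypothetical base URL
    os.environ["JUPYTER_GATEWAY_AUTH_TOKEN"],
)
print(req.full_url)
print(req.get_header("Authorization"))
```

Even if this gets past the 403, the session management differences mentioned above may still cause problems.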
@amit-chandak-unskript I'm working on a new project that will let you do just what you want, see https://github.com/davidbrochart/jpterm/pull/2.