jupyter_server
Resolve server absolute path to server or kernel-relative path
Motivation
It would improve productivity if Jupyter (Lab/Notebook) allowed users to click on a file path in tracebacks (and elsewhere) to open the file (https://github.com/jupyterlab/jupyterlab/issues/13277). The logic would be as follows:
1. if the path points to a file within root_dir, the file should be opened on the frontend for editing
2. if the path points to a file beyond root_dir, we should either:
  - a) do nothing in security sensitive setups
  - b) ask the kernel to provide the source of such a file and display it as read-only - this is already implemented in ipykernel using the debugger adapter protocol source request (this would be necessary for remote kernels)
  - c) have a custom server extension which would implement the ContentsManager API, exposing specific files outside of root_dir based on a block/allow list (see Additional scope for broader filesystem access below; this would not work for remote kernels)
Problem
It is currently impossible to distinguish between cases 1 and 2 (whether a path is within root_dir or outside of it).
For a server started with root_dir = "~/server_root", we can expect the following traceback from ipykernel:
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[1], line 1
----> 1 from a_file import test
File ~/server_root/a_file.py:1
----> 1 test
NameError: name 'test' is not defined
The problem is that the frontend cannot tell whether ~/server_root/a_file.py is within root_dir or not.
This is the case even if the frontend knows what the root_dir is. For example, if root_dir is /home/my-username/server_root, the frontend does not know how ~ expands in the kernel space (it may well be /home/another-username/).
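To make the ambiguity concrete, here is a minimal sketch. The helper `expand_tilde` is hypothetical (it is not part of jupyter_server) and takes the home directory explicitly, so the same traceback path can be expanded as the server user and as a different kernel user:

```python
import posixpath


def expand_tilde(path: str, home: str) -> str:
    """Expand a leading '~' against an explicitly supplied home directory."""
    if path == "~" or path.startswith("~/"):
        path = posixpath.join(home, path[2:])
    return posixpath.normpath(path)


root_dir = "/home/my-username/server_root"
traceback_path = "~/server_root/a_file.py"

# The same string lands inside or outside root_dir depending on whose
# home directory '~' refers to.
for home in ("/home/my-username", "/home/another-username"):
    absolute = expand_tilde(traceback_path, home)
    inside = posixpath.commonpath([absolute, root_dir]) == root_dir
    print(f"{home}: {absolute} inside root_dir: {inside}")
```

Only the process that produced the path knows which home directory applies, which is why the frontend cannot make this decision on its own.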
"Guessing" by trying both is not an option because we want to avoid false positives (turning file-like strings into paths that are broken URLs - mostly because everything can look like a path) and there are performance implications if we were guessing that way.
Proposed Solution
Create a new API endpoint which would tell the frontend whether the given file path is within the scope of the server, kernel, or neither. If the file is within the scope of the server, it would return the normalised path relative to root dir.
This could account for kernels which are spawned in a filesystem different from where the root_dir resides - as far as I understand there are no restrictions on kernel location (see snippet below) - a path could be within scope of both kernel and server (when kernel is started within root_dir), only one of them, or neither.
https://github.com/jupyter-server/jupyter_server/blob/09c15ce4ffa9b1c5b54c376ff1601475150554fb/jupyter_server/services/kernels/kernelmanager.py#L195
Examples
For simplicity, let's call the proposed endpoint /api/resolve (although maybe it should be integrated with the existing file ID manager, in which case it could be /api/fileid/resolve). In pseudocode it could be described as:
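from typing import Protocol

class PathResolver(Protocol):
    def resolve_path(self, path: str) -> str: ...

# Pseudocode: both managers would gain a resolve_path implementation.
class ContentsManager(..., PathResolver): ...
class KernelManager(..., PathResolver): ...

def handle_resolve(self, path: str, kernel_uuid: str):
    # Candidate scopes: the server, the given kernel, and any extra providers.
    scopes = [
        self.contents_manager,
        self.multi_kernel_manager.get_kernel(kernel_uuid),
        *self.get_additional_scopes(kernel_uuid)
    ]
    # Collect a resolution from every scope that supports resolving paths.
    return [
        scope.resolve_path(path)
        for scope in scopes
        if hasattr(scope, 'resolve_path')
    ]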
For a server spawned at ~/server_root with a kernel spawned in the same location:
# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}, {'scope': 'kernel', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}, {'scope': 'kernel', 'relative': 'test.py'}]
# /api/resolve?path=~/&kernel={uuid}
[]
For a server spawned at ~/server_root with a kernel spawned in ~/server_root/kernel:
# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}]
# /api/resolve?path=~/server_root/kernel/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'kernel/test.py'}, {'scope': 'kernel', 'relative': 'test.py'}]
# /api/resolve?path=~/&kernel={uuid}
[]
For a server spawned at ~/server_root with a kernel spawned in /tmp/kernel:
# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}]
# /api/resolve?path=/tmp/kernel/test.py&kernel={uuid}
[{'scope': 'kernel', 'relative': 'test.py'}]
# /api/resolve?path=~/&kernel={uuid}
[]
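The example responses above can be reproduced with a small, self-contained sketch. It is not a proposed implementation: `relative_to_scope` and `resolve` are hypothetical names, the home directory is passed in explicitly and assumed to be the same for server and kernel, and symlinks are not handled.

```python
import posixpath
from typing import Dict, List, Optional


def relative_to_scope(path: str, scope_root: str, home: str) -> Optional[str]:
    """Return `path` relative to `scope_root`, or None if it lies outside."""

    def absolute(p: str) -> str:
        if p == "~" or p.startswith("~/"):
            p = posixpath.join(home, p[2:])
        return posixpath.normpath(p)

    target, root = absolute(path), absolute(scope_root)
    # Relative inputs cannot be placed in any scope without more context.
    if not (posixpath.isabs(target) and posixpath.isabs(root)):
        return None
    if posixpath.commonpath([target, root]) != root:
        return None
    return posixpath.relpath(target, root)


def resolve(path: str, server_root: str, kernel_cwd: str,
            home: str = "/home/user") -> List[Dict[str, str]]:
    """Collect every scope (server, kernel) that contains `path`."""
    results = []
    for scope, root in (("server", server_root), ("kernel", kernel_cwd)):
        relative = relative_to_scope(path, root, home)
        if relative is not None:
            results.append({"scope": scope, "relative": relative})
    return results


print(resolve("~/server_root/kernel/test.py", "~/server_root", "~/server_root/kernel"))
# [{'scope': 'server', 'relative': 'kernel/test.py'}, {'scope': 'kernel', 'relative': 'test.py'}]
print(resolve("/tmp/kernel/test.py", "~/server_root", "/tmp/kernel"))
print(resolve("~/", "~/server_root", "~/server_root"))
```

In a real deployment the kernel scope would have to be resolved by (or on behalf of) the kernel, since the server may not share its filesystem or home directory.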
I am not opinionated on any particular JSON format, but I think it would be useful to return all matching resolutions and allow the frontend client to decide which one to use.
Additional context
Additional scope for exposing source access
As noted in (2b), we could expose the source of files known to the kernel (including files beyond its spawn cwd) by reusing the existing DAP source request. The /api/resolve response could advertise that a path is known by the kernel's source handler. Augmenting the first example:
# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}, {'scope': 'kernel', 'relative': '.'}, {'scope': 'source', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}, {'scope': 'kernel', 'relative': 'test.py'}, {'scope': 'source', 'relative': 'test.py'}]
# /api/resolve?path=~/test.py&kernel={uuid}
[{'scope': 'source', 'relative': '/home/user/test.py'}]
# /api/resolve?path=/lib/python/library/test.py&kernel={uuid}
[{'scope': 'source', 'relative': '/lib/python/library/test.py'}]
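For reference, a DAP source request for a path-based source could look roughly like the sketch below. The field names follow the Debug Adapter Protocol; the helper names `make_source_request` and `extract_source` are hypothetical, and the sketch assumes the message would be sent to the kernel as the content of a Jupyter debug_request rather than showing that transport.

```python
import itertools
from typing import Optional

# DAP messages carry a monotonically increasing sequence number.
_seq = itertools.count(1)


def make_source_request(path: str) -> dict:
    """Build a DAP 'source' request asking for the content of `path`."""
    return {
        "type": "request",
        "seq": next(_seq),
        "command": "source",
        "arguments": {
            "source": {"path": path},
            # 0 means the source is identified by its path, not by a reference.
            "sourceReference": 0,
        },
    }


def extract_source(reply: dict) -> Optional[str]:
    """Return the file content from a DAP 'source' response, if it succeeded."""
    if reply.get("success"):
        return reply.get("body", {}).get("content")
    return None


request = make_source_request("/lib/python/library/test.py")
print(request["command"], request["arguments"]["source"]["path"])
```

A frontend that receives a 'source' scope from /api/resolve could then issue such a request and display the returned content read-only.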
Additional scope for broader filesystem access
Per (2c), it would be desirable to enable the implementation of a custom scope provider that would allow tightly controlled access to the filesystem beyond root_dir. This would benefit other use cases where access to files on the filesystem is desirable (https://github.com/jupyter-lsp/jupyterlab-lsp/issues/850).
A scope provider configured to expose files under ~/shared, with the server spawned at ~/server_root and the kernel spawned in the same location (as in the first example), would resolve the following:
# /api/resolve?path=~/server_root&kernel={uuid}
[{'scope': 'server', 'relative': '.'}, {'scope': 'kernel', 'relative': '.'}]
# /api/resolve?path=~/server_root/test.py&kernel={uuid}
[{'scope': 'server', 'relative': 'test.py'}, {'scope': 'kernel', 'relative': 'test.py'}]
# /api/resolve?path=~/shared/test.py&kernel={uuid}
[{'scope': 'filesystem', 'relative': '~/shared/test.py'}] # filesystem is relative to filesystem root (de facto absolute)
# /api/resolve?path=~/not-allowed/test.py&kernel={uuid}
[]
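A scope provider of this kind could be sketched as follows. `AllowListScope` is a hypothetical class, not an existing jupyter_server API; it takes the home directory explicitly and, as one possible design choice, returns the expanded absolute path rather than the `~`-prefixed form shown above.

```python
import posixpath
from typing import Optional, Sequence


class AllowListScope:
    """Hypothetical scope provider exposing only allow-listed directories."""

    def __init__(self, allowed: Sequence[str], blocked: Sequence[str] = (),
                 home: str = "/home/user") -> None:
        self.home = home
        self.allowed = [self._absolute(p) for p in allowed]
        self.blocked = [self._absolute(p) for p in blocked]

    def _absolute(self, path: str) -> str:
        if path == "~" or path.startswith("~/"):
            path = posixpath.join(self.home, path[2:])
        # normpath collapses '..' so traversal cannot escape an allowed root.
        return posixpath.normpath(path)

    @staticmethod
    def _is_within(target: str, root: str) -> bool:
        return (posixpath.isabs(target) and posixpath.isabs(root)
                and posixpath.commonpath([target, root]) == root)

    def resolve_path(self, path: str) -> Optional[dict]:
        target = self._absolute(path)
        # The block list wins over the allow list.
        if any(self._is_within(target, root) for root in self.blocked):
            return None
        if any(self._is_within(target, root) for root in self.allowed):
            return {"scope": "filesystem", "relative": target}
        return None


scope = AllowListScope(allowed=["~/shared"], blocked=["~/shared/private"])
print(scope.resolve_path("~/shared/test.py"))
print(scope.resolve_path("~/not-allowed/test.py"))
```

Normalising before the containment check matters here: without it, a path such as ~/shared/../not-allowed/test.py would appear to sit under the allowed directory.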
The difference between filesystem and source scope is subtle but noticeable when:
- the kernel is running on a different filesystem than the server
- there are multiple contents managers
Impact on multiplexed content managers
A number of ways to provide multiple content managers have been proposed over the years:
- Jupyter(Lab) IDrive frontend API, which may be connected to an alternative /api/contents endpoint
- jpmorganchase/jupyter-fs using MetaManager, where drives are managed on the server side rather than on the frontend
- viaduct-ai/hybridcontents - status not clear
- jupyter/jupyter-drive (MixedContentsManager) - deprecated
With the proposed solution:
- the IDrive would need to be amended to allow providing a URL for an alternative /api/resolve.
- the drive-aware meta-managers like jupyter-fs should be able to handle /api/resolve by overriding the implementation of ContentsManager.resolve_path to account for drive prefixes.
Cf. https://github.com/jupyter/notebook/issues/3233
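As a rough illustration of the second point, a drive-aware resolver might map an absolute path to a drive-prefixed path. Everything in this sketch is an assumption: `resolve_with_drives` is a hypothetical function, the drive names and roots are made up, all roots are assumed to be absolute, and the `drive:relative/path` form is assumed as the prefix convention. It does not reflect how jupyter-fs is actually implemented.

```python
import posixpath
from typing import Dict, Optional


def resolve_with_drives(path: str, drives: Dict[str, str]) -> Optional[str]:
    """Map an absolute path to 'drive:relative', preferring the deepest root."""
    target = posixpath.normpath(path)
    if not posixpath.isabs(target):
        return None
    best = None  # (drive name, normalised root) of the most specific match
    for name, root in drives.items():
        root = posixpath.normpath(root)
        if posixpath.commonpath([target, root]) == root:
            if best is None or len(root) > len(best[1]):
                best = (name, root)
    if best is None:
        return None
    name, root = best
    return f"{name}:{posixpath.relpath(target, root)}"


# Hypothetical drive layout, for illustration only.
drives = {"home": "/home/user/server_root", "scratch": "/mnt/scratch"}
print(resolve_with_drives("/mnt/scratch/run/log.txt", drives))
print(resolve_with_drives("/etc/passwd", drives))
```

Preferring the deepest matching root keeps the result unambiguous when one drive is mounted inside another.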
Impact on security by obscurity
The proposed solution would make it easier to find out root_dir from the frontend, because a user could probe numerous paths and deduce root_dir from the server responses by brute force. This is not a concern for the majority of administrators, as kernels are typically run locally and hence not only know the full runtime path but also have access to it.