jupyter_client icon indicating copy to clipboard operation
jupyter_client copied to clipboard

Lay foundation to pass notebook names to kernel at startup.

Open Carreau opened this issue 4 years ago • 17 comments

This has been a controversial topic from some time:

https://github.com/jupyter/notebook/issues/1000 https://forums.databricks.com/questions/21390/is-there-any-way-to-get-the-current-notebook-name.html https://stackoverflow.com/questions/12544056/how-do-i-get-the-current-ipython-jupyter-notebook-name https://ask.sagemath.org/question/36873/access-notebook-filename-from-jupyter-with-sagemath-kernel/

This is also sometime critical to linter, and tab completion to know current name.

Of course current answer is that the question is ill-defined, there might not be a file associated with the current kernel, there might be multiple files, files might not be on the same system, it could change through the execution and many other gotchas.

This suggest to add an JPY_ASSOCIATED_FILE env variable which is not too visible, but give an escape hatch which should mostly be correct unless the notebook is renamed or kernel attached to a new one.

Do do so this handles the new associated_file parameters in a few function of the kernel manager. On jupyter_server this one line change make the notebook name available using typical local installs:

--- a/jupyter_server/services/sessions/sessionmanager.py
+++ b/jupyter_server/services/sessions/sessionmanager.py
@@ -96,7 +96,12 @@ class SessionManager(LoggingConfigurable):
         """Start a new kernel for a given session."""
         # allow contents manager to specify kernels cwd
         kernel_path = self.contents_manager.get_kernel_path(path=path)
-        kernel_id = await self.kernel_manager.start_kernel(path=kernel_path, kernel_name=kernel_name)
+
+        kernel_id = await self.kernel_manager.start_kernel(
+            path=kernel_path, kernel_name=kernel_name, associated_file=name
+        )
         return kernel_id

Of course only launchers that will pass forward this value will allow the env variable to be set.

I'm thinking that various kernels may use this and expose it in different ways. like notebook_name if it ends with .ipynb in ipykernel.

Carreau avatar Jun 07 '21 14:06 Carreau

@goanpeca pointed out to me that it could be a good idea to include the associated file in each execute request metadata in order to take into account multiple connected clients and renaming, though that's a more invasive change.

Carreau avatar Jun 07 '21 15:06 Carreau

As you said, it only makes sense in specific situations, the most obvious one being when the kernel is used by a single notebook. But in this case, it looks to me like it would be better handled by e.g. the Jupyter Server, which is well aware of the notebook's name and its associated kernel. It could execute some code in the kernel at startup and each time the notebook is renamed, something like:

__notebook_name__ = "my_notebook.ipynb"

davidbrochart avatar Jun 07 '21 17:06 davidbrochart

Note that there is some notion of "temporary" filename for the debug protocol.

SylvainCorlay avatar Jun 10 '21 09:06 SylvainCorlay

This sounds like another item for the kernel parameterization cart, although one that is more of a system-owned parameter.

I agree that using an environment variable is the way to go, but I'd like to better understand the use-case here. The referenced notebook issue talks of a high availability scenario, while others talk about creating files with differing suffixes, where the notebook file's basename is the common association.

  • What kind of use-case is this driving at?
  • Can the "association" be something that is pure metadata that survives rename operations, etc. (as suggested here)? A GUID isn't user-friendly and if there needs to be a "visual association", not appropriate.
  • What happens to this use case when the associated file is renamed or multiple files are associated? Will it lead to errors for faulty behavior?

The fact that this value can change after a kernel has started is troubling. At best, this is merely a hint, so if this were to be supported, I would recommend this environment variable be named to convey that aspect: JPY_ASSOCIATED_FILE_HINT or JPY_FILE_HINT or JPY_INITIAL_FILE.

I like the idea of providing system-owned environment variables and the JPY_ prefix seems good (and has precedence). (For reference, Enterprise Gateway flows any env prefixed with KERNEL_ to the kernel.)

If we were to proceed with this, I would suggest introducing the environment variable in the session manager directly and utilizing the existing env keyword argument (dict). This should flow through the kernel management layer as-is. Applications relying on core projects like nbclient could then benefit from this (assuming nbclient is updated to set the env as well).

kevin-bates avatar Jun 10 '21 15:06 kevin-bates

As you said, it only makes sense in specific situations, the most obvious one being when the kernel is used by a single notebook. But in this case, it looks to me like it would be better handled by e.g. the Jupyter Server, which is well aware of the notebook's name and its associated kernel. It could execute some code in the kernel at startup and each time the notebook is renamed, something like:

__notebook_name__ = "my_notebook.ipynb"

While that is true for the Python kernel this is not a cross language solution, the advantage of using an env variable is that it's kernel agnostic and does not risk breaking anything,

I agree that __notebook_name__, would be great but imho this should be handled at the protocol level, for example in metadata in the execute request and let the kernel do "the right things" depending on the language.

This sounds like another item for the kernel parameterization cart, although one that is more of a system-owned parameter.

I agree that using an environment variable is the way to go, but I'd like to better understand the use-case here. The referenced notebook issue talks of a high availability scenario, while others talk about creating files with differing suffixes, where the notebook file's basename is the common association. ...

I'm not sure this is parameterization, but I see where you are coming from. I think that having this as an env variable is a cheap way to at least get started. Having the name can be used to better reference error messages (instead of having <IPython-input-12> or similar in traceback. I'm not in generala huge fan of it, but as you saw in above issues these is a growing number of people asking for it, and workaround

The fact that this value can change after a kernel has started is troubling. At best, this is merely a hint, so if this were to be supported, I would recommend this environment variable be named to convey that aspect: JPY_ASSOCIATED_FILE_HINT or JPY_FILE_HINT or JPY_INITIAL_FILE.

I like the idea of providing system-owned environment variables and the JPY_ prefix seems good (and has precedence). (For reference, Enterprise Gateway flows any env prefixed with KERNEL_ to the kernel.)

I don't have particular feeling for how to name the env variable, HINT in fine, JPY_ prefix sounds great. I want to avoid NOTEBOOK as it's increasingly note a notebook.

My goal here is to have a stopgap – the env variable will in most of the case be the right value, if we call it INITIAL as you pointed that's enough to say that it might not be correct. But if we get that in it works for all the kernel, and then we can discuss what to add in execute request metadata , for example file type (notebook, plain, other), and the filename or full path.

Those metadata can be interpreted by each kernel to provide language specific way to access those values, maybe __notebook_name__, or __notebook_path__ either as str, or objects... and then there is no issues to make it change at each execution requests, or have those objects have events if the notebook name changes. I want to also keep in mind that resources might not be on a filesystem so I would be fine with this value being s3://...., or anything else that the kernel can understand.

if that can be harmonized with the debugger I'm all for it.

Now I believe the first two questions we want to address are:

  • Do we agree that accessing in some way the current file path/name/url/uuid is something we want to allow ?
  • if so: Do we want to start working on a complete solution right now which is correct in all the cases or do it in steps with maybe something which is correct in most of the cases and then refine ?

Carreau avatar Jun 15 '21 16:06 Carreau

Do we agree that accessing in some way the current file path/name/url/uuid is something we want to allow ?

I think it's asked for often enough that having a path for this that's reasonably safe guarded makes sense.

if so: Do we want to start working on a complete solution right now which is correct in all the cases or do it in steps with maybe something which is correct in most of the cases and then refine ?

I think if the solution is scoped to "When executing in situation A, then B will be available to the runtime" is sufficient. Solving perfectly for all cases is going to a major headache with little upside. Good documentation and naming convention here would probably make it clear around scope of use imho.

I don't have particular feeling for how to name the env variable, HINT in fine, JPY_ prefix sounds great. I want to avoid NOTEBOOK as it's increasingly note a notebook.

What about instead talking about it as kernel session naming? When you launch a kernel the kernel can accept a session name that's then set to an env variable (or whatever is decided). e.g. KERNEL_SESSION_NAME="my_notebook.ipynb", or or any string including hashes / UUID strings. This would leave the solution somewhat generic but we could then have the jupyter server state that whenever it launches a notebook file it follows a specific convention for setting the session name. e.g. server always sets it to the initial notebook file name, but doesn't change that name after kernel launch. Unless I am missing some context of the problem I believe this would also get around the conversation about every client execute request sending a file name. The field isn't a client field, it's a launcher field for naming the overall repl session for it's given lifecycle independent of users of that session.

MSeal avatar Jun 16 '21 06:06 MSeal

I think couching this in terms of a session name is a great idea - thanks @MSeal. Such an approach removes all sorts of implications that other names might suggest.

For my own understanding, is this value just the notebook filename or an actual path? This comment seems to imply that, at least initially, the kernel can expect the value of this env to be a tangible resource (and not just a contextual string).

I want to also keep in mind that resources might not be on a filesystem so I would be fine with this value being s3://...., or anything else that the kernel can understand.

kevin-bates avatar Jun 16 '21 16:06 kevin-bates

My reasoning for a path instead of a filename, is that sufficiently enough people cd around in their notebook that the filename might not be sufficient. But I agree that if we go with a session name this ambiguity is resolved as we don't guarantee it's the curent file.

Carreau avatar Jun 22 '21 10:06 Carreau

As a note, one of the use case this was privately requested to me was reporting which file are containing deprecated functions when using kernels scheduled on remote machines. There are hooks installed on a distributed system that log when deprecated functions are used and send user regular reports; problem is that the kernels do not know the file names and the logs are a bit useless as the only thing thay can mention is <IPython-input-XXXX>

Carreau avatar Jun 23 '21 15:06 Carreau

Do we want to target this for 7.0?

blink1073 avatar Jul 30 '21 09:07 blink1073

If I'm understanding things correctly, and assuming we went with an env similar to KERNEL_SESSION_NAME, I think this boils down to the various applications setting that value into the env keyword argument of their corresponding "start kernel" request and it should just flow. That is, I don't think there'd be any work here, and if there was, I suspect it nothing API-breaking from a release versioning perspective.

kevin-bates avatar Jul 30 '21 15:07 kevin-bates

I've reworked this on top of the 7.x branch.

I'm a bit worried of a couple of things.

  1. you need to thread the session name down to the providers via kwargs.
  2. the current code is not super clear as to which kw goes where, and seem to rely on kwargs mutability being visible to caller.

I've sent https://github.com/jupyter/notebook/pull/6180 to show how this can be used.

it's also unclear where the name of the env variable should be documented. I can document in it in Jupyter_client, though I guess this is more interesting for end-users so maybe somewhere else ?

Carreau avatar Sep 14 '21 21:09 Carreau

I agree that plumbing this as a specific (and new) keyword argument is painful. I guess I figured this would simply be incorporated in the existing env:dict keyword argument on the start_kernel request. I believe that would just flow. Has that approach been attempted?

it's also unclear where the name of the env variable should be documented. I can document in it in Jupyter_client, though I guess this is more interesting for end-users so maybe somewhere else ?

I think jupyter_client is the correct location for this. When/if we ever parameterize kernel launches, this would/should be viewed as a system parameter/property, and those should probably be documented at the lowest level (IMHO).

kevin-bates avatar Sep 15 '21 00:09 kevin-bates

From my experience:

  • Common usecases are management of resources associated with notebook / auto-saving artifacts like plots in "right place" / sending notifications / saving current notebook or stripped version of current notebook.
  • Case which I am currently exploring is custom pycharm-like remote kernels, which assosicated with notebook name. So even if laptop was rebooted, pycharm still can connect to the right kernel for the notebook on remote server. This requires kernel to know name of a notebook.

Already-proposed-and-discussed implementation with just passing properly-named envvar (containing notebook path, not name) suffice for above usecases.

I'd be very supportive in getting it incorporated.

arogozhnikov avatar May 03 '22 19:05 arogozhnikov

@Carreau @kevin-bates do you think you converged on how this should work or are there any alternatives you still consider?

arogozhnikov avatar May 08 '22 03:05 arogozhnikov

@arogozhnikov - thanks for the ping.

Reading back through this, I think we're good with using an env named JPY_SESSION_NAME (although if not KERNEL_SESSION_NAME, the gateway integration will need to be adjusted to flow JPY_-prefixed variables as well).

The one thing I'd rather see is to introduce the env kwarg when starting the kernel relative to the session. This way, nothing in jupyter_client needs to change and it's purely a jupyter_server play.

kevin-bates avatar May 09 '22 15:05 kevin-bates

ok, @Carreau

  1. Do you preferJPY_SESSION_NAME or KERNEL_SESSION_NAME ?
  2. should it belong here or jupyter-server as Kevin suggested in previous comment?

arogozhnikov avatar May 11 '22 07:05 arogozhnikov

As this is still open I'll ask the question: why I do see something like:

JPY_SESSION_NAME=git/notebooks/1b7dcc20-a919-1d16-9efe-13f1d420ee93

with the uuid rather than actual file name of the notebook?

I was going to use the current notebook name as a way to add it to artifacts that are preserve along with other outputs but without the file name it's of no use and there seems to be no other sane, portable way to do so (https://stackoverflow.com/questions/12544056/how-do-i-get-the-current-ipython-jupyter-notebook-name) :(

p.s. I'm using Jupyter notebooks under JupyterLab.

emsi avatar Nov 22 '22 14:11 emsi

Hi @emsi. Yeah, this was realized the other day as well and an issue opened here: https://github.com/jupyter-server/jupyter_server/issues/1059. I would suggest we keep the discussion there.

Seems like this issue should be closed since the desired behavior was implemented (in jupyter_server), just that something caused the desired behavior to be changed. Closing.

kevin-bates avatar Nov 22 '22 16:11 kevin-bates