Add `pending_requests`
Part of a proposed solution to https://github.com/jupyter-server/jupyter_server/issues/990
Supersedes #197
When a user executes a long-running cell twice, it could be:
{
"pending_requests": [
{
# msg_id of a `shell.execute_request` (allows identifying subsequent "iopub.status" messages by parent_header)
"id": "3ec6186d-dbd2-486e-9622-e771b4bdcc21",
"type": "execute_request"
},
{
# if the user scheduled two executions of the same cell, we need to wait until both are cleared
"id": "0acd4f22-e9cc-4dd1-969d-8aec793d987d",
"type": "execute_request"
}
]
}
When a user runs a cell with `input()`, it could be:
{
"pending_requests": [
{
# the request for execution of a cell with `input()`
"id": "6fd7fbd4-1c1d-4824-9c96-a57972b4be62",
"type": "execute_request"
},
{
# the response from the server
"id": "0acd4f22-e9cc-4dd1-969d-8aec793d987d",
"type": "input_request"
}
]
}
Here is how an execution request and a subsequent `iopub.status` response look on the wire, and an execution followed by an `input_request` - note that in the latter case there is no status reply yet because the `input_request` is still pending.
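Schematically, the relevant fields look roughly like this (a simplified sketch following the Jupyter messaging protocol; the payloads are not complete and the session value is a placeholder):

```python
# Simplified message shapes (illustrative, not full wire captures).

execute_request = {
    "header": {
        "msg_id": "3ec6186d-dbd2-486e-9622-e771b4bdcc21",
        "msg_type": "execute_request",
        "session": "<client-session-id>",
    },
    "content": {"code": "from time import sleep\nsleep(60)"},
}

# iopub "status" messages point back to the request via parent_header:
iopub_status_idle = {
    "header": {"msg_type": "status"},
    "parent_header": {"msg_id": "3ec6186d-dbd2-486e-9622-e771b4bdcc21"},
    "content": {"execution_state": "idle"},
}

# A cell calling input() first produces an input_request on the stdin channel;
# no "idle" status is emitted until the input is answered:
input_request = {
    "header": {
        "msg_id": "0acd4f22-e9cc-4dd1-969d-8aec793d987d",
        "msg_type": "input_request",
    },
    "parent_header": {"msg_id": "6fd7fbd4-1c1d-4824-9c96-a57972b4be62"},
    "content": {"prompt": "", "password": False},
}
```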
Here are some thoughts:
- do we want to prefix these properties with `msg_` to align with `msg_id` and `msg_type`?
- do we want to also store the `session` ID from the header? In practice incoming messages would need to be compared against `pending_requests`, and if `msg_id` is only unique within a given session then maybe we should store the `session` too (a hypothetical shape is sketched below)
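For illustration, an entry could then look like this (the field names are only a suggestion, nothing here is settled):

```python
# Hypothetical shape of a single pending request entry with the msg_ prefix
# and the originating session recorded (the session value is a placeholder).
pending_request = {
    "msg_id": "3ec6186d-dbd2-486e-9622-e771b4bdcc21",
    "msg_type": "execute_request",
    # header["session"] of the client that sent the request
    "session": "<client-session-id>",
}
```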
I'm not sure we should go down the road of allowing concurrent cell execution. Aren't executions queued anyway?
This is not about concurrent executions but about queuing. We need to record the queue to resolve the status properly.
In #197 there is an `execution_state` field which is equivalent to the status; why isn't that enough?
To illustrate: if I have a cell:
from time import sleep
sleep(60)
and I execute it the first time, `pending_requests` would change to:
[
{
"id": "3ec6186d-dbd2-486e-9622-e771b4bdcc21",
"type": "execute_request"
}
]
then while it is still running, say 5 seconds in, I execute it again; there are now two pending requests:
[
{
"id": "3ec6186d-dbd2-486e-9622-e771b4bdcc21",
"type": "execute_request"
},
{
"id": "0acd4f22-e9cc-4dd1-969d-8aec793d987d",
"type": "execute_request"
}
]
after another 55 seconds the first execution would finish and emit an "idle" status; however, the cell should still display the "busy" status, as there will be another 60 seconds until the second execution request completes, and in the meantime `pending_requests` will be (a sketch of this bookkeeping follows the snippet below):
[
{
"id": "0acd4f22-e9cc-4dd1-969d-8aec793d987d",
"type": "execute_request"
}
]
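To make the bookkeeping concrete, here is a rough sketch of the logic I have in mind (not the actual jupyter_ydoc or jupyter-server API; the function and data shapes are only illustrative): append an entry when an execute_request is sent, drop it when the matching "idle" status arrives, and treat the cell as busy while anything is pending.

```python
# Illustrative bookkeeping for pending_requests (not a real API).

def on_execute_request(cell, msg):
    # Record the request so any client (including ones that reload later)
    # can see that this cell still has work queued.
    cell["pending_requests"].append(
        {"id": msg["header"]["msg_id"], "type": "execute_request"}
    )

def on_iopub_status(cell, msg):
    # An "idle" status refers back to its request via parent_header.msg_id;
    # only that entry is cleared, so a second queued run keeps the cell busy.
    if msg["content"]["execution_state"] == "idle":
        parent_id = msg["parent_header"]["msg_id"]
        cell["pending_requests"] = [
            r for r in cell["pending_requests"] if r["id"] != parent_id
        ]

def is_busy(cell):
    return bool(cell["pending_requests"])
```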
But if we don't process new execution requests until the cell's execution state is idle, we don't need to record pending executions?
> In https://github.com/jupyter-server/jupyter_ydoc/pull/197 there is an `execution_state` field which is equivalent to the status, why isn't that enough?
Good question. There are at least two use cases which cannot be resolved with a simple `execution_state` field alone, unless we move all execution logic from the frontend to the server:
- recording timing
- stdin input boxes
Currently, the frontend records cell execution timing in the cell metadata. If we only keep a "status" flag, there is no way for the frontend to pick up the new message after reloading. Say I execute a cell with `sleep(60)` twice and then reload the page at the 30th second. If all I get is a "busy" status from the model, then when the "iopub.status" message with "idle" comes at the 60th second I cannot reconcile it with the cell metadata. I can still stop the timer at the 120th second because the model will then switch to "idle", but the timer will show an execution time of 120 seconds (rather than 60 seconds).
Similarly for input boxes, the frontend needs to know if the input request is still pending. If it does not know, it can end up in a state where the kernel is blocked waiting for input (a deadlock).
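A minimal sketch of what a reconnecting frontend could do with this information (the helper and its exact shape are hypothetical):

```python
# Look for an unanswered input_request; if one exists, the frontend should
# re-display an input prompt and send the reply on the stdin channel using
# this id as the parent, so the kernel can unblock.
def find_pending_input(cell):
    for request in cell.get("pending_requests", []):
        if request["type"] == "input_request":
            return request["id"]
    return None
```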
Now, the current proposal is not ideal for the timing scenario either, because if I run a cell with `sleep(60)`, close the browser tab, and then reopen it after 70 seconds, the frontend cannot set the elapsed time properly - all it can infer is that the cell is no longer running (because `pending_requests` would be empty).
So maybe we should populate the cell execution timing metadata on the server. For reference, the timing code is here in JupyterLab. However, note that this is not only about setting the right timings, but also about notifying the extensions via a signal. Maybe it would be fine to populate the metadata on the server, and in the scenario described earlier (a single cell executed twice with the browser tab refreshed in the middle of the first execution) the frontend would watch for changes to that metadata field and emit the signal accordingly.
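For example, a rough sketch of server-side population (assuming the server handles the iopub messages; the metadata keys are illustrative, modeled on what JupyterLab records under `cell.metadata["execution"]`):

```python
from datetime import datetime, timezone

def record_timing(cell, msg):
    # Stamp busy/idle times into the shared cell metadata so any frontend,
    # including one that reconnects later, sees consistent timings.
    state = msg["content"]["execution_state"]
    now = datetime.now(timezone.utc).isoformat()
    timing = cell["metadata"].setdefault("execution", {})
    if state == "busy":
        timing["iopub.status.busy"] = now
    elif state == "idle":
        timing["iopub.status.idle"] = now
```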
> But if we don't process new execution requests until the cell's execution state is idle, we don't need to record pending executions?
See above, but also I don't think we have any control over when the kernel processes the execution requests. We could do as you say (not process the next request until the previous one returned idle), but I think the messaging protocol says that queuing is handled by the kernel and is its implementation detail; changing it to what you propose should not be a problem unless there is substantial latency between the server and the kernel, so it may matter for remote kernels.
I think this underlines the need to move to server-side execution. With these changes, I feel that the kernel protocol is leaking into the shared model, but with server-side execution the kernel protocol is exactly what we want to get rid of (in frontends). I agree that timings should be recorded server-side. For input requests, my vision is that they should be collaborative anyway, maybe using ypywidgets. That way, every client will see the input box with live changes from other peers. I agree that what I'm proposing needs a lot of changes in different parts, but I think this should be the direction. And I fear that if we don't take that direction we are going to end up in a mixed state where the CRDT-based architecture will be "polluted" by low-level details of the non-CRDT system that we are precisely trying to get rid of.
> For input requests, my vision is that they should be collaborative anyway, maybe using ypywidgets. That way, every client will see the input box with live changes from other peers.
I can see that for plain text input, but password boxes would need special treatment (`from getpass import getpass; getpass()`)?
We could imagine a special widget that would not show the password, but would still allow any client to enter it?