
Race condition in port forwarding on Remote-Container on WSL

Open ddrinka opened this issue 2 years ago • 17 comments

Applies To

  • [X] Notebooks (.ipynb files)
  • [ ] Interactive Window and/or Cell Scripts (.py files with #%% markers)

What happened?

I'm using Bokeh and Holoviews to render a chart that opens a connection back to a websocket.

A race condition exists between the backend Tornado server starting up and its port being forwarded on one side, and the first HTTP request being made on the other. Re-running the cell repeatedly results in either a chart being displayed or no chart being displayed.

VS Code Version

1.71.1

Jupyter Extension Version

v2022.8.1002431955

Jupyter logs

Rather than pasting the Jupyter logs, here are the Developer Console javascript logs:

Startup

INFO [attempt 1] Invoking resolveAuthority(dev-container)
log.ts:301  INFO [attempt 1] resolveAuthority(dev-container) returned '127.0.0.1:35017' after 4479 ms
VM11:4 Registering custom require.js for Jupyter Kernel
eval @ VM11:4
notebookWebviewPreloads.js:3 Notebook preload (https://vscode-remote%2Bdev-002dcontainer-002b633a5c7372635c6a6f686e6e792d726f626f745c4572676f6e2e52657365617263685c656d725c646174615c6175676d656e745c4e42424f.vscode-resource.vscode-cdn.net/home/vscode/.vscode-server/extensions/ms-toolsai.jupyter-2022.8.1002431955/out/node_modules/%40vscode/jupyter-ipywidgets/dist/ipywidgets.js) looks like a module but does not export an activate function
r @ notebookWebviewPreloads.js:3
console.ts:137 [Extension Host] Starting WebSocket: RAW/api/kernels/f1213451-2417-49ab-89f2-71c059f9c919
DevTools failed to load source map: Could not load content for https://vscode-remote+dev-002dcontainer-002b633a5c7372635c6a6f686e6e792d726f626f745c4572676f6e2e52657365617263685c656d725c646174615c6175676d656e745c4e42424f.vscode-resource.vscode-cdn.net/home/vscode/.vscode-server/extensions/ms-toolsai.jupyter-2022.8.1002431955/out/webviews/webview-side/ipywidgetsKernel/ipywidgetsKernel.js.map: Connection error: net::ERR_NAME_NOT_RESOLVED

No chart displayed

VM22:343 [bokeh] setting log level to: 'info'
The FetchEvent for "http://localhost:36281/autoload.js?bokeh-autoload-element=1002&bokeh-absolute-url=http://localhost:36281&resources=none" resulted in a network error response: the promise was rejected.
Promise.then (async)
(anonymous) @ service-worker.js:213
service-worker.js:352          Uncaught (in promise) TypeError: Failed to fetch
    at l (service-worker.js:352:11)
l @ service-worker.js:352
VM26:12          GET http://localhost:36281/autoload.js?bokeh-autoload-element=1002&bokeh-absolute-url=http://localhost:36281&resources=none net::ERR_FAILED
(anonymous) @ VM26:12
(anonymous) @ VM26:13
domEval @ index.js:1304
renderHTML @ index.js:1317
renderOutputItem @ index.js:1447
render @ notebookWebviewPreloads.js:3
renderOutputCell @ notebookWebviewPreloads.js:3
await in renderOutputCell (async)
(anonymous) @ notebookWebviewPreloads.js:3
te.outputs.set.queue @ notebookWebviewPreloads.js:3
enqueue @ notebookWebviewPreloads.js:3
(anonymous) @ notebookWebviewPreloads.js:3
postMessage (async)
(anonymous) @ index.html?id=d051151b-29f8-40da-ba40-2da365b2934e&origin=d051151b-29f8-40da-ba40-2da365b2934e&swVersion=4&extensionId=&platform=electron&vscode-resource-base-authority=vscode-resource.vscode-cdn.net&parentOrigin=vscode-file%3A%2F%2Fvscode-app&remoteAuthority=dev-container%2B633a5c7372635c6a6f686e6e792d726f626f745c4572676f6e2e52657365617263685c656d725c646174615c6175676d656e745c4e42424f&purpose=notebookRenderer:1102
HostMessaging.channel.port1.onmessage @ index.html?id=d051151b-29f8-40da-ba40-2da365b2934e&origin=d051151b-29f8-40da-ba40-2da365b2934e&swVersion=4&extensionId=&platform=electron&vscode-resource-base-authority=vscode-resource.vscode-cdn.net&parentOrigin=vscode-file%3A%2F%2Fvscode-app&remoteAuthority=dev-container%2B633a5c7372635c6a6f686e6e792d726f626f745c4572676f6e2e52657365617263685c656d725c646174615c6175676d656e745c4e42424f&purpose=notebookRenderer:295

Chart displays and is interactive

VM22:343 [bokeh] setting log level to: 'info'
VM22:746 [bokeh] Websocket connection 0 is now open
VM22:324 [bokeh] document idle at 69 ms
VM22:322 Bokeh items were rendered successfully

Coding Language and Runtime Version

Python v3.9.13, holoviews 1.15.0, hvplot 0.8.1, bokeh 2.4.3

Language Extension Version (if applicable)

No response

Anaconda Version (if applicable)

No response

Running Jupyter locally or remotely?

Remote

ddrinka avatar Sep 14 '22 18:09 ddrinka

I'm using the simple repro code from https://github.com/microsoft/vscode-jupyter/issues/1714 to test:

import xarray as xr
import hvplot.xarray
import numpy as np

arr = xr.DataArray(
    np.random.random((2, 3, 4)), 
    dims=['x', 'y', 'time'],
    coords={'x': np.arange(2), 'y': np.arange(3), 'time': np.arange(4)}
)

test = arr.hvplot(x='x', y='y')

Re-run this until it works

import holoviews as hv
renderer = hv.renderer('bokeh')
renderer.app(test, show=True)

Note I also had to set the environment variable: BOKEH_ALLOW_WS_ORIGIN=* to avoid https://github.com/bokeh/bokeh/issues/10765 and https://github.com/microsoft/vscode-jupyter/issues/4132

ddrinka avatar Sep 14 '22 18:09 ddrinka

Thanks for filing this issue, I'll discuss this with the team and look into this.

DonJayamanne avatar Sep 15 '22 00:09 DonJayamanne

At first when I began testing today I wasn't able to reproduce the issue. I was able to re-run the renderer.app cell repeatedly and always got a chart and no Javascript errors.

Then I switched to the Jupyter Output Window (I had been in a Terminal Window), and the error now occurs every time I run the cell. No chart and net::ERR_FAILED errors.

image

ddrinka avatar Sep 15 '22 17:09 ddrinka

Trying to replicate that behavior: starting VS Code with the Terminal visible, and running the cells described above, did not yield what I'd hoped. At this time nothing I do results in a successful chart being displayed. I receive a net::ERR_FAILED every time I run the second cell.

ddrinka avatar Sep 15 '22 17:09 ddrinka

A race condition exists between the backend Tornado server starting up and the port being forwarded, to making the first HTTP request. Re-running the cell repeatedly either results in a chart being displayed or no chart being displayed.

Do I understand correctly -

  1. A Tornado server is set up somewhere on the remote
  2. Automatic port forwarding kicks in to forward that port
  3. The forwarded port is accessed from the notebook renderer

But the race is in forwarding that port before the renderer needs it? Is that what you think is happening @DonJayamanne?

If you can set a static port, you could probably forward it with forwardPorts in your devcontainer.json and that could help with the race.
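As a rough illustration of the static-port idea, here is a minimal Python sketch that picks the first unused port from a small fixed pool. The pool values and the helper name `first_free_port` are hypothetical; the assumption is that the same port numbers are also listed under forwardPorts in devcontainer.json, and that the chosen port is then handed to whatever starts the Tornado server.

```python
import socket
from typing import Iterable, Optional

# Hypothetical pool: placeholder port numbers, assumed to be listed under
# "forwardPorts" in devcontainer.json as well so they are forwarded up front.
STATIC_PORTS = (5006, 5007, 5008)


def first_free_port(candidates: Iterable[int], host: str = "127.0.0.1") -> Optional[int]:
    """Return the first candidate port that can currently be bound, else None."""
    for port in candidates:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is already in use, try the next candidate
            # The probe socket is closed on leaving the with-block, so another
            # process could still grab the port before the server binds it.
            return port
    return None
```

Calling `first_free_port(STATIC_PORTS)` inside the container would give a port that is expected to be forwarded already, so the renderer should not have to wait for automatic detection. The probe only checks availability at one moment, so it narrows the problem rather than removing it.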

roblourens avatar Sep 16 '22 14:09 roblourens

I'll test with a static port.

Notably, I'm not getting a connection timeout error while that port comes online. I'm getting this net::ERR_FAILED in the service worker instead. I don't understand how the port forwarding works internally, so maybe that's the error to expect when no port is open, but for an HTTP connection I'd usually expect a timeout that gives the port a few hundred milliseconds to become available.
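To make the expectation concrete, here is a minimal Python sketch of "retry the connection until a deadline" behaviour. It is not what the webview or its service worker does; `wait_for_port` and its default timings are hypothetical, and the check would have to run on the side that sees the forwarded port, since a successful connect inside the container says nothing about forwarding.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 2.0, interval: float = 0.1) -> bool:
    """Retry a TCP connect until it succeeds or roughly `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt is itself bounded by `interval` seconds.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)  # brief pause before the next attempt
```

With a grace period like this, a port that appears a few hundred milliseconds late would still be reached, whereas the failure shown in the logs is immediate.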

ddrinka avatar Sep 16 '22 16:09 ddrinka

Today I'm not able to replicate the race condition, so I can't tell if the static port helps... It's not a great long term solution though because there may be many apps created in a notebook and managing the port numbers, shutting down the existing applications, etc would really limit the notebook experience.

The requirement to create an app and bind it to a port is itself a workaround for the inability of Bokeh/Holoviews to communicate back through the IPython proxy (or whatever mechanism applies). The following should "just work", and does on a Jupyter instance running directly in the VS Code Terminal with no additional ports forwarded:

image

This renders, but the interactivity doesn't work in VS Code.

Here's the channel that was created on Jupyter Lab: image

It seems like there are a few outstanding issues logged for Bokeh and VS Code. I can create a new one for the above behavior if that would be helpful.

ddrinka avatar Sep 16 '22 16:09 ddrinka

But the race is in forwarding that port before the renderer needs it? Is that what you think is happening @DonJayamanne?

Yes.

DonJayamanne avatar Sep 20 '22 01:09 DonJayamanne

The following should "just work", and does on a Jupyter instance running directly in the VS Code Terminal with no additional ports forwarded:

@ddrinka I'm assuming you are still running all of this in WSL. Is that right?

Today I'm not able to replicate the race condition, so I can't tell if the static port helps... It's not a great long term solution

Understood.

@alexr00 Any idea what we can do here? There seems to be a race condition: the port forwarding happens after the webview attempts to access the port.

DonJayamanne avatar Sep 20 '22 01:09 DonJayamanne

If a webview needs the port then the owner of the webview should call asExternalUri to cause it to be forwarded. Automatic port forwarding is a user facing feature and works by polling, with a potentially non-constant polling frequency depending on the speed of the machine. Because of this, it's not safe to rely on automatic port forwarding for programmatic access to forwarded ports.
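The timing problem with polling can be sketched with a little arithmetic in Python. This is a simplified model, not VS Code's implementation: it assumes a constant polling interval (the comment above notes the real one may vary) and that forwarding becomes available at the first poll after the server starts listening. All function and parameter names are hypothetical.

```python
import math


def forward_ready_at(listen_time: float, poll_interval: float, first_poll: float = 0.0) -> float:
    """Time of the first poll at or after `listen_time` (when forwarding is assumed to start)."""
    if listen_time <= first_poll:
        return first_poll
    polls_elapsed = math.ceil((listen_time - first_poll) / poll_interval)
    return first_poll + polls_elapsed * poll_interval


def request_succeeds(listen_time: float, request_delay: float,
                     poll_interval: float, first_poll: float = 0.0) -> bool:
    """True if a request sent `request_delay` after listening arrives once forwarding exists."""
    request_time = listen_time + request_delay
    return request_time >= forward_ready_at(listen_time, poll_interval, first_poll)


# Same 0.1 s gap between server start and first request, different outcomes:
print(request_succeeds(1.5, 0.1, poll_interval=2.0))   # False: next poll is at 2.0
print(request_succeeds(1.95, 0.1, poll_interval=2.0))  # True: poll at 2.0 comes first
```

In this model the outcome depends only on where the server start falls relative to the polling cycle, which would match a chart that appears on some runs and not on others.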

alexr00 avatar Sep 20 '22 11:09 alexr00

@DonJayamanne yes, this is all WSL.

@alexr00 understood. That polling interval sounds exactly like the cause of the trouble.

I'm out of my depth with all these internals but I'm happy to open an issue with Bokeh regarding their webview utilization.

While the experts are looking at this, and if you'll forgive my hijacking, can you provide any advice for Bokeh to solve the proxy communication issue that leads to this requirement for opening additional ports in the first place?

https://github.com/bokeh/bokeh/issues/10765 https://github.com/microsoft/vscode-jupyter/issues/4132

It sounds like work was done for IPyWidgets that would have to be done for Bokeh as well to make it work? https://github.com/microsoft/vscode-jupyter/wiki/Component:-IPyWidgets

ddrinka avatar Sep 20 '22 22:09 ddrinka

With IPyWidgets, there are no custom ports open; IPyWidgets communicate over the Jupyter protocol. I'll dig through the Bokeh code and get in touch with their maintainers. My preliminary suggestion would be to use the Jupyter protocol, as that's more resilient; otherwise anyone dealing with Jupyter remotely could have similar issues with firewall restrictions and the like.

DonJayamanne avatar Sep 20 '22 23:09 DonJayamanne

By default I believe Bokeh does use the Jupyter protocol in a way similar to IPyWidgets. This ticket exists due to my attempt to work around the inability of Bokeh to succeed in its normal communication style by forcing open a new Tornado backend server and using that instead. Which almost works. :p

ddrinka avatar Sep 20 '22 23:09 ddrinka

This ticket exists due to my attempt to work around the inability of Bokeh to succeed in its normal communication style by forcing open a new Tornado backend server

Could you provide more information about this? Is there an issue on the Bokeh repo for the problem you're trying to work around? I ask because fixing that root problem would then alleviate this issue.

DonJayamanne avatar Sep 21 '22 00:09 DonJayamanne

Those tickets I linked above,

https://github.com/bokeh/bokeh/issues/10765 https://github.com/microsoft/vscode-jupyter/issues/4132

seem to track the base issue.

ddrinka avatar Sep 21 '22 00:09 ddrinka

My preliminary suggestion would be to use the Jupyter protocol, as that's more resilient; otherwise anyone dealing with Jupyter remotely could have similar issues with firewall restrictions and the like.

Just for some context, Bokeh (even the current Bokeh server, which came later) well predates JupyterLab and IIRC even predates notebook comms. But more importantly, most usage of Bokeh server is simply not in notebook/jupyter environments at all. So having Jupyter comms be the only, or even the default, transport, is a non-starter. I do very much think that liberating the Bokeh protocol from a particular transport, so that bi-directional Bokeh eventing could easily happen over websocket, or jupyter comms, or whatever else anyone might like, would be fantastic to have happen. But Bokeh is no longer something I get paid to work on, and this is not really an in-your-free-time chunk of work. I am not sure when it might happen without a proper dedication of resources.

bryevdv avatar Sep 21 '22 05:09 bryevdv

I've done enough issue-hijacking. I'll jump over to Bokeh's issue tracking for further conversation about Bokeh and VSCode using the default transport, and leave this issue to track any additional thoughts on the race condition when Bokeh is running in Server mode.

Thanks for your input @bryevdv, appreciate the response here.

ddrinka avatar Sep 22 '22 17:09 ddrinka

Summary for internal (personal notes):

  • Not much can be done from VS Code to resolve this issue
  • If Bokeh were to use IPyWidgets and kernel Comms messages instead of a custom websocket connection, that would address this issue.
  • See here for context of the current approach https://github.com/microsoft/vscode-jupyter/issues/11368#issuecomment-1253222327

I'm going to close this as something that cannot be fixed in VS Code, as it is specific to the bokeh package. However, if there's something we (Jupyter extension or VS Code) can do to resolve this, please do comment here or create a new issue.

Closing this for now.

DonJayamanne avatar Sep 25 '22 23:09 DonJayamanne