
Proxying Dask (Bokeh) Web Interface on AWS SageMaker

Open davidtwomey opened this issue 6 years ago • 27 comments

Hi, I am using Amazon SageMaker Instances to run self-contained JupyterLab environments. I can run dask no problem, but it would be great to view the bokeh dashboard in my browser. I have so far looked into two approaches (see below) but would welcome any solutions/ideas anyone has:

1. Port Forwarding: (as suggested here)

Unfortunately, (to the best of my knowledge), AWS Sagemaker Notebooks do not allow SSH access, and hence Port Forwarding is not an option

NOTE: Google AI Notebook instances do allow SSH access, so port forwarding is a suitable solution on GCP: <VM-IP-ADDRESS>:8787/status

2. Using a Jupyter proxy extension: jupyter-server-proxy

This is close to a working solution, as I am able to view the dashboard, but I get a websocket error when I attempt to access the dashboard through the proxy:

bokeh.protocol.exceptions.ProtocolError: No bokeh-protocol-version specified

image

The exception raised from bokeh:

image

Steps to reproduce

  • Create a new SageMaker notebook instance
  • Install Dask + Bokeh in a py3.6 kernel
  • Install jupyter-server-proxy extension
git clone --depth 1 https://github.com/jupyterhub/jupyter-server-proxy
cd jupyter-server-proxy/jupyterlab-server-proxy
npm install && npm run build && jupyter labextension link .
npm run build && jupyter lab build
  • Create a new notebook and create a client
from dask.distributed import Client
client = Client()
  • Navigate to the proxied dashboard URL:
# In web-browser
https://<SAGEMAKER-ENDPOINT>/proxy/8787/status
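
The proxied URL in the last step follows the `/proxy/<port>/<path>` pattern shown above. A minimal sketch of building it in Python; the helper name `proxied_dashboard_url` and the example endpoint are hypothetical and only illustrate the URL shape:

```python
def proxied_dashboard_url(base_url: str, port: int = 8787, path: str = "status") -> str:
    """Build the URL under which a local port is exposed via /proxy/<port>/.

    base_url is the notebook's own URL (a made-up placeholder is used below).
    """
    # Strip slashes on both sides so the pieces join with exactly one "/".
    return f"{base_url.rstrip('/')}/proxy/{port}/{path.lstrip('/')}"


# Hypothetical endpoint, for illustration only.
print(proxied_dashboard_url("https://my-notebook.example.invalid"))
```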

(Additional info on my runtime env) image

Additional Comment I am a big advocate of dask and very much appreciate the ongoing work everyone is doing!

I understand this may not be the appropriate repo for this issue/feature-request and may not appear of high priority. However, I feel SageMaker Notebooks/GCP AI Notebook environments will be an increasing use case for lots of ML researchers and developers and, consequently, a well supported solution to this would be a useful addition to the docs.

Thanks,

David

davidtwomey avatar Sep 25 '19 12:09 davidtwomey

Thanks for laying all of this out @davidtwomey .

I understand this may not be the appropriate repo for this issue/feature-request and may not appear of high priority

My first question is actually whether or not the AWS Sagemaker folks can help with this problem.

@wleepang do you have any contacts that would be useful here?

mrocklin avatar Sep 25 '19 12:09 mrocklin

@mrocklin Sorry, just seeing this. I can ask around if this is still an issue.

wleepang avatar Feb 23 '20 16:02 wleepang

This is still an issue for me. I'm hitting the exact same error (No bokeh-protocol specified).

I've tried with my SM notebook instance being in a VPC / no VPC. Neither works.

Direct Internet is enabled on the SM notebook instance.

jennakwon06 avatar Feb 23 '20 17:02 jennakwon06

I am also hitting this exact same issue.

nima-akram avatar Mar 19 '20 02:03 nima-akram

I'm reproducing this now while investigating dask/dask-labextension#87.

It does seem that Sagemaker is dropping the websocket connection

image

Pinging @wleepang again to see if this is something you could help with?

jacobtomlinson avatar May 20 '20 14:05 jacobtomlinson

Proxying the dashboard out to the internet with serveo shows the dashboard is working correctly.

image

Therefore the websocket must be being dropped somewhere, either by the nbserverproxy or whatever proxy Sagemaker uses to expose Jupyter.

jacobtomlinson avatar May 20 '20 14:05 jacobtomlinson

I've managed to get this far with Sagemaker: image

The errors don't seem to be specific to either dask or bokeh:

tornado.application - ERROR - Uncaught exception in /status/ws
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/tornado/websocket.py", line 498, in _run_callback
    result = callback(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/bokeh/server/views/ws.py", line 121, in open
    if self.selected_subprotocol != 'bokeh':
AttributeError: 'WSHandler' object has no attribute 'selected_subprotocol'

Updating Tornado to 6.0.2 and restarting the kernel yields the following:

tornado.application - ERROR - Uncaught exception GET /status/ws (127.0.0.1)
HTTPServerRequest(protocol='http', host='10.0.111.225:8443', method='GET', uri='/status/ws', version='HTTP/1.1', remote_ip='127.0.0.1')
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/tornado/websocket.py", line 956, in _accept_connection
    open_result = handler.open(*handler.open_args, **handler.open_kwargs)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.6/site-packages/bokeh/server/views/ws.py", line 123, in open
    raise ProtocolError("Subprotocol header is not 'bokeh'")
bokeh.protocol.exceptions.ProtocolError: Subprotocol header is not 'bokeh'

Dask: 2.16.0 Bokeh: 2.0.2
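
Both tracebacks point at the websocket subprotocol check in `bokeh/server/views/ws.py`. The sketch below is a simplified stand-in for that check, not Bokeh's actual code: it assumes the browser offers a subprotocol named `bokeh` in the `Sec-WebSocket-Protocol` request header, and shows how the check would fail if an intermediate proxy did not forward that header. The function names are hypothetical.

```python
def offered_subprotocols(headers: dict) -> list:
    """Return the subprotocols listed in a Sec-WebSocket-Protocol header."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("sec-websocket-protocol", "")
    return [item.strip() for item in raw.split(",") if item.strip()]


def bokeh_would_accept(headers: dict) -> bool:
    """Simplified version of the 'subprotocol must be bokeh' check."""
    return "bokeh" in offered_subprotocols(headers)


# Header as the browser might send it (the second value is a placeholder).
direct = {"Upgrade": "websocket", "Sec-WebSocket-Protocol": "bokeh, some-token"}
# The same request after a hypothetical proxy dropped the subprotocol header.
proxied = {"Upgrade": "websocket"}

print(bokeh_would_accept(direct), bokeh_would_accept(proxied))
```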

This was with a Sagemaker Notebook instance in its own VPC with a public and private subnet. The notebook instance was launched in the private subnet with a security group that allows all traffic self-ingress.

Note, it does not work if you use the "Default" VPC for the Notebook instance's network configuration.

wleepang avatar May 27 '20 02:05 wleepang

Thanks for looking at this @wleepang!

Note, it does not work if you use the "Default" VPC for the Notebook instance's network configuration.

I suspect many of our users are going to be using the Default VPC. Do you know why this doesn't work?

jacobtomlinson avatar May 27 '20 12:05 jacobtomlinson

@jacobtomlinson -

That was from my initial testing. I've since got it to the state above with one of my Default VPCs by doing the following:

  • create a Private subnet in the default VPC
  • create a NAT Gateway in one of the public subnets
  • create a Route table with a route to 0.0.0.0/0 via the NAT Gateway
  • associate the Private subnet above to the Route table above

Again, launch the notebook instance into the Private subnet. One extra detail I didn't mention previously, I disabled internet access via SageMaker in the notebook networking config. This makes it so that access to the internet is provided via the VPC.

wleepang avatar May 27 '20 16:05 wleepang

Is there any update on this issue? I am having the same problems.

config dir: /home/ec2-user/.jupyter
    jupyterlab_git enabled
    - Validating...
      jupyterlab_git 0.10.1 OK
    jupyterlab_s3_browser enabled
    - Validating...
      jupyterlab_s3_browser OK
config dir: /home/ec2-user/anaconda3/envs/JupyterSystemEnv/etc/jupyter
    dask_labextension enabled
    - Validating...
      dask_labextension 2.0.2 OK
    jupyter_server_proxy enabled
    - Validating...
      jupyter_server_proxy OK
    jupyterlab enabled
    - Validating...
      jupyterlab 1.2.16 OK
    jupyterlab_git enabled
    - Validating...
      jupyterlab_git 0.10.1 OK
    jupyterlab_s3_browser enabled
    - Validating...
      jupyterlab_s3_browser OK
    nbdime enabled
    - Validating...
      nbdime 1.1.0 OK
    nb_conda disabled
    - Validating...
      Error loading server extension nb_conda
      X is nb_conda importable?
    sparkmagic disabled
    - Validating...
      Error loading server extension sparkmagic
      X is sparkmagic importable?
    nbserverproxy enabled
    - Validating...
      nbserverproxy OK
    nbexamples.handlers enabled
    - Validating...
      nbexamples.handlers OK
    sagemaker_nbi_agent enabled
    - Validating...
      sagemaker_nbi_agent OK

bokeh==2.0.1
dask==2.19.0
dask-labextension==2.0.2
jupyter-server-proxy==1.5.0
jupyterlab==2.1.4
jupyterlab-server==1.1.0

image

ghost avatar Jun 24 '20 09:06 ghost

Is there any update on this issue?

As you can see above the last update was 28 days ago. There are no other side channels for conversation here other than github.

I'm not sure how much the Dask maintainers can do to help here. I think that this probably requires some engagement from AWS Sagemaker folks. Perhaps someone here has a support contract that they can use to engage AWS on this problem?

mrocklin avatar Jun 24 '20 13:06 mrocklin

@mrocklin I will communicate this to the AWS reps at our company and comment back if I hear anything. Thanks!

ghost avatar Jun 25 '20 01:06 ghost

I have contacted our AWS support team and they are investigating this issue with a fix to come, hopefully. The issue on AWS is 7137875451 for reference. Will update as I hear more.

ghost avatar Jun 29 '20 15:06 ghost

@blink1073 any chance you all can help here?

mrocklin avatar Jun 30 '20 15:06 mrocklin

Looking through the thread I think @jewelltp's ticket is the best bet. I'm not plugged in to the internal platform aspects, focusing on the JupyterLab 3.0 release.

blink1073 avatar Jun 30 '20 20:06 blink1073

Just an update - the AWS engineer has identified the issue and seems to agree that this needs to be fixed. Looks like we will have a resolution somewhat soon. @mrocklin @blink1073 any other AWS issues to report as I have their attention?

Response from AWS:

"Hi Tyler,

This is to update you that I am still waiting to hear from Service team and will get back to you as soon as I have further information.

As notified earlier I have replicated the issue in my account and now comparing the functionality of "TensorBoard" which works fine using the same proxy mechanism."

ghost avatar Jul 02 '20 16:07 ghost

Posting my solution (inspired by: https://modelpredict.com/sagemaker-ssh-setup/)

NOTES

  • Tested on Windows only
  • Requires signing up to the third-party service https://ngrok.com/ (also possible via a bastion host, as explained in the link above)

If you hit any problems or errors, let me know and I'll update! Regards,

David

Steps

0. Register for a free account at https://ngrok.com/ and get your authentication token

1. (On Sagemaker) Set up ngrok

# Download latest ngrok
curl https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip > ngrok.zip
# Unzip
unzip ngrok.zip
# Add your ngrok authentication key (found here -> https://dashboard.ngrok.com/get-started/setup)
./ngrok authtoken <ADD-YOUR-AUTH-KEY-HERE>

2. (On local machine) Find and copy your local machine's public ssh key.

On Windows this is found in ~/.ssh/id_rsa.pub. Copy the entire string. It should look something like:

ssh-rsa AAAAB3NzaC1yc2....
.....
....= david@MY-LAPTOP-NAME

3. (On Sagemaker) Add this SSH key to ~/.ssh/authorized_keys. Use your favourite terminal editor for this (e.g. vim/nano)
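
If you would rather not edit the file by hand, the step above can be sketched in Python. This is only an illustration: the helper name `add_authorized_key` is hypothetical, and it assumes the key was copied as a single line (a key that got wrapped across lines when pasting will not work).

```python
import os
from pathlib import Path


def add_authorized_key(public_key: str, path: str = "~/.ssh/authorized_keys") -> bool:
    """Append public_key to the authorized_keys file unless it is already there.

    Returns True if the key was added, False if it was already present.
    """
    key = public_key.strip()
    if "\n" in key:
        raise ValueError("a public key must be a single line")

    target = Path(path).expanduser()
    # Create ~/.ssh if it is missing; it should be readable by the owner only.
    target.parent.mkdir(mode=0o700, parents=True, exist_ok=True)

    text = target.read_text() if target.exists() else ""
    if key in (line.strip() for line in text.splitlines()):
        return False

    with target.open("a") as handle:
        # Start on a fresh line if the file does not end with a newline.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")

    # Restrict the file to the owner, as is conventional for authorized_keys.
    os.chmod(target, 0o600)
    return True


# Usage (paste your own key):
# add_authorized_key("ssh-rsa AAAAB3NzaC1yc2.... david@MY-LAPTOP-NAME")
```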

4. (On Sagemaker) Run the ngrok TCP service

./ngrok tcp 22

If successful, you should see something like this image

Grab the host (red) and port (green).

5. (On Sagemaker) Create a notebook, start your dask client and grab the dashboard port (default=8787)

image
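
To read the port programmatically rather than from the screenshot, one option is to parse it out of the dashboard URL. A minimal sketch: the helper name `dashboard_port` is hypothetical, and in a notebook the URL would typically come from the client's `dashboard_link` attribute.

```python
from urllib.parse import urlparse


def dashboard_port(dashboard_link: str, default: int = 8787) -> int:
    """Extract the port from a dashboard URL, falling back to the default."""
    port = urlparse(dashboard_link).port
    return port if port is not None else default


# In a notebook: dashboard_port(client.dashboard_link)
print(dashboard_port("http://127.0.0.1:8787/status"))
```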

6. (On local machine) Connect via SSH Tunnel

Use the host and port obtained in step 4 and the dashboard port from step 5.

ssh -p <PORT> ec2-user@<HOST> -L 8787:localhost:<DASHBOARD-PORT>
# e.g. ssh -p 13171 [email protected] -L 8787:localhost:8787
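
In `-L A:localhost:B`, `A` is the port opened on your local machine and `B` is the port on the notebook instance that it forwards to. A small sketch that assembles the command from the values gathered in the earlier steps; the helper name `build_tunnel_command` and the example host are hypothetical:

```python
import shlex


def build_tunnel_command(host, ssh_port, local_port=8787, remote_port=8787, user="ec2-user"):
    """Return the ssh argument list that forwards local_port to remote_port."""
    return [
        "ssh",
        "-p", str(ssh_port),                            # ngrok port from step 4
        f"{user}@{host}",                               # ngrok host from step 4
        "-L", f"{local_port}:localhost:{remote_port}",  # local -> dashboard port
    ]


# Placeholder host; substitute the one ngrok printed for you.
command = build_tunnel_command("tunnel.example.invalid", 13171)
print(shlex.join(command))
```

Returning a list rather than a string means it can be passed straight to `subprocess.run` without shell quoting concerns.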

7. Access the Dask dashboard, now available on localhost at the forwarded port

image
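
If the page does not load, a quick way to tell whether the tunnel is up is to check that something is listening on the forwarded local port. A minimal sketch using only the standard library; the helper name `is_port_open` is hypothetical:

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8787, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


# Usage, with the SSH tunnel from step 6 running:
# is_port_open("localhost", 8787)
```

This only confirms that the port accepts connections; it does not check that the dashboard itself is responding.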

davidtwomey avatar Jul 05 '20 19:07 davidtwomey

Thanks @davidtwomey for sharing this workaround.

Would be great to not have to do this though. Hopefully AWS will get back to us soon with a full solution.

jacobtomlinson avatar Jul 06 '20 10:07 jacobtomlinson

Here is the latest response from AWS. Looks like we will just have to be patient and wait for the permanent fix.

"Hi Tyler,

I have heard back from Service team and they have confirmed that Dask dashboard is not supported natively in SageMaker Notebook Instance at the moment. Service team took my findings into account while trying to make it work within existing SageMaker architecture but were not able to do so.

After my discussion with Service team, they have created and recorded it as a New Feature Request to support Dask dashboard natively. Please note that I am not able to provide implementation timeline for a new feature request. I would like to recommend keeping an eye on the AWS Blogs [1] and "What's New with AWS" page [2].

Service team has also noted down the GitHub issue and will provide an update when the native support for Dask dashboard is available.

I have been following the Github issue and the workaround provided by David (based on Ngrok and ssh tunneling) seems to be the only solution at the moment.

Thanks for your continued patience as we worked through this issue.

Please contact us if you need further help in this regard."

ghost avatar Aug 03 '20 13:08 ghost

Thank you for handling the cross-project communication Tyler


mrocklin avatar Aug 03 '20 14:08 mrocklin

Hello!

Would like to get a status on this issue / if there's a fix for this! Thank you!

jennakwon06 avatar Jan 29 '21 20:01 jennakwon06

It'd also be great if I can find out who on AWS is in charge of this? I'm at Amazon and can look into the progress of the service team's fix.

jennakwon06 avatar Feb 23 '21 06:02 jennakwon06

Thanks for nudging this @jennakwon06. We haven't heard back from AWS for a while and the issue appears to still persist.

There is an outstanding ticket 7137875451 if that's something you're able to look at.

jacobtomlinson avatar Feb 23 '21 15:02 jacobtomlinson

I also work at Amazon. I found the internal ticket, which unfortunately is not really moving forward from what I can tell. I'll corral a bunch of +1s internally to get momentum and prioritize this feature request.

gballardin avatar Feb 23 '21 16:02 gballardin

Just found this issue. It's quite disappointing that Sagemaker does not have full support for Dask (including dashboard) because they make a great combination.

rabernat avatar Dec 05 '22 17:12 rabernat

@gballardin, has there been any movement on this front? Agreeing with @rabernat that it would be incredible to leverage SageMaker resources with diagnostics from the Dask dashboard.

riley-brady avatar Feb 09 '23 20:02 riley-brady

Also interested in getting the dashboard working in Sagemaker.

aluhamaa avatar Dec 28 '23 10:12 aluhamaa