
Spark UI not accessible

Open h4gen opened this issue 6 years ago • 40 comments

Hi everybody,

As @ryanlovett asked, I opened this issue here, related to jupyterhub/zero-to-jupyterhub-k8s#1030. The problem is as follows:

After starting PySpark I am not able to access the Spark UI; I get a JupyterHub 404 error instead. Here is (hopefully) all the relevant information:

  1. I create a new user image from the jupyter/pyspark image.
  2. The Dockerfile for this image contains:
FROM jupyter/pyspark-notebook:5b2160dfd919
RUN pip install nbserverproxy
RUN jupyter serverextension enable --py nbserverproxy
USER root
RUN echo "$NB_USER ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/notebook
USER $NB_USER
  3. I create the SparkContext() in a pod created from that image, which gives me the link to the UI.
  4. The SparkContext() is created with the following config:
conf.setMaster('k8s://https://'+ os.environ['KUBERNETES_SERVICE_HOST'] +':443')
conf.set('spark.kubernetes.container.image', 'idalab/spark-py:spark')
conf.set('spark.submit.deployMode', 'client')
conf.set('spark.executor.instances', '2')
conf.setAppName('pyspark-shell')
conf.set('spark.driver.host', '10.16.205.42')
os.environ['PYSPARK_PYTHON'] = 'python3'
os.environ['PYSPARK_DRIVER_PYTHON'] = 'python3'
  5. The link created by Spark is obviously not accessible on the hub, as it points to <POD_IP>:4040.
  6. I try to access the UI via .../username/proxy/4040 and .../username/proxy/4040/; neither works, and both lead to a JupyterHub 404.
  7. Other ports are accessible via this method, so I assume nbserverproxy is working correctly.
  8. This is the output of netstat -pl:
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 localhost:54695         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:33896         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:34577         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:52211         0.0.0.0:*               LISTEN      23/python
tcp        0      0 0.0.0.0:8888            0.0.0.0:*               LISTEN      7/python
tcp        0      0 localhost:39388         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:39971         0.0.0.0:*               LISTEN      23/python
tcp        0      0 localhost:32867         0.0.0.0:*               LISTEN      23/python
tcp6       0      0 jupyter-hagen:43878     [::]:*                  LISTEN      45/java
tcp6       0      0 [::]:4040               [::]:*                  LISTEN      45/java
tcp6       0      0 localhost:32816         [::]:*                  LISTEN      45/java
tcp6       0      0 jupyter-hagen:41793     [::]:*                  LISTEN      45/java

One can see that the Java processes appear in a different format because they listen on tcp6.

  9. To check whether this is the cause, I set the environment variable _JAVA_OPTIONS to "-Djava.net.preferIPv4Stack=true".

  10. This results in the following output, but does not resolve the problem:

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 localhost:54695         0.0.0.0:*               LISTEN      456/python
tcp        0      0 0.0.0.0:4040            0.0.0.0:*               LISTEN      475/java
tcp        0      0 localhost:33896         0.0.0.0:*               LISTEN      456/python
tcp        0      0 localhost:34990         0.0.0.0:*               LISTEN      475/java
tcp        0      0 localhost:36079         0.0.0.0:*               LISTEN      456/python
tcp        0      0 jupyter-hagen:35119     0.0.0.0:*               LISTEN      475/java
tcp        0      0 localhost:34577         0.0.0.0:*               LISTEN      456/python
tcp        0      0 jupyter-hagen:42195     0.0.0.0:*               LISTEN      475/java
tcp        0      0 localhost:34836         0.0.0.0:*               LISTEN      456/python
tcp        0      0 0.0.0.0:8888            0.0.0.0:*               LISTEN      7/python
tcp        0      0 localhost:39971         0.0.0.0:*               LISTEN      456/python
tcp        0      0 localhost:32867         0.0.0.0:*               LISTEN      456/python
  11. I checked whether the UI is generally accessible by running a local version of the user image on my PC and forwarding the port. That works fine!

My user image is available on Docker Hub at idalab/spark-user:1.0.2, so it should be easy to pull for debugging if necessary.

Thank you for your help.

h4gen avatar Nov 30 '18 19:11 h4gen

I tried reproducing your example in a container with the following:

import os
from pyspark.conf import SparkConf
from pyspark.context import SparkContext

conf = SparkConf()
conf.setMaster('local')
conf.set('spark.kubernetes.container.image', 'idalab/spark-py:spark')
conf.set('spark.submit.deployMode', 'client')
conf.set('spark.executor.instances', '2')
conf.setAppName('pyspark-shell')
conf.set('spark.driver.host', '127.0.0.1')
os.environ['PYSPARK_PYTHON'] = 'python3' # Needs to be explicitly provided as env. Otherwise workers run Python 2.7
os.environ['PYSPARK_DRIVER_PYTHON'] = 'python3'  # Same

# Create context
sc = SparkContext(conf=conf, master=None) 

I confirmed that there is a service on :4040. Visiting localhost:8888/proxy/4040/ redirected me to localhost:8888/jobs/, which returned a 404. I then manually visited http://localhost:8888/proxy/4040/jobs/ and it displayed a small web app. None of the links within this app include the /proxy/4040/ path, e.g. http://localhost:8888/stages/, http://localhost:8888/storage/. This suggests that the app, at least as I've configured it, either does not inspect the URL it is being visited at, or generates absolute links rooted at the top of the URL path (/stages/ rather than stages/).

Apparently sc.uiWebUrl is read-only.
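
For example (a quick sketch continuing from the snippet above), assigning to it raises an AttributeError, because it is a plain read-only property on the Python side:

# uiWebUrl is defined as a read-only property on SparkContext,
# so trying to assign to it on an instance fails.
try:
    sc.uiWebUrl = "http://localhost:8888/proxy/4040/"
except AttributeError as err:
    print(err)  # e.g. "can't set attribute"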

ryanlovett avatar Nov 30 '18 22:11 ryanlovett

Interesting. I actually never tried the localhost:8888/proxy/4040/ variant in the container, assuming it only made sense on the hub; I just used port forwarding to check whether the UI is generally accessible. I can confirm the behaviour you're describing with /proxy/4040/jobs/, adding that all styles and graphics of the web app are broken.

I just checked it on my JupyterHub deployment with jobs/ added, i.e. <JUPYTERHUB_URL>/hub/user/<username>/proxy/4040/jobs/. This replicates the behaviour from the container, leading to the web app with broken styles. It only works with a trailing / after jobs.

So what to do? Is this a Spark/PySpark issue, regarding the sc.uiWebUrl problem?

h4gen avatar Nov 30 '18 23:11 h4gen

So what to do? Is this a Spark/PySpark issue, regarding the sc.uiWebUrl problem?

I think so, without having dug into the source. I found some documentation on spark's webui properties which doesn't describe a way to alter the URL. I'll look into the source a bit more.

I suppose one could subclass nbserverproxy to alter the proxied content, but that could get messy.

ryanlovett avatar Dec 01 '18 02:12 ryanlovett

Looks like the URL can be changed with SPARK_PUBLIC_DNS. I tried it and changed it to <JUPYTERHUB_URL>/hub/user/<username>/proxy/4040/jobs/. This changes sc.uiWebUrl to <JUPYTERHUB_URL>/hub/user/<username>/proxy/4040/jobs/:4040, resulting in a link that actually redirects to the web app, but the app is still broken and links point to <JUPYTERHUB_URL>/<XYZ>.
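
For reference, this is roughly how I set it (just a sketch; the variable has to be in the environment before the SparkContext, and therefore its JVM, is started, and the hub URL / username below are placeholders):

import os

# SPARK_PUBLIC_DNS must be visible to the driver JVM, so export it
# before the SparkContext is created (placeholder values below).
os.environ['SPARK_PUBLIC_DNS'] = '<JUPYTERHUB_URL>/hub/user/<username>/proxy/4040/jobs/'

from pyspark import SparkConf, SparkContext
conf = SparkConf()
sc = SparkContext(conf=conf)
print(sc.uiWebUrl)  # shows the rewritten (but still broken) link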

h4gen avatar Dec 01 '18 04:12 h4gen

Created https://issues.apache.org/jira/browse/SPARK-26242.

ryanlovett avatar Dec 01 '18 07:12 ryanlovett

@mgaido91 mentioned in the spark jira that setting spark.ui.proxyBase can address this. I've confirmed that if you add conf.set('spark.ui.proxyBase', '/proxy/4040') and then visit {your_server}/.../proxy/4040/jobs/, the webui renders correctly. Visiting /proxy/4040 still doesn't however.

I'll try his other suggestion of setting X-Forwarded-Context in the proxy.

ryanlovett avatar Dec 01 '18 22:12 ryanlovett

On my Jupyterhub deployment I have to configure it as conf.set('spark.ui.proxyBase', '/user/<username>/proxy/4040') but then it works as well!
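
To avoid hardcoding the username, something like this should also work (a sketch; it relies on the JUPYTERHUB_SERVICE_PREFIX environment variable, e.g. /user/<username>/, which JupyterHub sets in the single-user server):

import os
from pyspark import SparkConf

# JUPYTERHUB_SERVICE_PREFIX is set by JupyterHub in the user's pod,
# e.g. '/user/<username>/'; append the server-proxy path and UI port.
prefix = os.environ.get('JUPYTERHUB_SERVICE_PREFIX', '/')

conf = SparkConf()
conf.set('spark.ui.proxyBase', prefix + 'proxy/4040')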

h4gen avatar Dec 02 '18 04:12 h4gen

This is partially addressed by 50e0358. Visiting /hub/user/proxy/4040/ still takes you to /jobs/, but I think that redirect comes from the web UI itself. However, visiting /hub/user/proxy/4040/{jobs,environment,...}/ does the right thing without requiring the proxyBase setting.

ryanlovett avatar Dec 03 '18 17:12 ryanlovett

Thanks for the documentation. I did as described above and it all works fine except the Executors tab in the Spark UI. It seems that the proxy replaces the [app-id] with the port instead of the actual app id.

From https://spark.apache.org/docs/latest/monitoring.html:
/applications/[app-id]/allexecutors | A list of all (active and dead) executors for the given application.


ransoor2 avatar Mar 17 '19 13:03 ransoor2

Based mostly on work here by @h4gen and @ryanlovett, I've now built https://github.com/yuvipanda/jupyter-sparkui-proxy which does the right thing wrt redirects. Thank you! <3

needs docs and stuff.

yuvipanda avatar Apr 26 '19 02:04 yuvipanda

@yuvipanda Thanks for the help! It still doesn't work.

  1. I think there is a typo in setup.py. It should be jupyter_sparkui_proxy/etc/jupyter-sparkui-proxy-serverextension.json instead of jupyter_server_proxy/etc/jupyter-server-proxy-serverextension.json.

  2. In my Dockerfile I'm running:

ADD common/jupyter-sparkui-proxy /jupyter-sparkui-proxy
RUN cd /jupyter-sparkui-proxy && python setup.py install

The installation looks correct, but I'm still getting the same error as above.

ransoor2 avatar Apr 29 '19 11:04 ransoor2

I've solved it by setting conf.set("spark.ui.proxyBase", "").
Then point to http://localhost:4040 to see the UI.

fbeneventi avatar Jun 11 '19 09:06 fbeneventi

@ransoor2 Were you able to find a workaround for the blank spark UI 'Executor' tab? I have the same issue.

dbricare avatar Aug 01 '19 14:08 dbricare

@dbricare No..

ransoor2 avatar Aug 14 '19 07:08 ransoor2

I'm having an issue I don't quite understand. I've followed the steps listed here by @ransoor2 to set up a Spark on Kubernetes deployment. Things work (though there is actually a recently introduced bug in all current versions of Spark that stops workers from being started), but the Spark UI only kind of works. Specifically, if I go to <JUPYTERHUB-IP>/user/<username>/proxy/4040/jobs/, things work as I expect them to. However, if I go to <JUPYTERHUB-IP>/user/<username>/proxy/4040/, I expect, based on @ryanlovett's comment https://github.com/jupyterhub/jupyter-server-proxy/issues/57#issuecomment-443793918, to be redirected to <JUPYTERHUB-IP>/user/<username>/proxy/4040/jobs/. Instead I am redirected to <JUPYTERHUB-IP>/hub/jobs/, which is a broken link.

Is this expected behavior? My spark config (the relevant bits) is basically identical to that of @ransoor2's writeup.

EDIT: I just looked at the jira issue that @ryanlovett opened, and he describes the same behavior. So, maybe this is expected?

albertmichaelj avatar Sep 09 '19 18:09 albertmichaelj

@ransoor2 Were you able to find a workaround for the blank spark UI 'Executor' tab? I have the same issue.

also looking for an update.

dshakey avatar Sep 26 '19 12:09 dshakey

So, the issue is in Spark Core.

See the utility: https://github.com/apache/spark/blob/c2d0d700b551e864bb7b2ae2a175ec8ade704488/core/src/main/resources/org/apache/spark/ui/static/utils.js#L88 .

function getStandAloneAppId(cb) {
  var words = document.baseURI.split('/');
  var ind = words.indexOf("proxy");
  if (ind > 0) {
    var appId = words[ind + 1];
    cb(appId);
    return;
  }
...

getStandAloneAppId will always return the value after "proxy", which in our case is the port, 4040
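
To make that concrete, here is the same parsing sketched in Python against a typical proxied URL (the hub hostname and username are just illustrative):

# Mimic getStandAloneAppId's parsing of document.baseURI behind the proxy.
base_uri = 'https://hub.example.org/user/alice/proxy/4040/executors/'
words = base_uri.split('/')
ind = words.index('proxy')
print(words[ind + 1])  # -> '4040': the proxy port, not the Spark app id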

gveltri avatar Oct 25 '19 15:10 gveltri

I am trying to set up the dashboard for the user jobs with jupyter-server-proxy. I can get the dashboard with kubectl port-forward, but that's not an option for users who don't have access to kubectl.

I tried setting the following:

conf = SparkConf().setAppName('Cluster1').setMaster(spark_master)
conf.set('spark.ui.proxyBase', f'/user/{user}/proxy/4040')
sc = SparkContext(conf=conf)
sc

but it still produces a URL of the form http://jupyter-{user}:4040

When I do kubectl port-forward -n jupyterhub jupyter-{user} 4040:4040, it breaks the dashboard at localhost:4040, with links pointing to localhost:4040/user/{user}/proxy/4040/jobs etc. So it seems some config has been propagated, but sc.uiWebUrl is still not the right one. Does anyone have an idea what's wrong?

belfhi avatar Jan 24 '20 12:01 belfhi

Any update on this issue? I'm running Spark locally (master=local[*]) on a zero-to-jupyterhub installation, but I'm getting a 404 error every time I try to open the Spark UI. I tried https://github.com/yuvipanda/jupyter-sparkui-proxy as proposed by @yuvipanda, but I'm still getting a "page does not exist" error.

mbalduini avatar Mar 10 '20 09:03 mbalduini

I've been playing with this today and it seems to work as long as you start off by going to one of the Spark UI pages. This way you avoid that initial absolute path redirect that breaks it. So going to /proxy/4040/jobs/ seems to work with the latest versions of everything.

I wound up monkeypatching the property that the Spark cluster info widget uses for the "Spark UI" link, so that it points to the jupyter-server-proxy link instead. The goal is to make things feel more seamless for users who can't be expected to know this stuff.

from pyspark.context import SparkContext


def uiWebUrl(self):
    from urllib.parse import urlparse
    web_url = self._jsc.sc().uiWebUrl().get()
    port = urlparse(web_url).port
    return "/proxy/{}/jobs/".format(port)

SparkContext.uiWebUrl = property(uiWebUrl)
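
On a JupyterHub deployment the absolute /proxy/... path misses the per-user prefix, so a variant like the following may be needed (a sketch; it assumes the JUPYTERHUB_SERVICE_PREFIX environment variable that JupyterHub sets in single-user servers):

import os
from urllib.parse import urlparse

from pyspark.context import SparkContext


def ui_web_url_with_prefix(self):
    # Same idea as above, but prefix the path with JupyterHub's per-user
    # prefix (e.g. '/user/<name>/') so the link resolves behind the hub.
    web_url = self._jsc.sc().uiWebUrl().get()
    port = urlparse(web_url).port
    prefix = os.environ.get("JUPYTERHUB_SERVICE_PREFIX", "/")
    return "{}proxy/{}/jobs/".format(prefix, port)


SparkContext.uiWebUrl = property(ui_web_url_with_prefix)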

Edit: The executors tab of the Spark UI doesn't seem to be working.

zac-hopkinson avatar Mar 12 '20 19:03 zac-hopkinson

Thank you for your comment. I tried again: I installed the proxy and set the configuration property, but the result is always the same (404 on the UI page). I used the latest stable release of the Helm chart to install JupyterHub on my k8s cluster (chart 0.8.2, with JupyterHub 0.9.6). Could the problem be related to the JupyterHub version?

mbalduini avatar Mar 13 '20 09:03 mbalduini

I've been playing with this today and it seems to work as long as you start off by going to one of the Spark UI pages. This way you avoid that initial absolute path redirect that breaks it. So going to /proxy/4040/jobs/ seems to work with the latest versions of everything.

I wound up monkeypatching the function that renders the Spark cluster info where the "Spark UI" link shows up. This way it'll point it to the jupyter-server-proxy link instead. The goal is to make things more seamless feeling for users who won't be expected to know this stuff.

from pyspark.context import SparkContext


def uiWebUrl(self):
    from urllib.parse import urlparse
    web_url = self._jsc.sc().uiWebUrl().get()
    port = urlparse(web_url).port
    return "/proxy/{}/jobs/".format(port)

SparkContext.uiWebUrl = property(uiWebUrl)

Edit: The executors tab of the Spark UI doesn't seem to be working.

I am interested in how you updated the Spark UI link to point to proxy/4040/. Does it work for any port number?

Also, the issue with the executors tab is, I think, a bug in Spark. Look at the URL: it drops one of the path segments, if I remember correctly. It seems fixed in Spark 2.4.4. I had actually forgotten about it until you mentioned it, so I will check.

dshakey avatar Mar 13 '20 09:03 dshakey

Thank you for your comment. I tried again: I installed the proxy and set the configuration property, but the result is always the same (404 on the UI page). I used the latest stable release of the Helm chart to install JupyterHub on my k8s cluster (chart 0.8.2, with JupyterHub 0.9.6). Could the problem be related to the JupyterHub version?

Correct me if I'm wrong, but does the widget only work with JupyterLab? It's working fine for me on 1.2.6.

dshakey avatar Mar 13 '20 09:03 dshakey

On my Jupyterhub deployment I have to configure it as conf.set('spark.ui.proxyBase', '/user/<username>/proxy/4040') but then it works as well!

Hi, I'm facing the same issue with the Spark history server. Does this apply to the history server too?

guoqiao1992 avatar Jun 11 '20 23:06 guoqiao1992

I am using Anaconda and have a Jupyter notebook running on port 8890.

http://127.0.0.1:8890/notebooks/Spark_DataFrame_Basics.ipynb

Once I create a Spark session, I can see port 4040 listening on the server.

Code:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructField, StringType, IntegerType, StructType

spark = SparkSession.builder.appName('MYfirstAPP').getOrCreate()

netstat -anp | grep 4040
tcp6       0      0 :::4040                 :::*                    LISTEN      11553/java

I can access the Spark jobs using:

http://localhost:4040


My environment variables:

Name | Value
Java Home | /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre
Java Version | 1.8.0_252 (Oracle Corporation)
Scala Version | version 2.12.10

spark.app.id | local-1595389946259
spark.app.name | MYfirstAPP
spark.driver.host | localhost.local
spark.driver.port | 42234
spark.executor.id | driver
spark.master | local[*]
spark.rdd.compress | True
spark.scheduler.mode | FIFO
spark.serializer.objectStreamReset | 100
spark.submit.deployMode | client
spark.submit.pyFiles |
spark.ui.showConsoleProgress | true

sajinvk avatar Jul 22 '20 04:07 sajinvk

I am working with Jupyterhub on K8s which I deployed using a helm chart.

So far I have been unable to figure out how to properly display the details on the Executors page, or even the details of stages inside a job. I always get a blank page.

For example when I visit the executors page, I can see in the browser console the following message:

Failed to load resource: the server responded with a status of 404 ()
`https://{JHUB_URL}/jupyterhub/user/jupyterhub-admin/proxy/4040/api/v1/applications/4040/stages/0/0`

As noted by a user above

So, the issue is in Spark Core.

See the utility: https://github.com/apache/spark/blob/c2d0d700b551e864bb7b2ae2a175ec8ade704488/core/src/main/resources/org/apache/spark/ui/static/utils.js#L88 .

function getStandAloneAppId(cb) {
  var words = document.baseURI.split('/');
  var ind = words.indexOf("proxy");
  if (ind > 0) {
    var appId = words[ind + 1];
    cb(appId);
    return;
  }
...

getStandAloneAppId will always return the value after "proxy", which in our case is the port, 4040

It seems that, because our URL contains "proxy", the function getStandAloneAppId(cb) uses the port as the application ID, which fails, and we get a blank page.

Was anybody able to get around this issue? I have tried using jupyter-sparkui-proxy but have the same problem. I would appreciate any help. Thank you.

hbuttguavus avatar Sep 22 '20 15:09 hbuttguavus

I was able to resolve the issue with the blank Executors tab by modifying the Spark JavaScript, specifically the getStandAloneAppId function mentioned by @hbuttguavus. I made similar updates to the createTemplateURI and createRESTEndPoint functions.

I didn't do anything fancy, just hardcoded the port to search for in the URI (e.g. "4040") and, if it's found, used the REST API to retrieve the app ID.

The changes are needed in the spark-core JAR which can be unzipped, modified, and re-zipped.

In Spark 3.0 the function can be found in spark-core_2.12-3.0.1/org/apache/spark/ui/static/utils.js.

In Spark 2.4 it's in spark-core_2.11-2.4.4/org/apache/spark/ui/static/executorspage.js.

Here's an example for Spark 2.4 (it may need to be modified based on the format of the Jupyter URL).

function getStandAloneAppId(cb) {
    var words = document.baseURI.split('/');
    //Custom jupyterhub workaround that parses port number in URI
    var ind = words.indexOf("4040");
    if (ind > 0) {
        $.getJSON(location.origin + "/" + words[3] + "/user-redirect/proxy/4040/api/v1/applications", function(response, status, jqXHR) {
            if (response && response.length > 0) {
                var appId = response[0].id
                cb(appId);
                return;
            }
        });
    }
    var ind = words.indexOf("proxy");
    var indp = words.indexOf("4040");
    if ((ind > 0) && (indp < 1)) {
        var appId = words[ind + 1];
        cb(appId);
        return;
    }
    var ind = words.indexOf("history");
    if (ind > 0) {
        var appId = words[ind + 1];
        cb(appId);
        return;
    }
    //Looks like Web UI is running in standalone mode
    //Let's get application-id using REST End Point
    $.getJSON(location.origin + "/api/v1/applications", function(response, status, jqXHR) {
        if (response && response.length > 0) {
            var appId = response[0].id
            cb(appId);
            return;
        }
    });
}

dbricare avatar Sep 25 '20 21:09 dbricare

@dbricare Do you want to submit the fix to Apache Spark? I think many of us will benefit from it. It might need some refinement though.

hanyucui avatar Nov 09 '20 01:11 hanyucui

@h4gen: How did you use SPARK_PUBLIC_DNS to change sc.uiWebUrl? I am attempting to do this in PySpark, but setting it as an environment variable via os.environ does not seem to work.

Looks like the URL can be changed with SPARK_PUBLIC_DNS. I tried it and changed it to <JUPYTERHUB_URL>/hub/user/<username>/proxy/4040/jobs/. This changes sc.uiWebUrl to <JUPYTERHUB_URL>/hub/user/<username>/proxy/4040/jobs/:4040, resulting in a link that actually redirects to the web app, but the app is still broken and links point to <JUPYTERHUB_URL>/<XYZ>.

LucaMingarelli avatar Dec 26 '20 03:12 LucaMingarelli

I fixed executors & stages pages for Pyspark 3.0.x notebooks inside the Kubeflow environment.

Put this inside your notebook Dockerfile:

# Fixing SparkUI + proxy
RUN cd /tmp && mkdir -p org/apache/spark/ui/static/ && \
    curl -s https://gist.githubusercontent.com/slenky/f89ee5de18a2f075a481e3d4452a427c/raw/470c7526cdfd1022c14a0857156d26a606508c30/stagepage.js > org/apache/spark/ui/static/stagepage.js && \
    curl -s https://gist.githubusercontent.com/slenky/f89ee5de18a2f075a481e3d4452a427c/raw/470c7526cdfd1022c14a0857156d26a606508c30/utils.js > org/apache/spark/ui/static/utils.js && \
    zip -u $SPARK_HOME/jars/spark-core_2.12-3.0.1.jar org/apache/spark/ui/static/* && \
    rm -rf /tmp/org

https://gist.github.com/slenky/f89ee5de18a2f075a481e3d4452a427c

However, I am now getting an issue with loading the DataTables on both the stages & executors pages:

DataTables warning: table id=accumulator-table - Cannot reinitialise DataTable. For more information about this error, please see http://datatables.net/tn/3

Any help is appreciated

slenky avatar Jan 13 '21 11:01 slenky