
Hundreds of users lead to running out of tens of thousands of ephemeral ports

Open consideRatio opened this issue 1 year ago • 47 comments

From https://github.com/jupyterhub/configurable-http-proxy/issues/388#issuecomment-2359217928 and onwards is context on how a CHP pod can end up running out of ephemeral ports, with a mitigation strategy in https://github.com/jupyterhub/configurable-http-proxy/issues/388#issuecomment-2362097477.

consideRatio avatar Sep 19 '24 20:09 consideRatio

Based on https://github.com/jupyterhub/configurable-http-proxy/issues/388#issuecomment-2359227825, I think this may not be an issue with CHP as much as with the software running in the user servers, which leads to a flood of connections being initiated via the UI.

  • Based on the linked comment, it seems that the UI choice matters
  • Any connection from the UI to the hub pod would be required to access the hub pod via the CHP proxy
  • If the jupyter server or a kernel within it connected to the hub pod, it could do it by a "hairpin" connection going out to the internet and straight back to the CHP pod through an ingress controller etc, or, it could go straight to the hub pod via k8s local networking. I figure this is a difference of accessing https://my-domain.org/hub/ vs http://hub:8081.

consideRatio avatar Sep 19 '24 20:09 consideRatio

@felder this is a followup to https://github.com/jupyterhub/configurable-http-proxy/issues/388#issuecomment-2359416947. I inspected two active deployments with 222 and 146 currently active users respectively.

A hub where users access either /tree or /lab

From inspection, it seems this makes use of jupyter_server 2.12.1 and jupyterlab 4.0.9.

This is from a CHP pod for a hub that currently has 222 user pods running the image quay.io/2i2c/utoronto-image:2525722ac1d5, where users may be accessing /tree or /lab, and it's not clear how UI usage is distributed among them.

/srv/configurable-http-proxy $ netstat -natp | grep ESTABLISHED | grep 8081 | wc -l
80
/srv/configurable-http-proxy $ netstat -natp | grep ESTABLISHED | grep 8888 | wc -l
1416
/srv/configurable-http-proxy $ netstat -natp | grep ESTABLISHED | wc -l
1609
pip list
Package                           Version
--------------------------------- ------------
absl-py                           2.1.0
affine                            2.4.0
aiohttp                           3.9.5
aiosignal                         1.3.1
alabaster                         0.7.16
alembic                           1.13.0
altair                            5.2.0
annotated-types                   0.7.0
anyio                             4.1.0
archspec                          0.2.2
argon2-cffi                       23.1.0
argon2-cffi-bindings              21.2.0
arrow                             1.3.0
arviz                             0.18.0
astropy                           5.3.4
astroquery                        0.4.7
asttokens                         2.4.1
astunparse                        1.6.3
async-generator                   1.10
async-lru                         2.0.4
async-timeout                     4.0.3
attrs                             23.1.0
Babel                             2.13.1
backports.tarfile                 1.0.0
beautifulsoup4                    4.12.2
bleach                            6.1.0
blinker                           1.7.0
blis                              0.7.10
bokeh                             3.3.2
boltons                           23.0.0
Bottleneck                        1.3.7
branca                            0.7.2
Brotli                            1.1.0
cached-property                   1.5.2
cachetools                        5.4.0
catalogue                         2.0.10
certifi                           2023.11.17
certipy                           0.1.3
cffi                              1.16.0
charset-normalizer                3.3.2
click                             8.1.7
click-plugins                     1.1.1
cligj                             0.7.2
cloudpathlib                      0.16.0
cloudpickle                       3.0.0
colorama                          0.4.6
comm                              0.1.4
conda                             23.11.0
conda-libmamba-solver             23.11.1
conda-package-handling            2.2.0
conda_package_streaming           0.9.0
confection                        0.1.4
cons                              0.4.6
contextily                        1.4.0
contourpy                         1.2.0
cryptography                      41.0.7
cycler                            0.12.1
cymem                             2.0.8
Cython                            3.0.6
cytoolz                           0.12.2
dask                              2023.12.0
datascience                       0.17.6
debugpy                           1.8.0
decorator                         5.1.1
defusedxml                        0.7.1
descartes                         1.1.0
dill                              0.3.7
distributed                       2023.12.0
distro                            1.8.0
dm-tree                           0.1.8
docutils                          0.21.2
entrypoints                       0.4
esda                              2.5.1
et-xmlfile                        1.1.0
etuples                           0.3.9
exceptiongroup                    1.2.0
executing                         2.0.1
fastjsonschema                    2.19.0
fastprogress                      1.0.3
fica                              0.3.1
filelock                          3.15.4
fiona                             1.9.5
flatbuffers                       24.3.25
folium                            0.17.0
fonttools                         4.46.0
fqdn                              1.5.1
frozenlist                        1.4.1
fsspec                            2023.12.1
galpy                             1.9.2
gast                              0.6.0
GDAL                              3.8.1
geographiclib                     2.0
geopandas                         0.14.4
geopy                             2.4.1
giddy                             2.3.5
git-credential-helpers            0.2
gitdb                             4.0.11
github3.py                        4.0.1
GitPython                         3.1.40
gmpy2                             2.1.2
google-auth                       2.32.0
google-auth-oauthlib              1.2.1
google-pasta                      0.2.0
graphviz                          0.20.3
greenlet                          3.0.1
grpcio                            1.64.1
h5netcdf                          1.3.0
h5py                              3.10.0
html5lib                          1.1
idna                              3.6
imagecodecs                       2023.9.18
imageio                           2.31.5
imagesize                         1.4.1
importlib-metadata                7.0.0
importlib-resources               6.1.1
ipykernel                         6.26.0
ipylab                            1.0.0
ipympl                            0.9.3
ipython                           8.18.1
ipython-genutils                  0.2.0
ipywidgets                        8.1.1
isoduration                       20.11.0
jaraco.classes                    3.4.0
jaraco.context                    5.3.0
jaraco.functools                  4.0.0
jax                               0.4.30
jaxlib                            0.4.30
jedi                              0.19.1
jeepney                           0.8.0
Jinja2                            3.1.2
joblib                            1.3.2
json5                             0.9.14
jsonpatch                         1.33
jsonpointer                       2.4
jsonschema                        4.20.0
jsonschema-specifications         2023.11.2
jupyter_client                    7.4.9
jupyter-contrib-core              0.4.2
jupyter-contrib-nbextensions      0.7.0
jupyter_core                      5.5.0
jupyter-events                    0.9.0
jupyter-highlight-selected-word   0.2.0
jupyter-lsp                       2.2.1
jupyter_nbextensions_configurator 0.6.4
jupyter-remote-desktop-proxy      1.2.1
jupyter-resource-usage            1.0.2
jupyter_server                    2.12.1
jupyter-server-mathjax            0.2.6
jupyter_server_proxy              4.3.0
jupyter_server_terminals          0.4.4
jupyter-telemetry                 0.1.0
jupyter-tree-download             1.0.1
jupyterhub                        4.0.2
jupyterlab                        4.0.9
jupyterlab_git                    0.50.0
jupyterlab_pygments               0.3.0
jupyterlab_server                 2.25.2
jupyterlab-widgets                3.0.9
jupyterthemes                     0.20.0
jupytext                          1.15.2
jwcrypto                          1.5.6
kaleido                           0.2.1
keras                             2.15.0
keyring                           25.2.1
kiwisolver                        1.4.5
langcodes                         3.4.0
language_data                     1.2.0
lazy_loader                       0.3
lesscpy                           0.15.1
libclang                          18.1.1
libmambapy                        1.5.4
libpysal                          4.9.2
llvmlite                          0.40.1
locket                            1.0.0
logical-unification               0.4.6
lxml                              5.2.2
lz4                               4.3.2
Mako                              1.3.0
mamba                             1.5.4
mapclassify                       2.6.1
marisa-trie                       1.1.0
Markdown                          3.6
markdown-it-py                    3.0.0
MarkupSafe                        2.1.3
markus-jupyter-extension          0.1.4
matplotlib                        3.8.2
matplotlib-inline                 0.1.6
mdit-py-plugins                   0.4.1
mdurl                             0.1.2
menuinst                          2.0.0
mercantile                        1.2.1
miniKanren                        1.0.3
mistune                           3.0.2
ml-dtypes                         0.3.2
more-itertools                    10.3.0
mpmath                            1.3.0
msgpack                           1.0.7
multidict                         6.0.5
multipledispatch                  1.0.0
munkres                           1.1.4
murmurhash                        1.0.10
nbclassic                         1.0.0
nbclient                          0.8.0
nbconvert                         7.12.0
nbdime                            4.0.1
nbformat                          5.9.2
nbgitpuller                       1.2.1
nest-asyncio                      1.5.8
networkx                          3.2.1
nltk                              3.8.1
notebook                          6.5.7
notebook_shim                     0.2.3
numba                             0.57.1
numexpr                           2.8.7
numpy                             1.24.4
oauthlib                          3.2.2
openpyxl                          3.1.2
opt-einsum                        3.3.0
otter-grader                      5.5.0
overrides                         7.4.0
packaging                         23.2
pamela                            1.1.0
pandas                            2.1.3
pandocfilters                     1.5.0
parso                             0.8.3
partd                             1.4.1
patsy                             0.5.4
pexpect                           4.8.0
pickleshare                       0.7.5
Pillow                            10.1.0
pip                               24.0
pkgutil_resolve_name              1.3.10
platformdirs                      4.1.0
plotly                            5.22.0
pluggy                            1.3.0
ply                               3.11
preshed                           3.0.9
prometheus-client                 0.19.0
prompt-toolkit                    3.0.41
protobuf                          4.24.4
psutil                            5.9.5
ptyprocess                        0.7.0
pure-eval                         0.2.2
py-cpuinfo                        9.0.0
pyarrow                           14.0.1
pyarrow-hotfix                    0.6
pyasn1                            0.6.0
pyasn1_modules                    0.4.0
pycosat                           0.6.6
pycparser                         2.21
pycurl                            7.45.1
pydantic                          2.8.2
pydantic_core                     2.20.1
pyerfa                            2.0.1.4
Pygments                          2.17.2
PyJWT                             2.8.0
pymc                              5.10.4
pyOpenSSL                         23.3.0
pyparsing                         3.1.1
pyproj                            3.6.1
PySocks                           1.7.1
pytensor                          2.18.6
python-dateutil                   2.8.2
python-json-logger                2.0.7
python-on-whales                  0.71.0
pytz                              2023.3.post1
pyvo                              1.5.2
PyWavelets                        1.4.1
PyYAML                            6.0.1
pyzmq                             25.1.2
quantecon                         0.7.2
rasterio                          1.3.9
redis                             5.0.7
referencing                       0.32.0
regex                             2024.5.15
requests                          2.31.0
requests-oauthlib                 2.0.0
rfc3339-validator                 0.1.4
rfc3986-validator                 0.1.1
rich                              13.7.1
rise                              5.7.1
rpds-py                           0.13.2
rsa                               4.9
Rtree                             1.3.0
ruamel.yaml                       0.18.5
ruamel.yaml.clib                  0.2.7
scikit-image                      0.22.0
scikit-learn                      1.3.2
SciPy                             1.11.4
seaborn                           0.13.0
SecretStorage                     3.3.3
Send2Trash                        1.8.2
setuptools                        68.2.2
shapely                           2.0.4
shellingham                       1.5.4
simpervisor                       1.0.0
six                               1.16.0
smart-open                        6.4.0
smmap                             5.0.0
sniffio                           1.3.0
snowballstemmer                   2.2.0
snuggs                            1.4.7
sortedcontainers                  2.4.0
soupsieve                         2.5
spacy                             3.7.4
spacy-legacy                      3.0.12
spacy-loggers                     1.0.5
Sphinx                            7.4.4
sphinxcontrib-applehelp           1.0.8
sphinxcontrib-devhelp             1.0.6
sphinxcontrib-htmlhelp            2.0.5
sphinxcontrib-jsmath              1.0.1
sphinxcontrib-qthelp              1.0.7
sphinxcontrib-serializinghtml     1.1.10
splot                             1.1.5.post1
spreg                             1.5.0
SQLAlchemy                        2.0.23
srsly                             2.4.8
stack-data                        0.6.2
statsmodels                       0.14.0
sympy                             1.12
tables                            3.9.2
tblib                             2.0.0
tenacity                          8.5.0
tensorboard                       2.15.2
tensorboard-data-server           0.7.2
tensorflow                        2.15.1
tensorflow-estimator              2.15.0
tensorflow-io-gcs-filesystem      0.37.1
tensorflow-probability            0.23.0
termcolor                         2.4.0
terminado                         0.18.0
textblob                          0.17.1
thinc                             8.2.5
threadpoolctl                     3.2.0
tifffile                          2023.9.26
tinycss2                          1.2.1
toml                              0.10.2
tomli                             2.0.1
toolz                             0.12.0
tornado                           6.3.3
tqdm                              4.66.1
traitlets                         5.14.0
truststore                        0.8.0
typer                             0.9.4
types-python-dateutil             2.8.19.14
typing_extensions                 4.8.0
typing-utils                      0.1.0
tzdata                            2023.3
uri-template                      1.3.0
uritemplate                       4.1.1
urllib3                           2.1.0
wasabi                            1.1.2
wcwidth                           0.2.12
weasel                            0.3.4
webcolors                         1.13
webencodings                      0.5.1
websocket-client                  1.7.0
websockify                        0.12.0
Werkzeug                          3.0.3
wheel                             0.42.0
widgetsnbextension                4.0.9
wrapt                             1.14.1
xarray                            2024.6.0
xarray-einstats                   0.7.0
xlrd                              2.0.1
xyzservices                       2023.10.1
yarl                              1.9.4
zict                              3.0.0
zipp                              3.17.0
zstandard                         0.22.0

A hub where users access /rstudio

This is from a CHP pod for a hub that currently has 146 user pods running the image quay.io/2i2c/utoronto-r-image:5e7aea3c30ff, where users are accessing /rstudio.

/srv/configurable-http-proxy $ netstat -natp | grep ESTABLISHED | grep 8081 | wc -l
5
/srv/configurable-http-proxy $ netstat -natp | grep ESTABLISHED | grep 8888 | wc -l
1164
/srv/configurable-http-proxy $ netstat -natp | grep ESTABLISHED| wc -l
1250
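
A caveat on the counts above: `grep 8081` matches those digits anywhere on the line, so a local ephemeral port such as 48081 or a matching PID would be counted too. Below is a minimal sketch of a stricter count, assuming the usual Linux `netstat -nat` column layout (foreign address in column 5, state in column 6). The function name is made up for illustration.

```shell
# Read `netstat -nat` output on stdin and print "<remote-port> <count>"
# for ESTABLISHED TCP connections, busiest remote port first.
# Assumption: column 5 is the foreign address and column 6 is the state.
established_by_remote_port() {
  awk '$1 ~ /^tcp/ && $6 == "ESTABLISHED" {
         n = split($5, parts, ":")   # the port is the text after the last colon
         count[parts[n]]++
       }
       END { for (port in count) print port, count[port] }' |
    sort -k2,2nr -k1,1n
}

# Usage inside the proxy pod (not run here):
#   netstat -nat | established_by_remote_port
```

Grouping by remote port keeps connections to the hub (8081) and to user servers (8888) apart, whatever local ephemeral port they happen to use.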

From inspection, it seems this makes use of jupyter-server 1.24.0 together with rstudio stuff in the frontend.

pip list
Package                       Version
----------------------------- ---------------
aiohttp                       3.9.3
aiosignal                     1.3.1
alabaster                     0.7.16
alembic                       1.13.1
annotated-types               0.6.0
anyio                         3.7.1
argon2-cffi                   23.1.0
argon2-cffi-bindings          21.2.0
arrow                         1.3.0
asttokens                     2.4.1
astunparse                    1.6.3
async-generator               1.10
async-lru                     2.0.4
async-timeout                 4.0.3
attrs                         23.2.0
Babel                         2.14.0
beautifulsoup4                4.12.3
bleach                        6.1.0
certifi                       2024.2.2
certipy                       0.1.3
cffi                          1.16.0
charset-normalizer            3.3.2
click                         8.1.7
comm                          0.2.2
cryptography                  42.0.5
debugpy                       1.8.1
decorator                     5.1.1
defusedxml                    0.7.1
dill                          0.3.8
docutils                      0.20.1
entrypoints                   0.4
exceptiongroup                1.2.0
executing                     2.0.1
fastjsonschema                2.19.1
fica                          0.3.1
fqdn                          1.5.1
frozenlist                    1.4.1
git-credential-helpers        0.2
github3.py                    4.0.1
greenlet                      3.0.3
h11                           0.14.0
httpcore                      1.0.4
httpx                         0.27.0
idna                          3.6
imagesize                     1.4.1
ipykernel                     6.29.3
ipylab                        1.0.0
ipython                       8.22.2
ipython-genutils              0.2.0
ipywidgets                    8.1.2
isoduration                   20.11.0
jedi                          0.19.1
Jinja2                        3.1.3
json5                         0.9.22
jsonpointer                   2.4
jsonschema                    4.21.1
jsonschema-specifications     2023.12.1
jupyter_client                7.4.9
jupyter_core                  5.7.2
jupyter-events                0.9.1
jupyter-lsp                   2.2.4
jupyter-resource-usage        0.7.2
jupyter-rsession-proxy        2.2.0
jupyter-server                1.24.0
jupyter_server_proxy          4.1.1
jupyter_server_terminals      0.5.3
jupyter-shiny-proxy           1.1
jupyter-telemetry             0.1.0
jupyterhub                    4.0.2
jupyterlab                    3.4.8
jupyterlab_pygments           0.3.0
jupyterlab_server             2.25.4
jupyterlab_widgets            3.0.10
jupytext                      1.16.1
Mako                          1.3.2
markdown-it-py                3.0.0
MarkupSafe                    2.1.5
matplotlib-inline             0.1.6
mdit-py-plugins               0.4.0
mdurl                         0.1.2
mistune                       3.0.2
multidict                     6.0.5
nbclassic                     0.5.6
nbclient                      0.10.0
nbconvert                     7.16.2
nbformat                      5.10.2
nbgitpuller                   1.2.0
nest-asyncio                  1.6.0
notebook                      6.5.6
notebook_shim                 0.2.4
numpy                         1.26.4
oauthlib                      3.2.2
otter-grader                  5.2.2
overrides                     7.7.0
packaging                     24.0
pamela                        1.1.0
pandas                        2.2.1
pandocfilters                 1.5.1
parso                         0.8.3
pexpect                       4.9.0
pip                           24.0
platformdirs                  4.2.0
prometheus_client             0.20.0
prompt-toolkit                3.0.43
psutil                        5.9.8
ptyprocess                    0.7.0
pure-eval                     0.2.2
pycparser                     2.21
pydantic                      2.6.4
pydantic_core                 2.16.3
Pygments                      2.17.2
PyJWT                         2.8.0
pyOpenSSL                     24.1.0
python-dateutil               2.9.0.post0
python-json-logger            2.0.7
python-on-whales              0.70.0
pytz                          2024.1
PyYAML                        6.0.1
pyzmq                         24.0.1
referencing                   0.33.0
requests                      2.31.0
retrolab                      0.3.21
rfc3339-validator             0.1.4
rfc3986-validator             0.1.1
rpds-py                       0.18.0
ruamel.yaml                   0.18.6
ruamel.yaml.clib              0.2.8
Send2Trash                    1.8.2
setuptools                    59.6.0
simpervisor                   1.0.0
six                           1.16.0
sniffio                       1.3.1
snowballstemmer               2.2.0
soupsieve                     2.5
Sphinx                        7.2.6
sphinxcontrib-applehelp       1.0.8
sphinxcontrib-devhelp         1.0.6
sphinxcontrib-htmlhelp        2.0.5
sphinxcontrib-jsmath          1.0.1
sphinxcontrib-qthelp          1.0.7
sphinxcontrib-serializinghtml 1.1.10
SQLAlchemy                    2.0.28
stack-data                    0.6.3
terminado                     0.18.1
tinycss2                      1.2.1
toml                          0.10.2
tomli                         2.0.1
tornado                       6.4
tqdm                          4.66.2
traitlets                     5.14.2
typer                         0.9.0
types-python-dateutil         2.8.19.20240311
typing_extensions             4.10.0
tzdata                        2024.1
uri-template                  1.3.0
uritemplate                   4.1.1
urllib3                       2.2.1
wcwidth                       0.2.13
webcolors                     1.13
webencodings                  0.5.1
websocket-client              1.7.0
wheel                         0.43.0
widgetsnbextension            4.0.10
wrapt                         1.16.0
yarl                          1.9.4

consideRatio avatar Sep 19 '24 21:09 consideRatio

This doesn't rule out CHP; to do that you'd need to compare this with another proxy like Traefik. For example, if CHP isn't closing connections as fast as the browser, this could lead to too many ports in use.

Do the existing CHP tests cover HTTP persistent connections? https://en.m.wikipedia.org/wiki/HTTP_persistent_connection

manics avatar Sep 19 '24 21:09 manics

One thing I'm noticing as I investigate is that user servers that use lab (as opposed to rsession-proxy or the like) interact with the hub pod a lot more often. Anytime I interact with the file browser, launcher, etc last_activity for the hub pod route in chp updates. This is not the case if /rstudio is designated as the default URL.

Additionally the ESTABLISHED connection count to hubip:8081 with a single user pod running lab (as opposed to rstudio) increments pretty steadily as I do things like kill the pod, kill the kernel, refresh the browser, etc.

felder avatar Sep 19 '24 21:09 felder

> This doesn't rule out CHP; to do that you'd need to compare this with another proxy like Traefik. For example, if CHP isn't closing connections as fast as the browser, this could lead to too many ports in use.

i believe this might be happening... if a user closes their laptop, or opens their notebook in a new browser (which happens more often than you'd imagine) we see a lot of spam (hundreds of 503s being reported) in the proxy logs:

21:08:06.483 [ConfigProxy] error: 503 GET /user/<hub user>/api/events/subscribe connect ECONNREFUSED 10.28.21.53:8888
21:08:06.491 [ConfigProxy] error: 503 GET /user/<hub user>/api/events/subscribe connect ECONNREFUSED 10.28.21.53:8888
21:08:06.514 [ConfigProxy] error: 503 GET /user/<hub user>/api/events/subscribe connect ECONNREFUSED 10.28.21.53:8888
21:08:06.533 [ConfigProxy] error: 503 GET /user/<hub user>/api/events/subscribe connect ECONNREFUSED 10.28.21.53:8888
21:08:06.536 [ConfigProxy] error: 503 GET /user/<hub user>/api/events/subscribe connect ECONNREFUSED 10.28.21.53:8888
21:08:06.561 [ConfigProxy] info: Removing route /user/<hub user>
21:08:06.561 [ConfigProxy] info: 204 DELETE /api/routes/user/<hub user>
21:08:15.521 [ConfigProxy] info: Adding route /user/<hub user> -> http://10.28.26.176:8888
21:08:15.521 [ConfigProxy] info: Route added /user/<hub user> -> http://10.28.26.176:8888
21:08:15.521 [ConfigProxy] info: 201 POST /api/routes/user/<hub user>
21:08:18.845 [ConfigProxy] info: 200 GET /api/routes

shaneknapp avatar Sep 19 '24 21:09 shaneknapp

Hmmm, so we have a spam of 503 GET /user/<hub user>/api/events/subscribe connect ECONNREFUSED 10.28.21.53:8888, where something (jupyterlab in browser?) tries to access a user server, but the proxying fails with connection refused - perhaps because the server is shutting down or similar.

After that, jupyterhub asks CHP to delete the route.

After that, I expect the thing that got 503 now won't get 503 responses because the proxy pod won't try to proxy to the route any more, instead it will do something else --- maybe redirect to the hub pod as a default route - which then gets spammed.

@shaneknapp I guess that we can see some redirects with debug logging or similarly - or can we see redirect responses from CHP already and we aren't seeing them?

consideRatio avatar Sep 19 '24 23:09 consideRatio

I think /api/events/subscribe are associated with websockets, an endpoint added in jupyter_server 2.0.0a2. Is something related to jupyterlab's browser side code re-trying excessively against that when failing?

From the logs I see five failed requests in a row, roughly 10 ms apart, which I guess means there is no delay between retries.

21:08:06.483
21:08:06.491
21:08:06.514
21:08:06.533
21:08:06.536
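
To put a number on the spacing, here is a small sketch that prints the gap in milliseconds between consecutive log lines. It assumes each line starts with an `HH:MM:SS.mmm` timestamp, as in the CHP log excerpt, and that all lines fall on the same day. The function name is hypothetical.

```shell
# Read lines starting with an HH:MM:SS.mmm timestamp and print the gap in
# milliseconds between each such line and the previous one.
# Assumption: no midnight rollover between the first and last line.
gaps_ms() {
  awk '$1 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9][.][0-9][0-9][0-9]$/ {
         split($1, t, "[:.]")   # t[1]=hours t[2]=minutes t[3]=seconds t[4]=millis
         now = ((t[1] * 60 + t[2]) * 60 + t[3]) * 1000 + t[4]
         if (seen) print now - prev
         prev = now
         seen = 1
       }'
}

# Usage (not run here): grep ' 503 ' proxy.log | gaps_ms
```

For the five timestamps listed above this gives gaps of 8, 23, 19 and 3 ms, about 13 ms on average.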

@minrk I recall that you submitted a PR somewhere, sometime a while back, about excessive connections or retries. Was this to this endpoint?

consideRatio avatar Sep 19 '24 23:09 consideRatio

So when running lab, when I do things like kill my pod or start up another connection from another tab or browser I tend to be able to get chp to emit 503 messages similar to:

23:48:41.600 [ConfigProxy] error: 503 GET /user/felder/terminals/websocket/1 connect ETIMEDOUT 10.28.35.109:8888
23:48:49.793 [ConfigProxy] error: 503 GET /user/felder/api/events/subscribe connect ETIMEDOUT 10.28.35.109:8888
...
00:01:16.903 [ConfigProxy] error: 503 GET /user/felder/api/kernels/d9472c13-5a55-47cf-a569-ed981f709bbf/channels connect ECONNREFUSED 10.28.8.3:8888
00:01:16.905 [ConfigProxy] error: 503 GET /user/felder/api/kernels/d9472c13-5a55-47cf-a569-ed981f709bbf/channels connect ECONNREFUSED 10.28.8.3:8888
00:01:16.907 [ConfigProxy] error: 503 GET /user/felder/api/kernels/d9472c13-5a55-47cf-a569-ed981f709bbf/channels connect ECONNREFUSED 10.28.8.3:8888
00:01:16.974 [ConfigProxy] error: 503 GET /user/felder/api/kernels/d9472c13-5a55-47cf-a569-ed981f709bbf connect ECONNREFUSED 10.28.8.3:8888

This does make sense when I'm killing my user pod since the server is no longer there at that ip.

However, when this happens I see a correlated increase in the number of established connections from chp->hub:8081. Those connections seem to persist.

felder avatar Sep 20 '24 00:09 felder

Noting that if I delete the route to the hub pod in chp, the connections still persist.

felder avatar Sep 20 '24 00:09 felder

What version of jupyterlab and jupyter server / notebook is used?

consideRatio avatar Sep 20 '24 05:09 consideRatio

The original issue was https://github.com/jupyterlab/jupyterlab/issues/3929

I think JupyterLab is supposed to stop checking the API when it realizes the server is gone. But 503 means JupyterHub thinks the server is there when it's not, and 503s should be retried after a delay (I wouldn't be surprised if JupyterLab still retries a bit too fast). When the server is actually stopped (i.e. the Hub notices via poll and updates the proxy to remove the route), these 503s should become 424s, at which point JupyterLab may slow down or pause requests, as it should.

I don't expect the singleuser server is making "hairpin" connections to the Hub via CHP, unless things like hub_connect_ip are customized. The default behavior is for all Jupyter Server -> Hub communication to be via internal network.

I have a strong suspicion that if this tends to happen in JupyterLab and not other UIs, it is related to JupyterLab's reconnect behavior, the use of websockets (less common in other UIs), or both. My hunch is something like this:

  • when a server stops, JupyterLab tries reconnecting too aggressively, causing a lot of requests that are bound to fail
  • When an endpoint is not there (503 error), CHP doesn't clean up sockets for the failed requests quickly enough (possibly addressable via one or more timeout options) and/or at all (bug in node-http-proxy or possibly CHP itself)
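
For contrast with the near-immediate retries seen in the proxy logs, the following is a purely illustrative sketch of what "retry after a delay" could look like as a capped exponential backoff schedule. It is not JupyterLab's actual reconnect logic; the function name and the base and cap values are assumptions.

```shell
# Print the delay (in milliseconds) to wait before each retry attempt,
# starting at base_ms, doubling each time, and never exceeding cap_ms.
# Hypothetical illustration only; not taken from JupyterLab or CHP.
backoff_delays_ms() {
  base_ms=$1
  cap_ms=$2
  attempts=$3
  delay=$base_ms
  i=0
  while [ "$i" -lt "$attempts" ]; do
    echo "$delay"
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap_ms" ]; then
      delay=$cap_ms
    fi
    i=$((i + 1))
  done
}

# Example: backoff_delays_ms 1000 30000 7
```

With a 1 s base and a 30 s cap, the first five retries would be spread over 15 s or more rather than arriving within a few tens of milliseconds.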

@shaneknapp @felder do you have any indication that CHP starts seeing 503 errors before problems start, i.e. that it might be a cause and not merely a symptom? Short-term 503 errors are 'normal' behavior when a server shuts down prematurely, e.g. due to internal culler behavior, so if that triggers a cascade of too many bound-to-fail requests that don't get cleaned up fast enough, that seems a plausible scenario, at least.

minrk avatar Sep 20 '24 13:09 minrk

@consideRatio jupyterlab 4.0.11 and 4.2.5, notebook 7.0.7 and 7.2.2

@minrk It's possible, honestly we're just trying to wrap our heads around the ephemeral port issue and chp so all possibilities are on the table.

felder avatar Sep 20 '24 17:09 felder

> What version of jupyterlab and jupyter server / notebook is used?

data8

  • jupyterhub==4.1.6
  • jupyterlab==4.0.11
  • jupyter_server==2.7.0
  • notebook==7.0.7

data100

  • jupyterhub==4.1.6
  • jupyterlab==4.2.5
  • jupyterlab_server==2.27.3
  • jupyter_server==2.14.2
  • notebook==7.2.2

datahub

  • notebook==7.0.7
  • jupyterlab==4.0.11
  • jupyterhub==4.1.6

we actually bumped data100 to the most recent versions of these packages yesterday "just to see" if it helped. however, we allocated 16G of ram per user for an assignment as of this morning, and expect fewer kernel crashes and therefore fewer orphaned (or excess?) ephemeral ports.

shaneknapp avatar Sep 20 '24 18:09 shaneknapp

Can we test this without JupyterHub? Run JupyterLab, manually start CHP, create a CHP route to JupyterLab, and access JupyterLab via CHP. Based on the above if you open another tab, or Ctrl-C and restart JupyterLab, the number of ports in use should significantly increase.

manics avatar Sep 20 '24 18:09 manics

> @shaneknapp @felder do you have any indication that CHP starts seeing 503 errors before problems start, i.e. that it might be a cause and not merely a symptom? Short-term 503 errors are 'normal' behavior when a server shuts down prematurely, e.g. due to internal culler behavior, so if that triggers a cascade of too many bound-to-fail requests that don't get cleaned up fast enough, that seems a plausible scenario, at least.

well, it usually takes a few hours for the "problems" to start as the ports begin to accumulate. during this time (aka all day) we are seeing plenty of 503s. the rate at which they are happening isn't really something that i'd say is trackable as they occur based on how users are interacting with the system (closing laptop, reopening later, new browser tabs etc). and, of course, the more users, the more 503s. and as @felder said, we're still trying to unpack what's really happening here and wrapping our heads around the 503s and the causation/correlation relationship to them and the chp running out of ephemeral ports.

shaneknapp avatar Sep 20 '24 18:09 shaneknapp

welp, this just happened on a smaller hub w/about 75 users logged in.

(image: graphs of proxy pod cpu and ram usage during the outage)

the outage started at 3:30pm, and lasted 15m. users got a blank page and 'service inaccessible'.

you can see the cpu peg at 100% during this time, and eventually it recovered. the proxy ram usage doesn't hit anywhere near what we've allocated as max (3G), but there's some interesting ups and downs in that graph.

this is a more complex hub deployment, w/two sidecar containers alongside each user container -- one w/mongodb, and the other w/postgres installed.

shaneknapp avatar Sep 23 '24 23:09 shaneknapp

This is the simplest CHP/JupyterLab setup I can come up with:

Run CHP on default ports (8000 and 8001), no auth, log all requests:

configurable-http-proxy --log-level debug

Create a route /test pointing to http://localhost:8888

curl -XPOST http://localhost:8001/api/routes/test -d '{"target":"http://127.0.0.1:8888"}'

Start JupyterLab under /test

jupyter-lab --no-browser --ServerApp.base_url=/test

Open http://localhost:8000/test/ in a browser

Show IPv4 connections involving ports 8000, 8001, 8888:

ss -4n |grep -E '127.0.0.1:(8888|8000|8001)'

I've tried reloading JupyterLab in my browser, and killing/restarting JupyterLab. When JupyterLab is killed (Ctrl-C) with the browser still open CHP doesn't keep any sockets open.

There's a burst of connections whilst it loads: ss -4n |grep -E '127.0.0.1:(8888|8000|8001)'

tcp   ESTAB 0      0                127.0.0.1:34776      127.0.0.1:8000 
tcp   ESTAB 0      0                127.0.0.1:8888       127.0.0.1:41964
tcp   ESTAB 0      0                127.0.0.1:8888       127.0.0.1:41978
tcp   ESTAB 0      0                127.0.0.1:34704      127.0.0.1:8000 
tcp   ESTAB 0      0                127.0.0.1:8888       127.0.0.1:41976
tcp   ESTAB 0      0                127.0.0.1:8888       127.0.0.1:41994
tcp   ESTAB 0      0                127.0.0.1:8888       127.0.0.1:41988
tcp   ESTAB 0      0                127.0.0.1:8888       127.0.0.1:55962
tcp   ESTAB 0      0                127.0.0.1:34720      127.0.0.1:8000 
tcp   ESTAB 0      0                127.0.0.1:41978      127.0.0.1:8888 
tcp   ESTAB 0      0                127.0.0.1:41988      127.0.0.1:8888 
tcp   ESTAB 0      0                127.0.0.1:8888       127.0.0.1:41984
tcp   ESTAB 0      0                127.0.0.1:41994      127.0.0.1:8888 
tcp   ESTAB 0      0                127.0.0.1:34724      127.0.0.1:8000 
tcp   ESTAB 0      0                127.0.0.1:34756      127.0.0.1:8000 
tcp   ESTAB 0      0                127.0.0.1:55962      127.0.0.1:8888 
tcp   ESTAB 0      0                127.0.0.1:41984      127.0.0.1:8888 
tcp   ESTAB 0      0                127.0.0.1:34762      127.0.0.1:8000 
tcp   ESTAB 0      0                127.0.0.1:34800      127.0.0.1:8000 
tcp   ESTAB 0      0                127.0.0.1:34740      127.0.0.1:8000 
tcp   ESTAB 0      0                127.0.0.1:41964      127.0.0.1:8888 
tcp   ESTAB 0      0                127.0.0.1:41976      127.0.0.1:8888 

but it stabilises again:

tcp   ESTAB 0      0                127.0.0.1:34776      127.0.0.1:8000 
tcp   ESTAB 0      0                127.0.0.1:8888       127.0.0.1:41964
tcp   ESTAB 0      0                127.0.0.1:8888       127.0.0.1:41978
tcp   ESTAB 0      0                127.0.0.1:8888       127.0.0.1:41976
tcp   ESTAB 0      0                127.0.0.1:8888       127.0.0.1:41994
tcp   ESTAB 0      0                127.0.0.1:8888       127.0.0.1:41988
tcp   ESTAB 0      0                127.0.0.1:8888       127.0.0.1:55962
tcp   ESTAB 0      0                127.0.0.1:41978      127.0.0.1:8888 
tcp   ESTAB 0      0                127.0.0.1:41988      127.0.0.1:8888 
tcp   ESTAB 0      0                127.0.0.1:8888       127.0.0.1:41984
tcp   ESTAB 0      0                127.0.0.1:41994      127.0.0.1:8888 
tcp   ESTAB 0      0                127.0.0.1:55962      127.0.0.1:8888 
tcp   ESTAB 0      0                127.0.0.1:41984      127.0.0.1:8888 
tcp   ESTAB 0      0                127.0.0.1:34800      127.0.0.1:8000 
tcp   ESTAB 0      0                127.0.0.1:41964      127.0.0.1:8888 
tcp   ESTAB 0      0                127.0.0.1:45264      127.0.0.1:8000 
tcp   ESTAB 0      0                127.0.0.1:41976      127.0.0.1:8888 

I haven't seen any continual rise in the number of sockets if I repeat the reload/kill/restart cycle.
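
To make that comparison repeatable between cycles, the `ss` output can be tallied per peer address. This is only a minimal sketch, assuming the six-column `ss -4n` layout shown above (Netid, State, Recv-Q, Send-Q, Local, Peer); `count_by_peer` is a hypothetical helper name, not part of CHP or `ss`:

```shell
# Tally TCP sockets per peer address from `ss -4n`-style lines on stdin.
# Assumes the peer address is the last whitespace-separated field.
count_by_peer() {
  awk '$1 == "tcp" { n[$NF]++ } END { for (p in n) print n[p], p }' |
    sort -k1,1nr -k2,2
}

# Usage (run between reload/kill/restart cycles and compare the counts):
#   ss -4n | grep -E '127.0.0.1:(8888|8000|8001)' | count_by_peer
```

A peer whose count keeps growing across cycles would be the one leaking sockets.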

manics avatar Sep 24 '24 15:09 manics

@manics Just wanted to note that the connections we're concerned about are going to hubip:8081, not sure if that's represented in your test. The issue is not connections from chp to/from user pods but instead from chp to/from the hub pod. The userpod destinations vary enough that there is no concern about ephemeral ports there (as of yet). It's the hub pod, which is a single destination/port pair, where we see the issue.
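
For context on why a single destination matters: a TCP connection is identified by source IP, source port, destination IP and destination port, so with the chp pod's IP and hubip:8081 fixed, only the source port can vary, and it is drawn from the ephemeral range. A rough sketch of that budget, assuming Linux and its common default range of 32768 to 60999 (`port_budget` is a hypothetical helper, and the range on any given node may differ):

```shell
# Number of usable source ports in an inclusive [low, high] ephemeral range.
port_budget() {
  echo $(( $2 - $1 + 1 ))
}

# Common Linux default range (an assumption; check your own nodes).
port_budget 32768 60999   # prints 28232

# On Linux, the live range can be read from procfs when it is available.
if [ -r /proc/sys/net/ipv4/ip_local_port_range ]; then
  read -r low high < /proc/sys/net/ipv4/ip_local_port_range
  port_budget "$low" "$high"
fi
```

Every socket to hubip:8081 that is still open or lingering counts against that one budget, whereas connections to user pods are spread across many destination addresses.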

felder avatar Oct 28 '24 19:10 felder

@felder I haven't tested that since CHP shouldn't be proxying pods to hub:8081, the pods should connect directly to the hub:8081.

External Hub API requests will go through CHP though, this includes requests made through the Hub admin UI.

Can you turn on debug logging in CHP, filter the logs for connections to :8081, and show us the URLs that are being requested? This should narrow down the component that's responsible.

manics avatar Nov 02 '24 17:11 manics

@manics perhaps this comment describes what we are seeing better?

https://github.com/jupyterhub/configurable-http-proxy/issues/388#issuecomment-2359416947

I’ll see what debug logging reveals.

felder avatar Nov 02 '24 19:11 felder

If anyone here who is still seeing these issues would like to test with configurable-http-proxy 5.0.0-beta.1, which has a new underlying proxy based on http-proxy-3, and can report back, that would be hugely helpful.

minrk avatar May 16 '25 09:05 minrk

yeah, i'm happy to give it a whirl... i'm deploying some hubs for community colleges over the summer that won't have a lot of load but at least we'll be able to see if it's stable.

shaneknapp avatar May 16 '25 16:05 shaneknapp

Thanks! I'm happy to let shane go first, but yeah we at UCB still have this issue but we did manage to mitigate it significantly by increasing the ephemeral port space. If this fixes it for good, that'd be awesome.

felder avatar May 16 '25 18:05 felder

> Thanks! I'm happy to let shane go first, but yeah we at UCB still have this issue but we did manage to mitigate it significantly by increasing the ephemeral port space. If this fixes it for good, that'd be awesome.

fwiw i doubt that i'll have significant load (more than 10s of users) until the fall semester starts tho... and even then probably only one hub.

@felder i'd suggest testing this out on a couple of hubs over the summer session...

shaneknapp avatar May 16 '25 18:05 shaneknapp

Just curious how this has been working for folks? I know load isn't particularly high during the summer. Anyone running it this fall?

jlongland avatar Sep 03 '25 16:09 jlongland

so far so good... we're up to ~100 concurrent users each day (across 3-4 hubs), and everything is holding together fine. this is still nowhere near the volume of users that the uc berkeley hubs get, but we do expect to see the total user count double by the end of september.

once i feel that this version of chp (5.0.1) is stable, i'll recommend it be deployed at uc berkeley and we can go from there.

(also adding @yijungi-ucb who runs the berkeley hubs now for visibility)

shaneknapp avatar Sep 03 '25 17:09 shaneknapp

Berkeley DataHub upgraded chp to 5.0.1 3 days ago. Today we are seeing the same issue of ephemeral ports running out in one of our hubs, which has at most 120 concurrent users.

yijunge-ucb avatar Sep 08 '25 22:09 yijunge-ucb

here's a related issue... different cause, but same effect -- proxy pod runs out of ephemeral ports and eventually dies: https://github.com/jupyterlab/jupyter-ai/issues/1482

shaneknapp avatar Sep 18 '25 18:09 shaneknapp

That issue seems to suggest that the problem correlates with an increasing number of websocket connection attempts to a server that's not running.

So maybe attempting to open lots of websockets to /user/notruning/whatever will help us reproduce and diagnose. Maybe there's a limit for concurrent outbound connections we can add somewhere in CHP.
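
A rough sketch of such a reproduction attempt, assuming `curl` is available and that the URL you pass is routed by CHP to a target that is not running. `ws_flood` and its `DRY_RUN` switch are hypothetical helpers; the headers are the standard websocket upgrade headers, using the sample key from RFC 6455:

```shell
# Fire COUNT concurrent websocket upgrade requests at URL via curl.
# With DRY_RUN=1 the requests are only listed, not sent.
ws_flood() {
  url=$1
  count=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "would request websocket upgrade: $url"
    else
      curl --silent --output /dev/null --http1.1 --max-time 10 \
        -H "Connection: Upgrade" -H "Upgrade: websocket" \
        -H "Sec-WebSocket-Version: 13" \
        -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
        "$url" &
    fi
    i=$((i + 1))
  done
  wait  # block until all background requests have finished
}

# Usage (host, port and request count are assumptions for a local test):
#   ws_flood http://localhost:8000/user/notruning/whatever 500
```

Comparing the proxy's socket counts in `ss` before and after a run should show whether failed upgrades leave connections behind.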

minrk avatar Sep 18 '25 18:09 minrk

websocket connections to stopped servers are indeed the key, and I've located the bug, but not yet the fix: https://github.com/sagemathinc/http-proxy-3/issues/26

minrk avatar Sep 19 '25 19:09 minrk