Bug: bind address/port already in use
Running ShinyProxy 3.0.2, and we have tried several approaches to fixing this issue (including restarting both Docker and ShinyProxy before a large group signs on).
Long story short, we keep getting errors saying the port is already in use. It is hit and miss: sometimes we have 60+ users connected without a problem, and sometimes we only get to 10 before it starts throwing this error.
{"message":"driver failed programming external connectivity on endpoint gracious_albattani (e3e2e1d06079f881534875f477dd98b25fb7c016f685d56ff777fc397e819cd3): Error starting userland proxy: listen tcp4 0.0.0.0:40018: bind: address already in use"}
EDIT: we are seeing two different errors:
- Bind for 0.0.0.0:[0-9]+ failed: port is already allocated
- listen tcp4 0.0.0.0:[0-9]+: bind: address already in use
There are no containers listed in shinyproxy or docker as running on that port.
server@prod-ae-spvm:~/shinyproxy-etc$ sudo lsof -n -i :40018
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
docker-pr 138817 root 44u IPv4 639804 0t0 TCP 172.17.0.1:40018->172.17.0.51:3838 (ESTABLISHED)
server@prod-ae-spvm:~/shinyproxy-etc$ kill -9 138817
Yet even after killing the process, the error persists with the same port "being occupied". We are seeing this occur over and over again, and I believe it is tied to what others have seen here as well:
https://support.openanalytics.eu/t/error-500-port-is-already-allocated/604
2023-09-21 12:40:03.916 INFO 149714 --- [pool-1-thread-4] e.o.containerproxy.service.ProxyService : [[email protected] proxyId=8c3743f3-ad4e-4e5d-b2a7-968af1bb194f specId=grouping_summaries] Starting proxy
2023-09-21 12:40:03.973 INFO 149714 --- [pool-1-thread-4] c.s.docker.client.DefaultDockerClient : Starting container with Id: a42a78e6fe2084a11ad90b657343f000987560564103edbcde1d29b49f624d86
2023-09-21 12:40:04.273 WARN 149714 --- [pool-1-thread-4] e.o.containerproxy.service.ProxyService : [[email protected] proxyId=8c3743f3-ad4e-4e5d-b2a7-968af1bb194f specId=grouping_summaries] Proxy failed to start
eu.openanalytics.containerproxy.ProxyFailedToStartException: Container with index 0 failed to start
at eu.openanalytics.containerproxy.backend.AbstractContainerBackend.startProxy(AbstractContainerBackend.java:116) ~[containerproxy-1.0.2.jar!/:1.0.2]
at eu.openanalytics.containerproxy.service.ProxyService.startOrResumeProxy(ProxyService.java:455) [containerproxy-1.0.2.jar!/:1.0.2]
at eu.openanalytics.containerproxy.service.ProxyService.lambda$startProxy$4(ProxyService.java:279) [containerproxy-1.0.2.jar!/:1.0.2]
at eu.openanalytics.containerproxy.service.ProxyService.lambda$action$11(ProxyService.java:599) [containerproxy-1.0.2.jar!/:1.0.2]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_382]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_382]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_382]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_382]
at java.lang.Thread.run(Thread.java:750) ~[na:1.8.0_382]
Caused by: eu.openanalytics.containerproxy.ContainerFailedToStartException: Docker container failed to start
at eu.openanalytics.containerproxy.backend.docker.DockerEngineBackend.startContainer(DockerEngineBackend.java:153) ~[containerproxy-1.0.2.jar!/:1.0.2]
at eu.openanalytics.containerproxy.backend.AbstractContainerBackend.startProxy(AbstractContainerBackend.java:108) ~[containerproxy-1.0.2.jar!/:1.0.2]
... 8 common frames omitted
Caused by: com.spotify.docker.client.exceptions.DockerRequestException: Request error: POST http://localhost:2375/containers/a42a78e6fe2084a11ad90b657343f000987560564103edbcde1d29b49f624d86/start: 500, body: {"message":"driver failed programming external connectivity on endpoint gracious_albattani (e3e2e1d06079f881534875f477dd98b25fb7c016f685d56ff777fc397e819cd3): Error starting userland proxy: listen tcp4 0.0.0.0:40018: bind: address already in use"}
at com.spotify.docker.client.DefaultDockerClient.propagate(DefaultDockerClient.java:2840) ~[docker-client-8.16.0.jar!/:8.16.0]
at com.spotify.docker.client.DefaultDockerClient.request(DefaultDockerClient.java:2734) ~[docker-client-8.16.0.jar!/:8.16.0]
at com.spotify.docker.client.DefaultDockerClient.containerAction(DefaultDockerClient.java:783) ~[docker-client-8.16.0.jar!/:8.16.0]
at com.spotify.docker.client.DefaultDockerClient.containerAction(DefaultDockerClient.java:768) ~[docker-client-8.16.0.jar!/:8.16.0]
at com.spotify.docker.client.DefaultDockerClient.startContainer(DefaultDockerClient.java:763) ~[docker-client-8.16.0.jar!/:8.16.0]
at eu.openanalytics.containerproxy.backend.docker.DockerEngineBackend.startContainer(DockerEngineBackend.java:147) ~[containerproxy-1.0.2.jar!/:1.0.2]
... 9 common frames omitted
Caused by: javax.ws.rs.InternalServerErrorException: HTTP 500 Internal Server Error
at org.glassfish.jersey.client.JerseyInvocation.convertToException(JerseyInvocation.java:1107) ~[jersey-client-2.26-b03.jar!/:na]
at org.glassfish.jersey.client.JerseyInvocation.translate(JerseyInvocation.java:894) ~[jersey-client-2.26-b03.jar!/:na]
at org.glassfish.jersey.client.JerseyInvocation.access$700(JerseyInvocation.java:98) ~[jersey-client-2.26-b03.jar!/:na]
at org.glassfish.jersey.client.JerseyInvocation$5.completed(JerseyInvocation.java:851) ~[jersey-client-2.26-b03.jar!/:na]
at org.glassfish.jersey.client.ClientRuntime.processResponse(ClientRuntime.java:197) ~[jersey-client-2.26-b03.jar!/:na]
at org.glassfish.jersey.client.ClientRuntime.access$300(ClientRuntime.java:74) ~[jersey-client-2.26-b03.jar!/:na]
at org.glassfish.jersey.client.ClientRuntime$2$1$1.run(ClientRuntime.java:167) ~[jersey-client-2.26-b03.jar!/:na]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) ~[jersey-common-2.26-b03.jar!/:na]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) ~[jersey-common-2.26-b03.jar!/:na]
at org.glassfish.jersey.internal.Errors.process(Errors.java:315) ~[jersey-common-2.26-b03.jar!/:na]
at org.glassfish.jersey.internal.Errors.process(Errors.java:297) ~[jersey-common-2.26-b03.jar!/:na]
at org.glassfish.jersey.internal.Errors.process(Errors.java:267) ~[jersey-common-2.26-b03.jar!/:na]
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:303) ~[jersey-common-2.26-b03.jar!/:na]
at org.glassfish.jersey.client.ClientRuntime$2$1.response(ClientRuntime.java:165) ~[jersey-client-2.26-b03.jar!/:na]
at org.glassfish.jersey.apache.connector.ApacheConnector$1.run(ApacheConnector.java:498) ~[jersey-apache-connector-2.26-b03.jar!/:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_382]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_382]
at jersey.repackaged.com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299) ~[jersey-guava-2.26-b03.jar!/:na]
at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) ~[na:1.8.0_382]
at jersey.repackaged.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:50) ~[jersey-guava-2.26-b03.jar!/:na]
at jersey.repackaged.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:37) ~[jersey-guava-2.26-b03.jar!/:na]
at org.glassfish.jersey.apache.connector.ApacheConnector.apply(ApacheConnector.java:494) ~[jersey-apache-connector-2.26-b03.jar!/:na]
at org.glassfish.jersey.client.ClientRuntime$2.run(ClientRuntime.java:181) ~[jersey-client-2.26-b03.jar!/:na]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271) ~[jersey-common-2.26-b03.jar!/:na]
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267) ~[jersey-common-2.26-b03.jar!/:na]
at org.glassfish.jersey.internal.Errors.process(Errors.java:315) ~[jersey-common-2.26-b03.jar!/:na]
at org.glassfish.jersey.internal.Errors.process(Errors.java:297) ~[jersey-common-2.26-b03.jar!/:na]
at org.glassfish.jersey.internal.Errors.process(Errors.java:267) ~[jersey-common-2.26-b03.jar!/:na]
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:303) ~[jersey-common-2.26-b03.jar!/:na]
at org.glassfish.jersey.client.ClientRuntime$3.run(ClientRuntime.java:209) ~[jersey-client-2.26-b03.jar!/:na]
... 5 common frames omitted
I created a script as an interim solution to try to rapidly kill hung containers (note: you have to update the path to your ShinyProxy log on line 10): https://gist.github.com/spinanicky/d82763292cacf357b26fd2dfcf11836c
Hey there,
I am experiencing the exact same behaviour running ShinyProxy 2.6.1. It just seems to occur randomly, and currently we have to restart Docker to get it working again.
Thanks!
Interesting case! What Docker version is installed on the server?
24.0.7
Hi, we'll investigate this issue so that we can hopefully reproduce the problem and fix it. I'll provide an update once we have more information.
I'm unable to reproduce this issue. It seems like the docker-proxy command keeps running, even when the container was already stopped. Could you share the following information?
- operating system (including version)
- are you using a local VM, or some cloud provider?
- resources of the ShinyProxy server (memory/cpu)
- your application.yml
- can you check the logs of the Docker daemon?
Hopefully with this information I'm able to set up a similar environment and reproduce the problem.
- Proxmox 8.0.2 -> Ubuntu 23.10 VM (note the host (Proxmox) uses ZFS). I have been meaning to test whether this might be the cause, as it also prevents us from running containers rather than VMs.
- Sits on two dedicated hosts (we give users the option to connect to either URL given the failures described herein)
- A) 8 x Intel(R) Xeon(R) CPU E3-1275 v6 @ 3.80GHz + 64 GB RAM
- B) 12 x Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz + 128 GB RAM
- The VM for ShinyProxy has 30 GB RAM + 2 vCPU on server A and 60 GB RAM + 8 vCPU on server B
- will share privately
- is there a specific command you want me to run for the logs, because it's an 800 MB file :)
Thank you for the details. The issue with ZFS sounds interesting; I could try to use a ZFS host. At least it seems your server has enough resources, so I don't think we need to look in that direction.
W.r.t. the application.yml, you can strip any secrets and URLs; I just want to make sure there are no unexpected things in it (e.g. low-level Spring settings).
For the Docker logs, it would be useful if you could trigger the problem and then run sudo journalctl -u docker -n300 to get the last 300 log lines.
Yes, resource testing was the first thing I jumped to when I saw the CPU usage on Netdata - which is why server B has 8 vCPU, and that still wouldn't alleviate the issue. The port range start was changed to 40000 because I originally thought that got rid of the issue when we were getting the same error on the 20000-range ports.
server: #used to rewrite the keycloak redirect url to include https
  forward-headers-strategy: native

logging:
  file:
    name: /etc/shinyproxy/logs/shinyproxy.log
  # level:
  #   eu:
  #     openanalytics:
  #       containerproxy:
  #         auth: DEBUG

proxy:
  title: ABCD
  hide-navbar: true
  landing-page: /
  favicon-path: favicon.ico
  heartbeat-rate: 10000
  heartbeat-timeout: 60000
  container-wait-time: 60000
  default-proxy-max-lifetime: 240
  port: 8080
  admin-groups: [ABCD]
  usage-stats-url: jdbc:mysql://ABCD
  # usage-stats-username:
  # usage-stats-password:
  authentication: openid
  openid:
    auth-url: ABCD
    token-url: ABCD
    jwks-url: ABCD
    userinfo-url: ABCD
    client-id: ABCD #client_id: client -> settings -> client_id
    client-secret: ABCD
    logout-url: ABCD
    username-attribute: email
    roles-claim: groups
  docker:
    cert-path: /home/none
    url: http://localhost:2375
    port-range-start: 40000
    port-range-max: 40100
  # Applications english
  specs:
    - id: basics
      display-name: Basics
      description: Learn how to get started with coding in R using public health examples
      container-cmd: ["R", "-e", "learnr::run_tutorial('basics', package = 'introexercises', shiny_args = list(launch.browser = 0, port = 3838, host = '0.0.0.0'))"]
      container-image: ABCD/intro_course:latest
      access-groups: [ABCD, ABCD, ABCD]
Jan 25 12:46:27 prod-ae-spvm dockerd[1962]: time="2024-01-25T12:46:27.000895718Z" level=debug msg="Calling GET /v1.43/info"
Jan 25 12:46:27 prod-ae-spvm dockerd[1962]: time="2024-01-25T12:46:27.013002186Z" level=debug msg="Calling GET /v1.43/images/json"
Jan 25 12:46:27 prod-ae-spvm dockerd[1962]: time="2024-01-25T12:46:27.017248361Z" level=debug msg="Calling GET /v1.43/containers/json?all=1&filters=%7B%22health%22%3A%7B%22healthy%22%3Atrue%7D%7D"
Jan 25 12:46:27 prod-ae-spvm dockerd[1962]: time="2024-01-25T12:46:27.017641243Z" level=debug msg="Calling GET /v1.43/containers/json?all=1&filters=%7B%22health%22%3A%7B%22unhealthy%22%3Atrue%7D%7D"
Jan 25 12:46:27 prod-ae-spvm dockerd[1962]: time="2024-01-25T12:46:27.017975165Z" level=debug msg="Calling GET /v1.43/containers/json?all=1&filters=%7B%22health%22%3A%7B%22starting%22%3Atrue%7D%7D"
Jan 25 12:46:27 prod-ae-spvm dockerd[1962]: time="2024-01-25T12:46:27.018386194Z" level=debug msg="Calling GET /v1.43/containers/json?all=1&filters=%7B%22health%22%3A%7B%22none%22%3Atrue%7D%7D"
Hi @spinanicky
I was not able to reproduce this consistently, but I have seen it happening a few times randomly as well. As mentioned above it seems the docker-proxy process keeps running just a bit longer than usual, but eventually stops as well. Therefore, ShinyProxy 3.1.0 now waits (at least) 10 seconds before re-using a port that was previously used by a container. It would be nice if you could test the 3.1.0 release and let us know whether this fixes the issue for you.
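For illustration, a minimal sketch of that cooldown idea in Java (hypothetical code, not ShinyProxy's actual implementation): record when each port is released and skip any port whose cooldown has not yet expired when allocating.

import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: a port allocator that refuses to hand out a port
// again until a cooldown has passed since it was released. The 40000-40100
// range and the 10-second cooldown mirror the values discussed in this thread.
public class CooldownPortAllocator {
    private static final Duration COOLDOWN = Duration.ofSeconds(10);
    private final Map<Integer, Instant> releasedAt = new HashMap<>();
    private final int rangeStart;
    private final int rangeMax;

    public CooldownPortAllocator(int rangeStart, int rangeMax) {
        this.rangeStart = rangeStart;
        this.rangeMax = rangeMax;
    }

    // Returns the first port that is neither in use nor still cooling down.
    public synchronized int allocate(Set<Integer> portsInUse) {
        Instant now = Instant.now();
        for (int port = rangeStart; port <= rangeMax; port++) {
            Instant released = releasedAt.get(port);
            boolean coolingDown = released != null
                    && Duration.between(released, now).compareTo(COOLDOWN) < 0;
            if (!portsInUse.contains(port) && !coolingDown) {
                return port;
            }
        }
        throw new IllegalStateException("No free port in range " + rangeStart + "-" + rangeMax);
    }

    // Called when a container stops; starts the cooldown for its port.
    public synchronized void release(int port) {
        releasedAt.put(port, Instant.now());
    }
}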
@LEDfan you have been very active - my email is flooded with your name :-)
I will have a look over the weekend, but don't let this hold you up with a release.
Thank you for testing.
Thank you for pushing the update. We've got the same issue with our installation, so +1 to this. I'll upgrade now, and see if it fixes it.
On a side note, you could request port 0 from the OS - Linux and FreeBSD will give you back a random, unused network port, though I have never tried this on Windows. You could then capture that number programmatically and use it, instead of working with the baseline/hard-coded port 2000 + 1 per service and playing a guessing game.
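A minimal sketch of that port-0 idea in Java (illustrative only, not how ShinyProxy currently allocates ports): bind a ServerSocket to port 0 and let the OS report back which ephemeral port it picked.

import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPortExample {
    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS to choose any free ephemeral port.
        try (ServerSocket socket = new ServerSocket(0)) {
            // The port the OS actually assigned; it would have to be handed to
            // Docker before the socket is closed, since closing frees it again.
            System.out.println("OS-assigned port: " + socket.getLocalPort());
        }
    }
}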
@LEDfan is this normal:
I am also seeing in the lsof list of established connections that ShinyProxy seems to be taking both IPv4 and IPv6 routes and forwarding them to ports. I am going to try disabling IPv6 on the Docker network and see if that does anything.
@LEDfan - the bug persists with the new release.
driver failed programming external connectivity on endpoint sp-container-d968fae8-6410-4d94-b924-5a96fd0248cc-0 (24782c3abc75b009fd8d49afe09cb6201da9901bc1cb50cf85d15be3bb8e4493): Error starting userland proxy: listen tcp4 0.0.0.0:40002: bind: address already in use
Hi @spinanicky, that is unfortunate; I'll continue having a look. When you notice the issue, do you perhaps have (some) containers that failed to start? It seems I recently observed the issue when I started an app just after another app/instance failed to start.
Hi, I found a potential issue and implemented a fix.
Can you try with the docker image openanalytics/shinyproxy-snapshot:3.1.1-SNAPSHOT-20240606.143229 or this jar: https://nexus.openanalytics.eu/repository/snapshots/eu/openanalytics/shinyproxy/3.1.1-SNAPSHOT/shinyproxy-3.1.1-20240606.143229-2-exec.jar
@LEDfan so after the last fix didn't resolve the issue, we transitioned to using the Docker image version that uses a custom bridge network rather than a port for each new container, and since then we have had no issues.
Okay, thanks for letting us know. We just released ShinyProxy 3.1.1, which includes the fix, so I'm going to close this.