Portainer agent down after 2.20.1 upgrade
Before you start, please confirm the following.
- [X] Yes, I've searched similar issues on GitHub.
- [X] Yes, I've checked whether this issue is covered in the Portainer documentation or knowledge base.
Problem Description
2 devices:
- Openmediavault with the main Portainer instance (Debian 11; 192.168.2.54)
- Armbian device with Portainer Agent (Armbian 24.2.1 (Debian 12); 192.168.2.52)
After I upgraded both instances to 2.20.1, the portainer-agent environment is shown as "down" in the WebUI. When I click on the environment, Portainer takes much longer than usual to load, but eventually opens the agent environment. The environment overview then shows it as "up". After leaving the page for several minutes and coming back, the agent is "down" again.
No settings were changed on either system; I only updated Portainer and the agent to sts (2.20.1).
Expected Behavior
Opening the Portainer WebUI shows the agent environment status as "up".
Actual Behavior
Opening the Portainer WebUI shows the agent environment status as "down".
Steps to Reproduce
- Open the Portainer WebUI -> the agent environment is shown as down
- Click on the agent environment -> the load time is much longer than usual -> the agent environment appears as up
- Leave the WebUI for, let's say, half an hour -> the agent environment is down again (the polling sketch below shows the same flip from the API)
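The same status flip can be watched from the API instead of the UI. A minimal sketch, assuming an API key (created under My account > Access tokens) and `jq` installed; in the `/api/endpoints` response, `Status` 1 means up and 2 means down:

```sh
# Poll the environment status via the Portainer API every minute.
PORTAINER_URL="https://192.168.2.54:9443"
API_KEY="ptr_xxx"  # placeholder - use a real access token
while true; do
  curl -ks -H "X-API-Key: ${API_KEY}" "${PORTAINER_URL}/api/endpoints" \
    | jq -r '.[] | "\(.Name): Status=\(.Status)"'
  sleep 60
done
```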
Portainer logs or screenshots
Portainer Main:
```
2024/04/06 10:22PM INF portainer/main.go:823 > starting Portainer | build_number=61 go_version=go1.21.6 image_tag=linux-amd64-2.20.1 nodejs_version=18.20.0 version=2.20.1 webpack_version=5.88.2 yarn_version=1.22.22
2024/04/06 10:22PM INF http/server.go:463 > starting HTTPS server | bind_address=:9443
2024/04/06 10:22PM INF http/server.go:447 > starting HTTP server | bind_address=:9000
...
2024/04/06 10:47PM ERR snapshot/snapshot.go:313 > background schedule error (environment snapshot), unable to execute pending actions | error="environment \"labor\" (id: 3) is not up"
... (the same ERR entry repeats every 10-20 minutes, from 10:57PM to 01:27AM and again from 06:32AM to 11:02AM)
2024/04/07 11:13AM WRN docker/snapshot.go:93 > unable to snapshot engine version | error="Cannot connect to the Docker daemon at tcp://192.168.2.52:9001. Is the docker daemon running?" environment=labor
```
Agent environment:
```
2024/04/07 09:28AM INF ./main.go:86 > agent running on Docker platform
2024/04/07 09:28AM FTL ./main.go:92 > unable to retrieve information from Docker | error="Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
2024/04/07 09:28AM INF ./main.go:86 > agent running on Docker platform
2024/04/07 09:28AM FTL ./main.go:92 > unable to retrieve information from Docker | error="Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
2024/04/07 09:29AM INF ./main.go:86 > agent running on Docker platform
2024/04/07 09:29AM FTL ./main.go:92 > unable to retrieve information from Docker | error="Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
2024/04/07 09:29AM INF ./main.go:86 > agent running on Docker platform
2024/04/07 09:29AM FTL ./main.go:92 > unable to retrieve information from Docker | error="Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
2024/04/07 09:29AM INF ./main.go:86 > agent running on Docker platform
2024/04/07 09:29AM FTL ./main.go:92 > unable to retrieve information from Docker | error="Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
2024/04/07 09:33AM INF ./main.go:86 > agent running on Docker platform
2024/04/07 09:33AM INF registry/server.go:101 > starting registry credential server
2024/04/07 09:33AM INF http/server.go:99 > starting Agent API server | api_version=2.20.1 server_addr=0.0.0.0 server_port=9001 use_tls=true
2024/04/07 09:37:11 http: TLS handshake error from 192.168.2.54:56224: EOF
2024/04/07 09:37:11 http: TLS handshake error from 192.168.2.54:56190: EOF
2024/04/07 09:37:11 http: TLS handshake error from 192.168.2.54:56218: EOF
2024/04/07 09:38:33 http: TLS handshake error from 192.168.2.54:58672: EOF
2024/04/07 09:39:59 http: TLS handshake error from 192.168.2.54:58282: EOF
2024/04/07 09:39:59 http: TLS handshake error from 192.168.2.54:58264: EOF
2024/04/07 09:40:27 http: TLS handshake error from 192.168.2.54:33830: EOF
2024/04/07 09:40:27 http: TLS handshake error from 192.168.2.54:33840: EOF
2024/04/07 09:42:14 http: TLS handshake error from 192.168.2.54:45652: EOF
2024/04/07 10:02:20 http: TLS handshake error from 192.168.2.54:49548: EOF
2024/04/07 10:02:20 http: TLS handshake error from 192.168.2.54:49554: EOF
```
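The handshake errors end in EOF, i.e. the Portainer side closes the connection mid-handshake. A quick way to check reachability and the TLS layer from the Portainer host, using plain shell tools (nothing assumed beyond the addresses above):

```sh
# From the Portainer host (192.168.2.54): is the agent port reachable at all?
nc -zv 192.168.2.52 9001

# Does a TLS handshake complete? The agent uses a self-signed certificate,
# so verification warnings are expected - what matters is that the
# handshake finishes instead of being cut off.
openssl s_client -connect 192.168.2.52:9001 </dev/null 2>/dev/null | head -n 5
```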
Docker version on both devices:

```
Client: Docker Engine - Community
 Version:           26.0.0
 API version:       1.45
 Go version:        go1.21.8
 Git commit:        2ae903e
 Built:             Wed Mar 20 15:18:25 2024
 OS/Arch:           linux/arm
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          26.0.0
  API version:      1.45 (minimum version 1.24)
  Go version:       go1.21.8
  Git commit:       8b79278
  Built:            Wed Mar 20 15:18:25 2024
  OS/Arch:          linux/arm
  Experimental:     false
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
```
Portainer version
2.20.1
Portainer Edition
Business Edition (BE/EE) with 5NF / 3NF license
Platform and Version
Docker 26.0.0 (see the log section)
OS and Architecture
Main Portainer instance (Debian 11; AMD64) - Portainer Agent (Armbian 24.2.1 (Debian 12); ARM/v7)
Browser
Vivaldi 6.6.3271.57
What command did you use to deploy Portainer?
Portainer Main:
```yaml
services:
  portainer:
    image: portainer/portainer-ee:sts
    container_name: portainer
    restart: unless-stopped
    security_opt:
      - no-new-privileges:true
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /var/run/docker.sock:/var/run/docker.sock
      - /opt/tools/portainer/data:/data
    ports:
      - 9000:9000
      - 9443:9443
```
Portainer Agent:
```sh
docker run -d \
  -p 9001:9001 \
  --name portainer_agent \
  --restart=always \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/docker/volumes:/var/lib/docker/volumes \
  portainer/agent:sts
```
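To confirm the agent actually came up and is listening after a (re)deploy, something like the following on the agent host (`ss` may be `netstat` on older systems):

```sh
# Last log lines from the agent container
docker logs --tail 20 portainer_agent

# Is anything listening on the agent port?
ss -tlnp | grep 9001
```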
Additional Information
No response
Having the same issue and moved back to the previous version. :(
What previous version did you move back to?
Rolling back Agent did not solve it for me unfortunately.
I rolled back the Portainer instance from 2.20.1 to 2.19.4. I don't use remote agents (I connect via the local socket) and had the same issues.
Don't know if there were changes under the hood, but a simple rollback to 2.19.4 on the main instance also did not fix the issue for me. Posting here was my last shot.
We're not seeing this with most users, so it's likely there's something unique about the setups of those of you that are experiencing this. Are you perhaps able to give us a bit more detail around your environments so we can see if there's some common factors at play? Things like operating systems and versions, storage configurations (local storage vs NFS or similar), OS security measures (for example AppArmor / SElinux), etc would be helpful. The more info we can get the more likely we'll be able to figure out a cause.
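If it helps, something like this collects most of those details in one go (a rough sketch; the AppArmor/SELinux tools are only present where those systems are installed):

```sh
# OS and kernel
cat /etc/os-release
uname -a

# Docker version, storage driver, and security options (AppArmor/SELinux/seccomp)
docker info --format 'Server: {{.ServerVersion}} | Storage: {{.Driver}} | Security: {{.SecurityOptions}}'

# Mandatory access control status, where the tools exist
command -v getenforce >/dev/null && getenforce      # SELinux
command -v aa-status >/dev/null && sudo aa-status   # AppArmor
```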
Hello,
I'm also having this issue.
OS: Ubuntu 22.04.4 LTS (arm64). Agent: v2.19.4. Docker: 25.0.5. Portainer: BE 2.19.4.
My stderr logs are as follows:
```
2024/04/11 02:55PM INF github.com/portainer/agent/http/server.go:99 > starting Agent API server | api_version=2.19.4 server_addr=0.0.0.0 server_port=9001 use_tls=true
2024/04/11 15:06:08 http: TLS handshake error from 172.16.16.6:51012: local error: tls: bad record MAC
2024/04/11 15:06:24 http: TLS handshake error from 172.16.16.6:58774: EOF
2024/04/11 15:06:24 http: TLS handshake error from 172.16.16.6:58786: EOF
2024/04/11 15:06:36 http: TLS handshake error from 172.16.16.6:35838: EOF
2024/04/11 15:06:42 http: TLS handshake error from 172.16.16.6:40822: EOF
2024/04/11 15:06:46 http: TLS handshake error from 172.16.16.6:40868: EOF
2024/04/11 15:07:21 http: TLS handshake error from 172.16.16.6:58656: EOF
2024/04/11 15:07:21 http: TLS handshake error from 172.16.16.6:58652: EOF
2024/04/11 15:07:21 http: TLS handshake error from 172.16.16.6:58676: EOF
2024/04/11 15:07:24 http: TLS handshake error from 172.16.16.6:58698: EOF
2024/04/11 15:09:37 http: TLS handshake error from 172.16.16.6:38264: EOF
2024/04/11 15:09:52 http: TLS handshake error from 172.16.16.6:50628: EOF
2024/04/11 15:09:59 http: TLS handshake error from 172.16.16.6:34136: EOF
2024/04/11 15:09:59 http: TLS handshake error from 172.16.16.6:34134: EOF
2024/04/11 15:10:11 http: TLS handshake error from 172.16.16.6:40596: EOF
2024/04/11 15:10:11 http: TLS handshake error from 172.16.16.6:40598: EOF
2024/04/11 15:10:55 http: TLS handshake error from 172.16.16.6:55830: EOF
2024/04/11 15:11:05 http: TLS handshake error from 172.16.16.6:54146: EOF
2024/04/11 15:11:05 http: TLS handshake error from 172.16.16.6:54162: EOF
2024/04/11 15:11:05 http: TLS handshake error from 172.16.16.6:54172: EOF
2024/04/11 15:11:05 http: TLS handshake error from 172.16.16.6:54168: read tcp 172.17.0.2:9001->172.16.16.6:54168: read: connection reset by peer
2024/04/11 15:11:07 http: TLS handshake error from 172.16.16.6:54174: EOF
WARNING: open /host/sys/devices/system/node/node0/cpu0/cache/index1/size: no such file or directory
WARNING: open /host/sys/devices/system/node/node0/cpu0/cache/index1/size: no such file or directory
WARNING: open /host/sys/devices/system/node/node0/cpu0/cache/index2/size: no such file or directory
WARNING: failed to determine memory area for node: open /host/sys/devices/system/node/node0/memory_failure/state: no such file or directory
2024/04/11 15:11:14 http: TLS handshake error from 172.16.16.6:41600: EOF
2024/04/11 15:11:14 http: TLS handshake error from 172.16.16.6:41610: EOF
2024/04/11 15:11:14 http: TLS handshake error from 172.16.16.6:41614: EOF
WARNING: open /host/sys/devices/system/node/node0/cpu0/cache/index1/size: no such file or directory
WARNING: open /host/sys/devices/system/node/node0/cpu0/cache/index1/size: no such file or directory
WARNING: open /host/sys/devices/system/node/node0/cpu0/cache/index2/size: no such file or directory
WARNING: failed to determine memory area for node: open /host/sys/devices/system/node/node0/memory_failure/state: no such file or directory
2024/04/11 15:11:29 http: TLS handshake error from 172.16.16.6:37482: EOF
WARNING: open /host/sys/devices/system/node/node0/cpu0/cache/index1/size: no such file or directory
WARNING: open /host/sys/devices/system/node/node0/cpu0/cache/index1/size: no such file or directory
WARNING: open /host/sys/devices/system/node/node0/cpu0/cache/index2/size: no such file or directory
WARNING: failed to determine memory area for node: open /host/sys/devices/system/node/node0/memory_failure/state: no such file or directory
2024/04/11 15:11:56 http: TLS handshake error from 172.16.16.6:41524: EOF
2024/04/11 15:11:56 http: TLS handshake error from 172.16.16.6:41534: EOF
2024/04/11 15:11:56 http: TLS handshake error from 172.16.16.6:41548: EOF
2024/04/11 15:11:56 http: TLS handshake error from 172.16.16.6:41550: EOF
2024/04/11 15:32:18 http: TLS handshake error from 172.16.16.6:44942: EOF
2024/04/11 15:32:18 http: TLS handshake error from 172.16.16.6:44956: EOF
2024/04/11 15:32:18 http: TLS handshake error from 172.16.16.6:44968: EOF
2024/04/11 15:32:27 http: TLS handshake error from 172.16.16.6:50406: EOF
WARNING: open /host/sys/devices/system/node/node0/cpu0/cache/index1/size: no such file or directory
WARNING: open /host/sys/devices/system/node/node0/cpu0/cache/index1/size: no such file or directory
WARNING: open /host/sys/devices/system/node/node0/cpu0/cache/index2/size: no such file or directory
WARNING: failed to determine memory area for node: open /host/sys/devices/system/node/node0/memory_failure/state: no such file or directory
2024/04/11 15:32:36 http: TLS handshake error from 172.16.16.6:36466: EOF
2024/04/11 15:32:36 http: TLS handshake error from 172.16.16.6:36474: EOF
2024/04/11 15:32:36 http: TLS handshake error from 172.16.16.6:36484: EOF
2024/04/11 15:32:45 http: TLS handshake error from 172.16.16.6:60244: EOF
2024/04/11 15:32:45 http: TLS handshake error from 172.16.16.6:60250: EOF
```
@jamescarppe I'm having the same problem as above.
Upgrading from ":latest" to ":sts" broke Portainer's connection to remote servers.
They appear as "Down". Well, they seem to be "Up" now, but two of the servers are getting an error of "Failure - Unable to retrieve volumes". And not all the containers are appearing. After some time, they're all "Down" again...
One of the servers is CentOS 7 and the other one is AlmaLinux 8. All users (whether working or not) have Docker 24.0.7.
Happens with both 2.20.1 and 2.20.0.
I tried to downgrade back to ":latest", but it looks like downgrading is not possible:
The database schema version does not align with the server version. Please consider reverting to the previous server version or addressing the database migration issue.
(ok - I was able to downgrade by downloading a backup file, downgrading to ":latest", and uploading the backup file)
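In other words, a rough sketch of that workaround (container name, ports, and volume are illustrative; adjust to your deployment):

```sh
# 1. Download a backup first via Settings > Backup in the UI.
# 2. Recreate the container on the older tag with a fresh data volume:
docker stop portainer && docker rm portainer
docker run -d -p 9000:9000 -p 9443:9443 --name portainer --restart=always \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v portainer_data_fresh:/data \
  portainer/portainer-ee:latest
# 3. On the first-run screen, choose the restore-from-backup option and
#    upload the downloaded file.
```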
I can confirm it; I cannot use the portainer-ee latest/LTS version after running portainer-ee:2.20.1.
The database schema version does not align with the server version. Please consider reverting to the previous server version or addressing the database migration issue.
Same issue ...
Same issue here. The Portainer Agent container is up and running, but Portainer is unable to connect to it. "Environment is unreachable"
Portainer Agent container log:
```
2024/05/10 07:39PM INF ./main.go:86 > agent running on Docker platform
2024/05/10 07:39PM WRN ./main.go:112 > unable to retrieve agent container IP address, using host flag instead | error="Error response from daemon: No such container: bb03d4935cd2" host_flag=0.0.0.0
2024/05/10 07:39PM INF registry/server.go:101 > starting registry credential server
2024/05/10 07:39PM INF http/server.go:99 > starting Agent API server | api_version=2.19.5 server_addr=0.0.0.0 server_port=9001 use_tls=true
2024/05/10 07:40PM INF ./main.go:86 > agent running on Docker platform
2024/05/10 07:40PM WRN ./main.go:112 > unable to retrieve agent container IP address, using host flag instead | error="Error response from daemon: No such container: bb03d4935cd2" host_flag=0.0.0.0
2024/05/10 07:40PM INF registry/server.go:101 > starting registry credential server
2024/05/10 07:40PM INF http/server.go:99 > starting Agent API server | api_version=2.19.5 server_addr=0.0.0.0 server_port=9001 use_tls=true
2024/05/10 07:41PM INF ./main.go:86 > agent running on Docker platform
2024/05/10 07:41PM WRN ./main.go:112 > unable to retrieve agent container IP address, using host flag instead | error="Error response from daemon: No such container: bb03d4935cd2" host_flag=0.0.0.0
2024/05/10 07:41PM INF registry/server.go:101 > starting registry credential server
2024/05/10 07:41PM INF http/server.go:99 > starting Agent API server | api_version=2.19.5 server_addr=0.0.0.0 server_port=9001 use_tls=true
```
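That "No such container" warning suggests the agent can no longer resolve its own container ID (which can happen after the engine underneath it was upgraded or the container was recreated with a new ID). Recreating the agent container, rather than just restarting it, may clear that state; a sketch reusing the deployment command from the original post:

```sh
# Remove the stale agent container and create a fresh one.
docker rm -f portainer_agent
docker run -d -p 9001:9001 --name portainer_agent --restart=always \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /var/lib/docker/volumes:/var/lib/docker/volumes \
  portainer/agent:2.19.5
```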
After upgrading all the hosts to AlmaLinux 8 (some were on CentOS 7) and all to Docker 26.1.2, the latest portainer:sts version seems to work perfectly now.
@ghnp5 I can confirm it; it works perfectly with 2.20.3.
Same here, Docker updated to 26.1.3, Portainer EE and agents to 2.20.3, everything back to normal.