[Hyper V] Docker bind mount becomes unresponsive or very slow after a period of inactivity and container cannot be stopped
Description
A container with a bind mount configured will reach a state where the mounted path in the container becomes unresponsive or very slow. From my testing, the severity seems to depend on the number of files: with a project of thousands of files I gave up waiting after about 10 - 15 minutes, while a project with one file sometimes recovers after 5 - 10 minutes. Either way, the issue recurs after the next period of inactivity.
Inactivity here simply means that nothing triggers a read/write operation within the container's mounted folder. For example, using your computer as normal but not refreshing a web application.
When a container starts to show this issue, Docker Desktop is also unable to stop it and returns a 500 error, although I cannot find any specific logs. The only solution is to restart the whole Docker service.
Reproduce
- Clone a small repository with a reproducible project - https://github.com/VizuaaLOG/docker-windows-bug
- Run docker-compose up -d
- Load localhost and notice the phpinfo output
- Wait some time, it seems to be 30 - 40 minutes in my testing
- Try refreshing - notice the endless loading
- Also try ls in the container within the mounted path and the command will hang
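The hang check in the last step can be scripted so that it does not block the shell it runs in. The following is a minimal sketch, not part of the original report: it assumes GNU coreutils timeout is available where it runs, and the function name probe_mount and the 5-second default are illustrative choices. If ls is stuck in uninterruptible I/O, even timeout may not return.

```shell
#!/usr/bin/env bash
# probe_mount PATH [SECONDS]  (hypothetical helper, not from the report)
# Lists PATH with a time limit so a hung bind mount cannot block the caller.
# Prints "ok", "hung" (time limit hit) or "error" (ls failed for another reason).
probe_mount() {
  local path=$1 limit=${2:-5}
  timeout "$limit" ls "$path" > /dev/null 2>&1
  case $? in
    0)   echo "ok" ;;
    124) echo "hung" ;;    # exit status GNU timeout uses when the limit expires
    *)   echo "error" ;;
  esac
}
```

Run inside the container against the mounted path, a result of "hung" would correspond to the symptom described above.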
Expected behavior
I would expect the mount to keep working normally, even after a period in which no requests were submitted.
docker version
Client:
Cloud integration: v1.0.35+desktop.5
Version: 24.0.7
API version: 1.43
Go version: go1.20.10
Git commit: afdd53b
Built: Thu Oct 26 09:08:44 2023
OS/Arch: windows/amd64
Context: default
Server: Docker Desktop 4.26.1 (131620)
Engine:
Version: 24.0.7
API version: 1.43 (minimum version 1.12)
Go version: go1.20.10
Git commit: 311b9ff
Built: Thu Oct 26 09:08:02 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.25
GitCommit: d8f198a4ed8892c764191ef7b3b06d8a2eeb5c7f
runc:
Version: 1.1.10
GitCommit: v1.1.10-0-g18a0cb0
docker-init:
Version: 0.19.0
GitCommit: de40ad0
docker info
Client:
Version: 24.0.7
Context: default
Debug Mode: false
Plugins:
buildx: Docker Buildx (Docker Inc.)
Version: v0.12.0-desktop.2
Path: C:\Program Files\Docker\cli-plugins\docker-buildx.exe
compose: Docker Compose (Docker Inc.)
Version: v2.23.3-desktop.2
Path: C:\Program Files\Docker\cli-plugins\docker-compose.exe
dev: Docker Dev Environments (Docker Inc.)
Version: v0.1.0
Path: C:\Program Files\Docker\cli-plugins\docker-dev.exe
extension: Manages Docker extensions (Docker Inc.)
Version: v0.2.21
Path: C:\Program Files\Docker\cli-plugins\docker-extension.exe
feedback: Provide feedback, right in your terminal! (Docker Inc.)
Version: 0.1
Path: C:\Program Files\Docker\cli-plugins\docker-feedback.exe
init: Creates Docker-related starter files for your project (Docker Inc.)
Version: v0.1.0-beta.10
Path: C:\Program Files\Docker\cli-plugins\docker-init.exe
sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
Version: 0.6.0
Path: C:\Program Files\Docker\cli-plugins\docker-sbom.exe
scan: Docker Scan (Docker Inc.)
Version: v0.26.0
Path: C:\Program Files\Docker\cli-plugins\docker-scan.exe
scout: Docker Scout (Docker Inc.)
Version: v1.2.0
Path: C:\Program Files\Docker\cli-plugins\docker-scout.exe
Server:
Containers: 4
Running: 1
Paused: 0
Stopped: 3
Images: 15
Server Version: 24.0.7
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d8f198a4ed8892c764191ef7b3b06d8a2eeb5c7f
runc version: v1.1.10-0-g18a0cb0
init version: de40ad0
Security Options:
seccomp
Profile: unconfined
cgroupns
Kernel Version: 6.5.11-linuxkit
Operating System: Docker Desktop
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 7.762GiB
Name: docker-desktop
ID: 513c014c-b01c-477c-ad0f-3c5107d9a310
Docker Root Dir: /var/lib/docker
Debug Mode: false
HTTP Proxy: http.docker.internal:3128
HTTPS Proxy: http.docker.internal:3128
No Proxy: hubproxy.docker.internal
Experimental: false
Insecure Registries:
hubproxy.docker.internal:5555
127.0.0.0/8
Live Restore Enabled: false
WARNING: daemon is not using the default seccomp profile
Diagnostics ID
43915152-3A05-4E58-B08E-9BA7A8DA7B0A/20240130225938
Additional Info
Tested on three different Windows computers.
- Windows 11 laptop running Docker 4.26.1 using both a real project and a similar setup to the example one provided
- Windows 10 laptop with a fresh Windows install and then just Docker 4.26.1 and VSCode installed
- Windows 11 desktop running Docker 4.22.x (cannot remember the exact version): this worked after 2 hours of inactivity. After updating Docker to 4.26.1, I experienced this issue
This issue does not seem to affect Docker volumes. With two containers, one using a bind mount and the other using a volume, only the container with the bind mount experienced the issue.
To add some additional info: I have tested a few different versions on the Windows 10 laptop and found the following
- 4.23.1 - works fine
- 4.24.2 - works fine
- 4.25.2 - works fine
- 4.26.0 - broken
- 4.26.1 - broken
- 4.27.0 - broken
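The list above suggests the regression landed in 4.26.0. A small sketch for checking an installed Docker Desktop version against that boundary follows. It reflects only the reports in this thread: no fixed version is named here, so there is no upper bound. The function names are illustrative, and sort -V (version sort) is assumed to be available.

```shell
#!/usr/bin/env bash
# version_ge A B: succeeds when version A >= version B (relies on sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# reported_broken VERSION: succeeds when VERSION is 4.26.0 or later,
# the first version reported as broken in this thread (an assumption
# drawn from the test list above, not an official statement).
reported_broken() {
  version_ge "$1" "4.26.0"
}
```

For example, reported_broken 4.25.2 fails while reported_broken 4.27.0 succeeds.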
Having a similar issue that started around the same time, been meaning to report this, but you beat me to it. Can't shut down/restart the containers, just have to restart docker. If I'm working continuously for hours, no problems at all, but if I leave it alone for a while, it goes unresponsive, and I have to restart docker desktop.
Have same problem on Windows 10/11. Rollback to 4.25.2
I am also seeing this issue with a NodeJS Express/Flask Python/MariaDB/CRA React stack. Docker Desktop v4.27.2 (and confirmed in v4.27.1), Windows 10 with Hyper-V. I have to bring all the containers down (usually running the CLI command more than once) and then bring them back up with --force-recreate.
Same problems. I think I upgraded from 4.24 to 4.26 and all sorts of problems like this started happening. Somehow, disabling SBOM Indexing helped for a while and I didn't experience any hanging for some time, but then it started again.
I guess I will be reverting back the versions, too, as it became a chore to restart docker all the time.
Sadly 4.28.0 is still affected by this 😞
Sounds like the same issue I reported in #13849 - I'm still on 4.25.2 because of it.
I'm seeing the same issue on my Intel-based MacBook Pro with 4.27.0. I reverted to 4.25.2 and it now works fine.
Same thing has been happening to me for a good while. Latest version of Docker running on Windows 10, only I'm using HyperV as WSL has been quite slow for me in most cases. Pretty much identical symptoms as OP. Containers running fine for some time and after a while they just become unresponsive - the actual services that are being consumed on the frontend start hanging and when I check specific containers, even though they appear to be running, upon further inspection it becomes clear they have stopped responding correctly. Usually the only reliable way to fix things is by restarting the whole Docker service.
This has been happening for a decent while now, so I thought it was a hardware problem on my end. After finally deciding to look into it today and search online, I found this issue, tried downgrading to 4.25.2, and everything has been working perfectly fine for the last few hours (I had two freezes in a matter of two hours early in the morning before the downgrade).
It does seem like whatever is happening has been introduced post 4.25.2 and has been dragged along with updates up to and including the most recent version (4.29.0 (145265) as of the time of writing this).
I experienced the same issue, just downgraded to 4.25.2 so I can work without random freezes.
Same issue on macOS 13 (Ventura) with an Intel CPU.
Same on Linux (Red Hat 8.9) with a bind-mounted file using these options! The container doesn't update the host file:
- /usr/share/opensearch/DONT_DELETE_config_mail.ini:/config_mail.ini:consistent,ro
- /usr/share/opensearch/DONT_DELETE_config_mail.ini:/config_mail.ini:ro
ENV:
- Red Hat Enterprise Linux release 8.9 (Ootpa)
- Docker Compose version v2.15.1
- Docker version 26.1.3, build b72abbb
Just tried out 4.31.0 in hopes that this problem might be fixed per #14060, I can say that this is not fixed, at least on Hyper-V.
Time to revert back to 4.25.2 🫠
EDIT: I have two containers, each with a mount. They were working fine for a couple of hours, then I stepped out for lunch. The computer was awake the whole time. After coming back, having not touched the containers for about 45 minutes, they had both become unresponsive. I refreshed pages in each container and let them go to see if they would come back to life. It took around 3-4 minutes before the containers responded.
Same here 😢 6 months and no investigation from staff
This task needs more attention from the developers: it's easily reproducible, but a sneaky one, I agree... :( For months I blamed my PC for these Docker freezes, until I found this issue. Too bad we have to stick to the last working version without freezes: 4.25.2. Using Windows 10 + Hyper-V engine too
Today, I captured these two error messages I got when trying to restart the containers after leaving them idle (and being unresponsive). These errors are different from what I recall seeing on earlier affected versions. Again, no issues leaving these same containers running, even days at a time, on 4.25.2.
Cannot stop Docker Compose application. Reason: compose [stop] exit status 1. Container *********** Stopping Container *********** Stopping Container *********** Error while Stopping Container *********** Error while Stopping Error response from daemon: cannot stop container: ***********: container *********** PID 889 is zombie and can not be killed. Use the --init option when creating containers to run an init inside the container that forwards signals and reaps processes
Cannot stop Docker Compose application. Reason: compose [stop] exit status 1. Container *********** Stopping Container *********** Stopping Container *********** Stopped Container *********** Error while Stopping request returned Bad Gateway for API route and version http://%2F%2F.%2Fpipe%2FdockerDesktopLinuxEngine/v1.45/containers/***********/stop, check if the server supports the requested API version
On the first event, I had to restart Docker Desktop; the second time, I just shut down the containers and brought them back up. Neither of these uses tinyinit, but that hasn't been a problem prior to upgrading.
I think I'm just gonna bite the bullet and downgrade to 4.25.2. It's only been two days, and I'm already done dealing with this stupid problem multiple times a day 🫠
Same issue here on more than 20 PCs with different OSes (Win 10/11), all with the same problem (when idle, only restarting Docker or WSL brings it back to life).
It only works OK in version 4.25.2 and earlier. The new 4.31.1 is broken too.
For developers, I've created two diagnostics IDs. The first, 1794EA5F-3789-49B8-9261-182EA193F758/20240611162749, was created after the freeze. The second, 1794EA5F-3789-49B8-9261-182EA193F758/20240611163814, was created after the freeze and after trying to restart the containers (it may contain some restart error logs).
I waited too long to start the Docker image and I couldn't get it up:
(HTTP code 502) unexpected -
Then I restarted Docker, and I got hit with this:
(HTTP code 500) server error - driver failed programming external connectivity on endpoint php81-apache (6186098aca2925cd83e8dc70b2045e092fe3cda96e3be70cf8ea059a8554e14c): Bind for 0.0.0.0:443 failed: port is already allocated
Also:
(HTTP code 500) server error - Cannot restart container b7ec8623ccb9f58594952064808ec9e458f41abc2997c46f24def031d036a03c: tried to kill container, but did not receive an exit event
Is anyone aware of any workarounds or mitigations for this issue apart from the following?
- Downgrade Docker to 4.25.2 (Our company security teams do not allow this)
- Switch to WSL2 (Our company security teams do not allow this)
- Remove bind-mounts (All our existing development workflows and projects currently depend on bind-mounts)
These three options are the workarounds I've been able to surmise, but as none of them are available to us (and I suspect many other development teams) I wanted to make sure I'm not missing anything else.
I've already downgraded, and haven't tested this. Perhaps one workaround would be to write a script to write/touch to a dummy file in the container's bind mount, every couple of minutes, to prevent it from "idling out". Before downgrading, I could code for hours on end without issues... That is, until I had to go AFK, only to discover upon returning that the containers had gone zombie, and I had to restart Docker or wait a really long time for it to come back to life.
@bsousaa can we have updates???
This bug is a year and a half old on WSL2 (#13160) and over 6 months old on Hyper-V, with no staff investigation in either case
@x0rsw1tch commented on Jun 25, 2024, 7:06 PM GMT+2:
Perhaps one workaround would be to write a script to write/touch to a dummy file in the container's bind mount, every couple of minutes, to prevent it from "idling out".
Already tested: I created a shell script with an infinite touch loop... It doesn't work 😭 The containers have died 5 times today alone, in 8 working hours...
#!/usr/bin/env bash
echo "[INFO] Starting infinite touch loop."
# Runs in the background; there is no sleep, so this touches the file continuously.
while true; do
  sudo touch /var/tmp/infinite.loop
done &
@BoGnY you could store the uptime inside your file, so see how long it would survive
Or you could try to use the loop to write your file into one of the windows mount points?
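The suggestion to store the uptime in the file could look roughly like the sketch below. It is an illustration only: the function name write_heartbeat is made up here, and it assumes a Linux container where /proc/uptime is readable.

```shell
#!/usr/bin/env bash
# write_heartbeat FILE  (hypothetical helper)
# Appends one line with the UTC wall-clock time and the system uptime in seconds.
write_heartbeat() {
  local file=$1 up
  # /proc/uptime holds "<uptime seconds> <idle seconds>"; keep the first field.
  read -r up _ < /proc/uptime
  printf '%s uptime=%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$up" >> "$file"
}
```

Called once a minute with FILE placed inside the bind mount, the last line of the file would show roughly when writes stopped succeeding.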
What if the whole container goes to sleep, and this touch loop sleeps as well, and that's why this workaround isn't working? Another possible workaround would be a curl/wget call from the host OS that requests some URL every minute or so, one you would normally load in the browser while working on the project, to keep the whole thing alive.
@Hanmac commented on Jun 27, 2024, 11:09 AM GMT+2:
Or you could try to use the loop to write your file into one of the windows mount points?
@Myke79 commented on Jun 27, 2024, 11:20 AM GMT+2:
What if the whole container goes to sleep, and this touch loop sleeps as well, and that's why this workaround isn't working?
Another possible workaround would be a curl/wget call from the host OS that requests some URL every minute or so, one you would normally load in the browser while working on the project, to keep the whole thing alive.
Yes, I used the wrong path... I'm now trying to touch a file in a mount point; if this doesn't work, I will try a curl call from the host.
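A curl call from the host, as proposed above, might be sketched like this. Nothing in the thread confirms that it prevents the freeze. The function name keepalive, the 60-second default, and the overridable FETCH variable are assumptions made for illustration; the curl options used (-f, -s, -S, --max-time, -o) are standard.

```shell
#!/usr/bin/env bash
# Command used to fetch the URL; override FETCH to swap in wget or a stub.
FETCH=${FETCH:-curl -fsS --max-time 10 -o /dev/null}

# keepalive URL [INTERVAL_SECONDS] [COUNT]  (hypothetical helper)
# Requests URL every INTERVAL_SECONDS and logs the result.
# COUNT=0 (the default) means run until interrupted.
keepalive() {
  local url=$1 interval=${2:-60} count=${3:-0} i=0
  while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
    # $FETCH is intentionally unquoted so its options split into words.
    if $FETCH "$url"; then
      echo "$(date +%T) ok $url"
    else
      echo "$(date +%T) FAILED $url"
    fi
    i=$((i + 1))
    sleep "$interval"
  done
}
```

For example, keepalive http://localhost/ 60 would request the page once a minute; the first FAILED line gives an approximate time for when the container stopped answering.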
Thanks @x0rsw1tch for the hint regarding a scripted read/touch operation. While certainly not a sustainable option, I did have a little luck extending operational time with this. What I noticed:
- Every mount point needs to be read/touched individually. We have projects with numerous bind-mounts and it looks like they each die independently, so artificially forcing activity on just one isn't enough.
- A simple read operation from the container side seems to be enough. I tried with just a simple "ls" operation on the mount root and that seemed to have the same impact as a touch or write operation.
- As @Myke79 noted, this hack will not survive a sleep operation on the host, which further limits mileage.
Specifically, I tried calling the following via a 1min cron job so as not to add any fragility maintaining a shell session:
#!/bin/bash
# Get the list of bind mounts that are specific to Docker
bind_mounts=$(grep '^grpcfuse' /proc/mounts)
# Loop over each bind mount
while IFS= read -r line; do
  # Extract the mount point (second field)
  mount_point=$(echo "$line" | awk '{print $2}')
  # Skip the empty line produced when no grpcfuse mounts exist
  [ -n "$mount_point" ] || continue
  # Trigger a read operation on the mount point to keep it alive.
  ls -al "$mount_point" > /dev/null 2>&1
done <<< "$bind_mounts"
I noticed each Docker bind mount listed in /proc/mounts started with "grpcfuse", which allowed for a generic loop like this, though that may be specific to our environments.
This definitely helped, but in addition to being a hack, it does have quite notable limitations (e.g. when the host goes to sleep). I do still think this could be useful while we work with our security teams on more sustainable solutions, specifically switching to WSL2.
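One risk with a per-minute job like this is that, once a mount has already hung, every new ls may block and the jobs pile up. The variant below is a sketch under assumptions, not the script the commenter ran: it wraps each read in GNU timeout and reports the result. The function name poke_mounts, its parameters, and the 10-second default are illustrative.

```shell
#!/usr/bin/env bash
# poke_mounts [MOUNTS_FILE] [DEVICE_PREFIX] [SECONDS]  (hypothetical helper)
# Reads every mount whose first field starts with DEVICE_PREFIX, with a
# time limit per mount, and prints one status line per mount point.
poke_mounts() {
  local mounts_file=${1:-/proc/mounts} prefix=${2:-grpcfuse} limit=${3:-10}
  local device mount_point rest
  while read -r device mount_point rest; do
    # Keep only lines whose first field starts with the prefix.
    case $device in
      "$prefix"*) ;;
      *) continue ;;
    esac
    if timeout "$limit" ls -al "$mount_point" > /dev/null 2>&1; then
      echo "alive $mount_point"
    else
      # Either the time limit expired or ls failed for another reason.
      echo "stuck-or-missing $mount_point"
    fi
  done < "$mounts_file"
}
```

Calling poke_mounts with no arguments scans /proc/mounts for grpcfuse entries, which matches the environment described above but may differ elsewhere.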
FYI to downgrade get 4.25.1 and run: & '.\Docker Desktop Installer.exe' install --disable-version-check
https://stackoverflow.com/a/77224786/516748
If you don't restart/stop Docker for Windows first the installer will ask you to kill processes (containers) locked up by this very issue
same problem 4.32.0
Good morning,
Any update on this issue? It's very annoying to need multiple "restarts" per day or to stay on an old version (4.25.2)
Example:
- Coding (all OK)
- Go to a meeting or take any break of 10-20 mins (PC not used at all, Docker idle)
- Return to the PC: Docker does not respond and must be restarted
- Restart and all is OK, coding again
Repeat this 5 - 10 times, every day, for all my office colleagues... T.T If you need stats, more info, images, mounts... anything, tell me and I'll provide it to help
Still waiting and thx in advance, Ángel.