[Hyper V] Docker bind mount becomes unresponsive or very slow after a period of inactivity and container cannot be stopped

Open VizuaaLOG opened this issue 1 year ago • 43 comments

Description

A container with a bind mount configured will experience an issue where the mounted path in the container becomes unresponsive or very slow. From my testing, the severity seems to depend on the number of files: with a project of thousands of files I gave up waiting after about 10 - 15 minutes, while a project with one file sometimes recovers after 5 - 10 minutes. The problem then repeats after another period of inactivity.

Inactivity here simply means that nothing triggers a read/write operation within the container's mounted folder. For example, using your computer as normal but not refreshing a web application.

When a container starts to show this issue, Docker Desktop is also unable to stop the container and returns a 500 error, although I cannot find any specific logs. The only solution is to restart the whole Docker service.

Reproduce

  1. Clone a small repository with a reproducible project - https://github.com/VizuaaLOG/docker-windows-bug
  2. docker-compose up -d
  3. Load localhost and notice the phpinfo output
  4. Wait some time, it seems to be 30 - 40 minutes in my testing
  5. Try refreshing - notice the endless loading
  6. Also try ls in the container within the mounted path and the command will hang
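
Step 6 can be turned into a repeatable check. The following is only a minimal sketch, assuming a Linux container in which GNU coreutils `timeout` is available. The function name `probe_mount` and the 5-second default limit are illustrative choices, not part of the original report.

```shell
# probe_mount PATH [SECONDS]
# Lists PATH with a time limit and reports whether the listing finished.
# The body runs in a subshell so its variables do not leak into the caller.
probe_mount() (
    path=$1
    limit=${2:-5}
    if timeout "$limit" ls "$path" > /dev/null 2>&1; then
        echo "responsive: $path"
    else
        # Also reached when PATH does not exist or cannot be read.
        echo "unresponsive: $path"
        exit 1
    fi
)
```

Inside the container this could be called as `probe_mount /var/www/html`, where the path is a hypothetical mount target. One caveat: a process blocked deep inside a filesystem call may not react to the signal that `timeout` sends, so the probe itself can still hang on a badly stalled mount.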

Expected behavior

I would expect the mount to keep working even when no requests are submitted for a while.

docker version

Client:
 Cloud integration: v1.0.35+desktop.5
 Version:           24.0.7
 API version:       1.43
 Go version:        go1.20.10
 Git commit:        afdd53b
 Built:             Thu Oct 26 09:08:44 2023
 OS/Arch:           windows/amd64
 Context:           default

Server: Docker Desktop 4.26.1 (131620)
 Engine:
  Version:          24.0.7
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.10
  Git commit:       311b9ff
  Built:            Thu Oct 26 09:08:02 2023
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.25
  GitCommit:        d8f198a4ed8892c764191ef7b3b06d8a2eeb5c7f
 runc:
  Version:          1.1.10
  GitCommit:        v1.1.10-0-g18a0cb0
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client:
 Version:    24.0.7
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.12.0-desktop.2
    Path:     C:\Program Files\Docker\cli-plugins\docker-buildx.exe
  compose: Docker Compose (Docker Inc.)
    Version:  v2.23.3-desktop.2
    Path:     C:\Program Files\Docker\cli-plugins\docker-compose.exe
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.0
    Path:     C:\Program Files\Docker\cli-plugins\docker-dev.exe
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.21
    Path:     C:\Program Files\Docker\cli-plugins\docker-extension.exe
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  0.1
    Path:     C:\Program Files\Docker\cli-plugins\docker-feedback.exe
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v0.1.0-beta.10
    Path:     C:\Program Files\Docker\cli-plugins\docker-init.exe
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     C:\Program Files\Docker\cli-plugins\docker-sbom.exe
  scan: Docker Scan (Docker Inc.)
    Version:  v0.26.0
    Path:     C:\Program Files\Docker\cli-plugins\docker-scan.exe
  scout: Docker Scout (Docker Inc.)
    Version:  v1.2.0
    Path:     C:\Program Files\Docker\cli-plugins\docker-scout.exe

Server:
 Containers: 4
  Running: 1
  Paused: 0
  Stopped: 3
 Images: 15
 Server Version: 24.0.7
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: d8f198a4ed8892c764191ef7b3b06d8a2eeb5c7f
 runc version: v1.1.10-0-g18a0cb0
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
  cgroupns
 Kernel Version: 6.5.11-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 7.762GiB
 Name: docker-desktop
 ID: 513c014c-b01c-477c-ad0f-3c5107d9a310
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: daemon is not using the default seccomp profile

Diagnostics ID

43915152-3A05-4E58-B08E-9BA7A8DA7B0A/20240130225938

Additional Info

Tested on three different Windows computers.

  1. Windows 11 laptop running docker 4.26.1 using both a real project and a similar setup to the example one provided
  2. Windows 10 laptop with a fresh Windows install and then just Docker 4.26.1 and VSCode installed
  3. Windows 11 desktop running Docker 4.22.x (cannot remember the exact version). This worked after 2 hours of inactivity. After updating Docker to 4.26.1, I experienced this issue

This issue does not seem to affect Docker volumes. With two containers, one using a bind mount and the other using a volume, only the container with the bind mount experienced the issue.

VizuaaLOG avatar Jan 30 '24 23:01 VizuaaLOG

To add some additional info to this: I have tested a few different versions on the Windows 10 laptop and have found the following

  • 4.23.1 - works fine
  • 4.24.2 - works fine
  • 4.25.2 - works fine
  • 4.26.0 - broken
  • 4.26.1 - broken
  • 4.27.0 - broken
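
The results above suggest a cut-off between 4.25.2 and 4.26.0. As a rough sketch, a version string can be compared against that cut-off with `sort -V` (GNU version sort). The function name `is_affected` is hypothetical, and the check assumes every release from 4.26.0 onwards is affected, which this thread does not guarantee for future versions.

```shell
# is_affected VERSION
# Succeeds when VERSION is 4.26.0 or later, the first release reported broken.
is_affected() (
    version=$1
    first_broken="4.26.0"
    # sort -V puts the lower version first; if that is first_broken
    # (or both are equal), VERSION is at or above the cut-off.
    lowest=$(printf '%s\n%s\n' "$first_broken" "$version" | sort -V | head -n 1)
    [ "$lowest" = "$first_broken" ]
)
```

For example, `is_affected 4.25.2` fails while `is_affected 4.26.1` succeeds, matching the list above.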

VizuaaLOG avatar Jan 31 '24 16:01 VizuaaLOG

Having a similar issue that started around the same time, been meaning to report this, but you beat me to it. Can't shut down/restart the containers, just have to restart docker. If I'm working continuously for hours, no problems at all, but if I leave it alone for a while, it goes unresponsive, and I have to restart docker desktop.

x0rsw1tch avatar Feb 05 '24 23:02 x0rsw1tch

Have the same problem on Windows 10/11. Rolled back to 4.25.2.

nchizhov avatar Feb 12 '24 06:02 nchizhov

I am also seeing this issue with a NodeJS Express/Flask Python/MariaDB/CRA React stack, on Docker Desktop v4.27.2 (and confirmed on v4.27.1), Windows 10 with Hyper-V. I have to bring all the containers down (usually running the CLI command more than once) and then bring them back up with --force-recreate.

jennyfofenny avatar Feb 16 '24 03:02 jennyfofenny

Same problems. I think I upgraded from 4.24 to 4.26 and all sorts of problems like this started happening. Somehow, disabling SBOM indexing helped for a while and I didn't experience any hanging for some time, but then it started again.

I guess I will be reverting back the versions, too, as it became a chore to restart docker all the time.

STotev avatar Feb 19 '24 11:02 STotev

Sadly 4.28.0 is still affected by this 😞

VizuaaLOG avatar Mar 01 '24 14:03 VizuaaLOG

Sounds like the same issue I reported in #13849 - I'm still on 4.25.2 because of it.

lexandera avatar Mar 14 '24 00:03 lexandera

I'm seeing the same issue on my Intel-based MacBook Pro with 4.27.0. I reverted to 4.25.2 and it now works fine.

jacobth-te avatar Apr 10 '24 15:04 jacobth-te

Same thing has been happening to me for a good while. Latest version of Docker running on Windows 10, only I'm using HyperV as WSL has been quite slow for me in most cases. Pretty much identical symptoms as OP. Containers running fine for some time and after a while they just become unresponsive - the actual services that are being consumed on the frontend start hanging and when I check specific containers, even though they appear to be running, upon further inspection it becomes clear they have stopped responding correctly. Usually the only reliable way to fix things is by restarting the whole Docker service.

This has been happening for a decent while now, so I thought it was a hardware problem on my end. After finally deciding to look into it today and search online, I found this issue, tried downgrading to 4.25.2, and everything has been working perfectly fine for the last few hours (I had two freezes in a matter of two hours early in the morning before the downgrade).

It does seem like whatever is happening has been introduced post 4.25.2 and has been dragged along with updates up to and including the most recent version (4.29.0 (145265) as of the time of writing this).

mtzonev avatar Apr 30 '24 15:04 mtzonev

I experienced the same issue, just downgraded to 4.25.2 so I can work without random freezes.

Myke79 avatar May 19 '24 08:05 Myke79

Same issue on macOS 13 (Ventura) with an Intel CPU.

OldSneerJaw avatar May 28 '24 16:05 OldSneerJaw

Same on Red Hat Linux 8.9 with a bind-mounted file using these options! The container doesn't update the host file:

  • /usr/share/opensearch/DONT_DELETE_config_mail.ini:/config_mail.ini:consistent,ro
  • /usr/share/opensearch/DONT_DELETE_config_mail.ini:/config_mail.ini:ro

ENV:

  • Red Hat Enterprise Linux release 8.9 (Ootpa)
  • Docker Compose version v2.15.1
  • Docker version 26.1.3, build b72abbb

ggt avatar May 29 '24 11:05 ggt

Just tried out 4.31.0 in hopes that this problem might be fixed per #14060, I can say that this is not fixed, at least on Hyper-V.

Time to revert back to 4.25.2 🫠

EDIT: I have two containers, each with a mount. They were working fine for a couple of hours, then I stepped out for lunch. The computer was awake the whole time. After coming back, having not touched the containers for about 45 minutes, they had both become unresponsive. I refreshed a page in each container and let it run to see if it would come back to life. It took around 3-4 minutes before the containers responded.
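
A recovery time like the 3-4 minutes above could be measured rather than estimated. The sketch below polls a path until a listing succeeds and prints the elapsed seconds. It assumes GNU `timeout` is available; the function name `wait_until_responsive`, the 5-second listing limit, and the default interval and maximum wait are all illustrative.

```shell
# wait_until_responsive PATH [INTERVAL_SECONDS] [MAX_WAIT_SECONDS]
# Polls PATH until `ls` succeeds, then prints how long that took.
wait_until_responsive() (
    path=$1
    interval=${2:-10}
    max_wait=${3:-1800}
    start=$(date +%s)
    while ! timeout 5 ls "$path" > /dev/null 2>&1; do
        elapsed=$(( $(date +%s) - start ))
        if [ "$elapsed" -ge "$max_wait" ]; then
            echo "gave up after ${elapsed}s"
            exit 1
        fi
        sleep "$interval"
    done
    echo "responsive after $(( $(date +%s) - start ))s"
)
```

Run from inside the container against the mounted path right after an idle period, this gives a number that can be compared across Docker Desktop versions.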

x0rsw1tch avatar Jun 07 '24 22:06 x0rsw1tch

Same here 😢 6 months and no staff investigation

BoGnY avatar Jun 10 '24 14:06 BoGnY

This task needs more attention from the developers: it's easily reproducible, but a sneaky one, I agree... :( For months I blamed my PC for these Docker freezes, until I found this issue. Too bad we have to stay on the last working version without freezes: 4.25.2. Also using Windows 10 + the Hyper-V engine.

Myke79 avatar Jun 10 '24 18:06 Myke79

Today, I captured these two error messages I got when trying to restart the containers after leaving them idle (and finding them unresponsive). These errors are different from what I recall seeing on earlier affected versions. Again, there were no issues leaving these same containers running, even for days at a time, on 4.25.2.

Cannot stop Docker Compose application. Reason: compose [stop] exit status 1. Container *********** Stopping Container *********** Stopping Container *********** Error while Stopping Container *********** Error while Stopping Error response from daemon: cannot stop container: ***********: container *********** PID 889 is zombie and can not be killed. Use the --init option when creating containers to run an init inside the container that forwards signals and reaps processes
Cannot stop Docker Compose application. Reason: compose [stop] exit status 1. Container *********** Stopping Container *********** Stopping Container *********** Stopped Container *********** Error while Stopping request returned Bad Gateway for API route and version http://%2F%2F.%2Fpipe%2FdockerDesktopLinuxEngine/v1.45/containers/***********/stop, check if the server supports the requested API version

On the first event, I had to restart Docker Desktop; the second time, I just shut down the containers and brought them back up. Neither of these uses tinyinit, but that wasn't a problem prior to upgrading.

I think I'm just gonna bite the bullet and downgrade to 4.25.2. It's only been two days, and I'm already done dealing with this stupid problem multiple times a day 🫠

x0rsw1tch avatar Jun 10 '24 22:06 x0rsw1tch

Same issue here on more than 20 PCs with different OSes (Win 10/11) but the same problem (when idle, only restarting Docker or WSL brings it back to life).

It only works OK in version 4.25.2 and earlier. The new 4.31.1 is broken too.

AngelCifuentes avatar Jun 11 '24 06:06 AngelCifuentes

For developers, I've created two diagnostics IDs: the first, 1794EA5F-3789-49B8-9261-182EA193F758/20240611162749, was created after a freeze, and the second, 1794EA5F-3789-49B8-9261-182EA193F758/20240611163814, was created after a freeze and after trying to restart the containers (it may contain some restart error logs).

BoGnY avatar Jun 11 '24 16:06 BoGnY

I waited too long to start the Docker image and I couldn't get it up:

(HTTP code 502) unexpected -

Then I restarted Docker and got hit with this:

(HTTP code 500) server error - driver failed programming external connectivity on endpoint php81-apache (6186098aca2925cd83e8dc70b2045e092fe3cda96e3be70cf8ea059a8554e14c): Bind for 0.0.0.0:443 failed: port is already allocated

Also:

(HTTP code 500) server error - Cannot restart container b7ec8623ccb9f58594952064808ec9e458f41abc2997c46f24def031d036a03c: tried to kill container, but did not receive an exit event

Hanmac avatar Jun 13 '24 07:06 Hanmac

Is anyone aware of any workarounds or mitigations for this issue apart from the following?

  1. Downgrade Docker to 4.25.2 (Our company security teams do not allow this)
  2. Switch to WSL2 (Our company security teams do not allow this)
  3. Remove bind-mounts (All our existing development workflows and projects currently depend on bind-mounts)

These three options are the workarounds I've been able to surmise, but as none of them are available to us (and I suspect many other development teams) I wanted to make sure I'm not missing anything else.

ryan-jacobs avatar Jun 25 '24 02:06 ryan-jacobs

I've already downgraded, and haven't tested this. Perhaps one workaround would be to write a script to write/touch to a dummy file in the container's bind mount, every couple of minutes, to prevent it from "idling out". Before downgrading, I could code for hours on end without issues... That is, until I had to go AFK, then upon returning to discover the containers have gone zombie, and I have to restart docker, or wait a really long time for it to come back to life.

x0rsw1tch avatar Jun 25 '24 17:06 x0rsw1tch

@bsousaa can we have updates???

This bug is a year and a half old on WSL2 (#13160) and over 6 months old on Hyper-V, with no staff investigation in either case.

BoGnY avatar Jun 26 '24 08:06 BoGnY

@x0rsw1tch commented on Jun 25, 2024, 7:06 PM GMT+2:

Perhaps one workaround would be to write a script to write/touch to a dummy file in the container's bind mount, every couple of minutes, to prevent it from "idling out".

Already tested: I created a shell script with an infinite touch loop... It doesn't work 😭 The containers have died 5 times today alone, in 8 working hours...

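#!/usr/bin/env bash

echo "[INFO] Starting infinite touch loop."
# Touch the file in the background, pausing between iterations
# so the loop does not spin at full speed.
while true; do
  sudo touch /var/tmp/infinite.loop
  sleep 60
done &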

BoGnY avatar Jun 26 '24 16:06 BoGnY

@BoGnY you could store the uptime inside your file to see how long it survives.

Or you could try to use the loop to write your file into one of the windows mount points?

Hanmac avatar Jun 27 '24 09:06 Hanmac

What if the whole container goes to sleep and this touch loop sleeps as well, and that's why this workaround isn't working? Another possible workaround would be a curl/wget call from the host OS that requests some URL every minute or so (one you would normally load from the browser while working on the project) to keep the whole thing alive.
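
The host-side idea could look roughly like the sketch below. It assumes a bash-capable shell on the host (for example Git Bash) with `curl` installed. The function names and the 60-second default interval are illustrative, and nobody in this thread has confirmed that this keeps the mount alive.

```shell
# keepalive_once URL
# Requests URL once; fails on timeouts, refused connections and HTTP errors.
keepalive_once() (
    url=$1
    # --max-time bounds the request so a hung container cannot block the loop.
    curl --fail --silent --show-error --max-time 10 --output /dev/null "$url"
)

# keepalive_loop URL [INTERVAL_SECONDS]
# Requests URL forever, logging every request that gets no usable response.
keepalive_loop() (
    url=$1
    interval=${2:-60}
    while true; do
        if ! keepalive_once "$url" 2> /dev/null; then
            echo "$(date '+%H:%M:%S') no response from $url"
        fi
        sleep "$interval"
    done
)
```

With the reproduction project from the issue description this might be started as `keepalive_loop http://localhost/`. The requested page should read files from the bind mount, otherwise the request would not generate any activity on it.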

Myke79 avatar Jun 27 '24 09:06 Myke79

@Hanmac commented on Jun 27, 2024, 11:09 AM GMT+2:

Or you could try to use the loop to write your file into one of the windows mount points?

@Myke79 commented on Jun 27, 2024, 11:20 AM GMT+2:

What if the whole container goes to sleep, and this touch loop sleeps as well, and that's why this workaround isn't working.
Another possible workaround would be a curl/wget call from the host OS that requests some URL every minute or so (one you would normally load from the browser while working on the project) to keep the whole thing alive.

Yes, I used the wrong path. I'm now trying to touch a file in a mount point; if this doesn't work I will try a curl call from the host.

BoGnY avatar Jun 27 '24 09:06 BoGnY

Thanks @x0rsw1tch for the hint regarding a scripted read/touch operation. While certainly not a sustainable option, I did have a little luck extending operational time with this. What I noticed:

  1. Every mount point needs to be read/touched individually. We have projects with numerous bind-mounts and it looks like they each die independently, so artificially forcing activity on just one isn't enough.
  2. A simple read operation from the container side seems to be enough. I tried with just a simple "ls" operation on the mount root and that seemed to have the same impact as a touch or write operation.
  3. As @Myke79 noted, this hack will not survive a sleep operation on the host, which further limits mileage.

Specifically, I tried calling the following via a 1min cron job so as not to add any fragility maintaining a shell session:

#!/bin/bash

# Read every Docker bind mount once so that none of them sits idle.
# Each line of /proc/mounts is "device mount_point fstype options ...".
while read -r device mount_point _; do
    # Docker bind mounts are listed with a device name starting with "grpcfuse".
    [[ $device == grpcfuse* ]] || continue
    # A plain read of the mount root is enough to keep it alive.
    ls -al "$mount_point" > /dev/null 2>&1
done < /proc/mounts

I noticed each Docker bind mount listed in /proc/mounts started with "grpcfuse", which allowed for a generic loop like this, though that may be specific to our environments.
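
Since the mounts appear to die independently, the same `/proc/mounts` lookup could also report which ones have stalled. This is a sketch only: the function names are hypothetical, the "grpcfuse" prefix is the environment-specific observation described above, and GNU `timeout` is assumed to be available.

```shell
# list_mounts PREFIX [MOUNTS_FILE]
# Prints the mount point of every entry whose device name starts with PREFIX.
list_mounts() (
    prefix=$1
    mounts_file=${2:-/proc/mounts}
    while read -r device mount_point _rest; do
        case $device in
            "$prefix"*) printf '%s\n' "$mount_point" ;;
        esac
    done < "$mounts_file"
)

# check_mounts PREFIX [MOUNTS_FILE] [SECONDS]
# Lists each matching mount with a time limit and prints "ok" or "stalled".
check_mounts() (
    prefix=$1
    mounts_file=${2:-/proc/mounts}
    limit=${3:-5}
    list_mounts "$prefix" "$mounts_file" | while IFS= read -r mount_point; do
        if timeout "$limit" ls "$mount_point" > /dev/null 2>&1; then
            echo "ok $mount_point"
        else
            echo "stalled $mount_point"
        fi
    done
)
```

Running `check_mounts grpcfuse` inside the VM or container would then print one line per bind mount, which makes it easier to see whether a single mount or all of them stopped responding.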

This definitely helped, but in addition to being a hack, it has quite notable limitations (e.g. when the host goes to sleep). I do still think this could be useful while we work with our security teams on more sustainable solutions, specifically switching to WSL2.

ryan-jacobs avatar Jul 01 '24 16:07 ryan-jacobs

FYI to downgrade get 4.25.1 and run: & '.\Docker Desktop Installer.exe' install --disable-version-check

https://stackoverflow.com/a/77224786/516748

If you don't restart/stop Docker for Windows first, the installer will ask you to kill processes (containers) locked up by this very issue.

kcd83 avatar Jul 02 '24 00:07 kcd83

Same problem on 4.32.0

flowerwOw0316 avatar Jul 10 '24 12:07 flowerwOw0316

Good morning,

Any update on this issue? It's very annoying to need multiple restarts a day or to stay on an old version (4.25.2).

Example:

  • Coding (all OK)
  • Go to a meeting or take a 10-20 minute break (PC not used at all, Docker idle)
  • Return to the PC: Docker does not respond and must be restarted
  • Restart and all is OK, coding again

Repeat this 5 - 10 times, every day, for all my office colleagues... T.T If you need stats, more info, images, mounts... anything, tell me and I'll provide it to help.

Still waiting, and thanks in advance, Ángel.

AngelCifuentes avatar Jul 26 '24 08:07 AngelCifuentes