
Docker not releasing files not in use

Open Smith8154 opened this issue 2 years ago • 19 comments

  • [x] I have tried with the latest version of Docker Desktop
  • [x] I have tried disabling enabled experimental features
  • [x] I have uploaded Diagnostics
  • Diagnostics ID: 771B4D3A-1C15-4188-9364-D05FADEAE701/20230328212305

Expected behavior

Docker should release the file after it is no longer needed.

Actual behavior

Docker appears to be holding on to files even after the container accessing the file is stopped. The only way to release the file is to restart the Docker engine.

I am passing a network share through to my Plex docker container, but after a short time of the container running, I begin to see these issues on my macOS host: fts_read: Too many open files. I have increased the open file limit by following this guide. When I check the file limits using launchctl limit maxfiles, this is the output: maxfiles 524288 524288. Using Activity Monitor to check what files the Virtual Machine Service has open, this is what I see when no containers have been started:

/
/System/Library/Frameworks/Virtualization.framework/Versions/A/XPCServices/com.apple.Virtualization.VirtualMachine.xpc/Contents/MacOS/com.apple.Virtualization.VirtualMachine
/Library/Preferences/Logging/.plist-cache.fB3OjRRy
/usr/share/icu/icudt70l.dat
/private/var/db/timezone/tz/2022g.1.0/icutz/icutz44l.dat
/dev/null
/dev/null
/dev/null
/Applications/Docker.app/Contents/Resources/linuxkit/kernel
/Applications/Docker.app/Contents/Resources/linuxkit/initrd.img
->0x4756f2816747024e
->0x9366f745c4a5cead
/Users/wsmith/Library/Containers/com.docker.docker/Data/vms/0/data/Docker.raw
/Users
/Volumes
/private
/private/tmp
/private/var/folders
->0xdf417ba25c1bf191
->0xdf417ba25c1c1a31
->0xdf417ba25c1b82c1
->(none)
/Users
->(none)
/Volumes
->(none)
/private
->(none)
/private/tmp
->(none)
/private/var/folders
->0xdf417ba25c1b85e1

After running my Plex container for a few minutes and then stopping it, I see that the Virtual Machine Service has 1,010 files open, all of them Plex configuration files and media files on the network volume, despite no containers running. Below is a snippet of lines 855-906 of the open files. Again, no containers are running. The only way to release the lock on these files is to restart the Docker service.

/Volumes/plex-data/Movies/Guardians of the Galaxy/Guardians of the Galaxy.mp4
/Volumes/plex-data/Movies/Ready Player One/Ready Player One.mkv
/Volumes/plex-data/Movies/Iron Man 3/Iron Man 3.mp4
/Volumes/plex-data/Movies/The Dark Knight Rises/The Dark Knight Rises.mp4
/Volumes/plex-data/Movies/Futurama Benders Game/Futurama Benders Game.mkv
/Volumes/plex-data/Movies/Now You See Me 2/Now You See Me 2.mkv
/Volumes/plex-data/Movies/Futurama Into The Wild Green Yonder/Futurama Into The Wild Green Yonder.mkv
/Volumes/plex-data/Movies/Star Wars_ Revenge of the Sith/Star Wars_ Revenge of the Sith.mp4
/Volumes/plex-data/Movies/Toy Story 3/Toy Story 3.mkv
/Volumes/plex-data/Movies/X-Men_ Days of Future Past/X-Men_ Days of Future Past.mp4
/Volumes/plex-data/Movies/X-Men_ The Last Stand/X-Men_ The Last Stand.mp4
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Scanners
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Cache/CloudAccessV2.dat
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Cache/CloudUsersV2.dat
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Cache/CloudUsersSubscriptionsV2.dat
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Plug-in Support/Data/tv.plex.agents.movie
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Plug-in Support/Data/tv.plex.agents.movie/DataItems
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Logs/PMS Plugin Logs/tv.plex.agents.movie.log.1
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Logs/PMS Plugin Logs/tv.plex.agents.movie.log.5
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Logs/PMS Plugin Logs/tv.plex.agents.movie.log.4
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Logs/PMS Plugin Logs/tv.plex.agents.movie.log.3
/Users/wsmith/Docker/plexconfig/Library/Application Support/Plex Media Server/Logs/PMS Plugin Logs/tv.plex.agents.movie.log.2

Information

  • macOS Version: 13.2.1
  • Intel chip or Apple chip: Apple M2
  • Docker Desktop Version: 4.17.0

Output of /Applications/Docker.app/Contents/MacOS/com.docker.diagnose check

[2023-03-28T21:27:09.866482000Z][com.docker.diagnose][I] set path configuration to OnHost
Starting diagnostics

[PASS] DD0027: is there available disk space on the host?
[PASS] DD0028: is there available VM disk space?
[PASS] DD0018: does the host support virtualization?
[PASS] DD0001: is the application running?
[PASS] DD0017: can a VM be started?
[PASS] DD0016: is the LinuxKit VM running?
[PASS] DD0011: are the LinuxKit services running?
[PASS] DD0004: is the Docker engine running?
[PASS] DD0015: are the binary symlinks installed?
[PASS] DD0031: does the Docker API work?
[PASS] DD0013: is the $PATH ok?
[PASS] DD0003: is the Docker CLI working?
[PASS] DD0038: is the connection to Docker working?
[PASS] DD0014: are the backend processes running?
[PASS] DD0007: is the backend responding?
[PASS] DD0008: is the native API responding?
[PASS] DD0009: is the vpnkit API responding?
[PASS] DD0010: is the Docker API proxy responding?
[SKIP] DD0030: is the image access management authorized?
[PASS] DD0033: does the host have Internet access?
[PASS] DD0018: does the host support virtualization?
[PASS] DD0001: is the application running?
[PASS] DD0017: can a VM be started?
[PASS] DD0016: is the LinuxKit VM running?
[PASS] DD0011: are the LinuxKit services running?
[PASS] DD0004: is the Docker engine running?
[PASS] DD0015: are the binary symlinks installed?
[PASS] DD0031: does the Docker API work?
[PASS] DD0032: do Docker networks overlap with host IPs?
segment 2023/03/28 17:27:13 ERROR: sending request - Post "https://api.segment.io/v1/batch": dial tcp [::]:443: connect: connection refused
segment 2023/03/28 17:27:13 ERROR: 1 messages dropped because they failed to be sent and the client was closed
No fatal errors detected.

Steps to reproduce the behavior

  1. Pass a volume through to a container.
  2. Start the container, and access files from the volume inside the container.
  3. Stop the container and check Activity Monitor open files for the Virtual Machine Service.
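
Step 3 can probably also be checked from a terminal instead of Activity Monitor. The following is only a minimal sketch, not something from the original report: it assumes lsof's field output mode (-F) and that the Virtual Machine Service appears with a command name starting with com.apple.Virtualization, as the binary path listed earlier suggests. The function name vm_open_paths is made up for illustration.

```shell
# Reads `lsof +c0 -n -Fcn` output on stdin and prints the name of every file
# held by a process whose command name starts with com.apple.Virtualization.
# Assumption: in -F mode each field is on its own line, prefixed with
# p (pid), c (command), f (file descriptor) or n (name).
vm_open_paths() {
    awk '
        /^p/ { cmd = "" }                 # a new process set starts here
        /^c/ { cmd = substr($0, 2) }      # remember the command name
        /^n/ && index(cmd, "com.apple.Virtualization") == 1 {
            print substr($0, 2)           # drop the leading field letter
        }
    '
}

# Hypothetical usage on the macOS host, after the container has stopped:
#   lsof +c0 -n -Fcn | vm_open_paths | wc -l
```

If the count stays high after every container has stopped, that would match the behavior described above.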

Smith8154 avatar Mar 28 '23 21:03 Smith8154

Information

  • macOS Version: 13.4
  • Intel chip or Apple chip: Apple M2
  • Docker Desktop Version: v4.20.1

I'm seeing the same behavior here.

After several days of normal dev work inside Docker, things start to fail because the host runs out of file handles. Even native macOS apps start crashing.

Comparing the output of lsof -Pn inside the Docker VM (using this) and in the macOS host, one can see there are tens of thousands of files opened in the host by the Virtual Machine Service that are not opened anymore by the Docker VM.
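
A rough sketch of that comparison, assuming the open paths from each side have already been saved to two text files, one path per line. The function name leaked_paths, the file names, and the /host_mnt prefix in the usage comment are assumptions for illustration; check how shared paths actually appear inside your VM.

```shell
# Prints every path in HOST_LIST that does not appear in VM_LIST.
# An optional VM_PREFIX is stripped from the VM paths first, because shared
# directories may be mounted under a different root inside the VM.
# Usage: leaked_paths VM_LIST HOST_LIST [VM_PREFIX]
leaked_paths() {
    awk -v prefix="${3:-}" '
        FILENAME == ARGV[1] {
            # First file: paths still open inside the VM.
            path = $0
            if (prefix != "" && index(path, prefix) == 1)
                path = substr(path, length(prefix) + 1)
            in_vm[path] = 1
            next
        }
        # Second file: keep host paths that the VM no longer has open.
        !($0 in in_vm)
    ' "$1" "$2"
}

# Hypothetical usage (file names and prefix are placeholders):
#   leaked_paths vm_paths.txt host_paths.txt /host_mnt
```

The check uses FILENAME rather than the common NR == FNR idiom so that an empty VM list (no containers running) still reports every host path as leaked.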

martinml avatar Jun 12 '23 12:06 martinml

As a workaround, I just found that using gRPC FUSE doesn't trigger this behavior. Files only remain open in the Virtual Machine Service with VirtioFS.

martinml avatar Jun 12 '23 15:06 martinml

Docker Desktop 4.21.1 (which now uses VirtioFS as default) shows the same behavior:

  1. Do some file-heavy work with Docker containers in a directory shared with the host. For example, npm install with a shared node_modules.
  2. Stop and delete the containers. docker ps -a shows 0 containers.
  3. Use Sloth to see that the Virtual Machine Service keeps a handle to every file opened in step 1, and it will until the Docker VM is restarted.

martinml avatar Jul 10 '23 11:07 martinml

This has been hitting me too with VirtioFS enabled. Minimal case for recreation just involves touching or creating loads of files:

→ lsof +c0 -n | awk '{print $1}' | sort | uniq -c | grep com.apple.Virtualization
36 com.apple.Virtualization.Virtua

→ mkdir -p testfiles && docker run -v./testfiles:/testfiles --rm -it ubuntu bash
root@8eb8a09639f2:/# seq 1 100000 | split -l 1 -a 5 -d - testfiles/file
split: testfiles/file57256: Too many open files in system
root@8eb8a09639f2:/# exit
exit

→ lsof +c0 -n | awk '{print $1}' | sort | uniq -c | grep com.apple.Virtualization
51569 com.apple.Virtualization.Virtua

This results in lots of very random broken behaviour on the host.

mattmacleod avatar Jul 13 '23 15:07 mattmacleod

I've gotten reports that this is a problem in 4.22.1 and 4.23.0 as well.

BHSPitMonkey avatar Sep 15 '23 02:09 BHSPitMonkey

Makes VirtioFS practically unusable and actually negatively impacts the host machine after some time.

bwalendz avatar Sep 15 '23 14:09 bwalendz

Just for those who stumble upon this: the current workaround is to switch from VirtioFS to gRPC FUSE.


ucyo avatar Dec 11 '23 10:12 ucyo

I've submitted feedback to Apple and reported it to Apple support. https://developer.apple.com/forums/thread/741572

ryancurrah avatar Feb 21 '24 15:02 ryancurrah

Reporting in with Docker 25.0.3 on M1 Pro Mac with Sonoma 14.2.1. This is still a problem. Anything on this from the Docker Mac maintainers?

nem75 avatar Feb 23 '24 13:02 nem75

Same issue with [email protected]

Macbook M1 Pro Sonoma 14.3.1

bmmass avatar Mar 05 '24 12:03 bmmass

I'm facing the same problem with:
M1 Max / macOS 14.4.1 (23E224)
Docker Desktop 4.30.0 (149282) with VirtioFS settings

lawxen avatar May 11 '24 08:05 lawxen

Docker 4.32.0 on m1 max macbook pro still has this problem

lawxen avatar Jul 10 '24 05:07 lawxen

Hi, I'm on

  • docker v4.32.0 (157355)
  • uname -a : Darwin MacBook-Pro.local 23.5.0 Darwin Kernel Version 23.5.0: Wed May 1 20:17:33 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T6031 arm64

Still having the issue. Switched to gRPC. Solves the problem but is way slower.

Hardware Overview:

      Model Name: MacBook Pro
      Model Identifier: Mac15,11
      Model Number: MRW33ZE/A
      Chip: Apple M3 Max
      Total Number of Cores: 14 (10 performance and 4 efficiency)
      Memory: 36 GB
      System Firmware Version: 10151.121.1
      OS Loader Version: 10151.121.1

chris-miaskowski avatar Jul 15 '24 12:07 chris-miaskowski

There's only one way to fix this at the moment. Take your IT dollars and switch to Linux desktops; it's what we are doing. Docker on Linux does not suffer from this issue and you don't need Docker Desktop to boot! If we stop giving Docker and Apple our money they will eventually listen and fix Docker on Mac once and for all.

ryancurrah avatar Jul 15 '24 14:07 ryancurrah

Pretty sure this is not a problem of Docker but of VirtioFS, in combination with the comically low default file descriptor limit in macOS.

If you don't want to switch your whole dev platform just because of this issue you can always up the file limit manually, e.g.

sudo launchctl limit maxfiles 65536 1048576

Been running with this for nearly half a year now without any problems.

Needs System Integrity Protection to be disabled though, so maybe not everyone's cup of tea.

To persist you can edit the values in /Library/LaunchDaemons/limit.maxfiles.plist.

You can check the currently effective limit with launchctl limit maxfiles.
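
To get a feel for how much room is left, something like the sketch below might help. It assumes the soft limit printed by launchctl limit maxfiles is the ceiling that matters here, which nothing in this thread confirms, and the function name maxfiles_headroom is made up.

```shell
# Reads `launchctl limit maxfiles` output on stdin, for example
#   maxfiles    65536          1048576
# and prints the soft limit (second column) minus the handle count in $1.
# Prints "unlimited" if launchctl reports no soft limit.
maxfiles_headroom() {
    awk -v used="$1" '
        $1 == "maxfiles" {
            if ($2 == "unlimited")
                print "unlimited"
            else
                print $2 - used
        }
    '
}

# Hypothetical usage on the macOS host, counting handles held by the
# Virtual Machine Service with lsof:
#   used=$(lsof +c0 -n | grep -c com.apple.Virtualization)
#   launchctl limit maxfiles | maxfiles_headroom "$used"
```

A number that keeps shrinking while no containers are running would suggest the higher limit is only postponing the failure.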

nem75 avatar Jul 15 '24 15:07 nem75

That is a bandaid at best, and depending on what you are running, this will only buy you a bit of time before you run into the limit once again. In my case, upping the file limit took it from breaking within 2 minutes, to breaking in about 10 minutes. Not saying it's not worth pointing out, but this really needs to be addressed by the Docker team. At this point, I have given up hope that the Docker team cares about this issue at all, considering they haven't replied to this issue since it was opened over a year ago.

Smith8154 avatar Jul 15 '24 16:07 Smith8154

Of course it's a workaround, but it's running stable for me for months with multiple heavy yarn/npm operations in Docker volumes daily.

And of course having a real solution would be preferable. Until that happens we can give up, use bandaids or even sledgehammers (like using a whole different platform altogether). Endless possibilities. 😁

nem75 avatar Jul 15 '24 16:07 nem75

It's not Docker's problem, but have they been using their relationship with Apple to fix it? Has it been a topic of conversation in any of their meetings? They are charging us for software that doesn't work well; they should at least try to work with Apple to fix it.

ryancurrah avatar Jul 15 '24 18:07 ryancurrah

@bsousaa any updates on this issue?

chris-miaskowski avatar Jul 22 '24 16:07 chris-miaskowski

still...happening...

markedwards avatar Oct 18 '24 07:10 markedwards

Yep. Makes using VirtioFS impossible. Really sucks.

jason-pollock avatar Oct 18 '24 15:10 jason-pollock

Still happening. Docker Desktop 4.39.0 (184744) on an M4 Mac. I'm using Docker VMM (VirtioFS) and I've upped my limits both for the OS and for the containers themselves. This only delays the time until my containers run into a fatal crash due to too many open file descriptors. Has anyone found any kind of workaround? Should I try a different file sharing implementation in Docker, like gRPC FUSE? If so, how big of a performance hit is this for I/O operations?

Seems like an issue this mission critical should not still be an open thread literally years later...

Shlawpers avatar Apr 02 '25 17:04 Shlawpers

Chiming in that our team is having this issue on M1/2/3 Macs as well. EMFILE errors during parcel or vite builds on NPM projects

0x647262 avatar May 09 '25 21:05 0x647262

It's also happening for me:

  • M4, Sequoia 15.5 (24F74)
  • Docker Desktop
    • Version 4.38.0 (181591)
    • Engine: 27.5.1
    • Compose: v2.32.4-desktop.1
    • Credential Helper: v0.8.2
    • Kubernetes: v1.31.4

Switching from VirtioFS to gRPC FUSE also did the trick for me.

maximilianreimer avatar May 16 '25 13:05 maximilianreimer

Sadly using gRPC FUSE eventually crashes the vm.

vjanelle avatar Jun 06 '25 05:06 vjanelle

Sadly using gRPC FUSE eventually crashes the vm.

If possible, can you kindly elaborate on this? So far this problem has been resolved for me by switching to gRPC FUSE, but I'm worried there might be a longer-term issue that could pop up...

SultanOrazbayev avatar Aug 15 '25 08:08 SultanOrazbayev

@SultanOrazbayev it's not consistent across all use cases - I think something consistent was using emulation instead of arm64 native code though. It's been a bit.

Otherwise it was just something about a function of time.

vjanelle avatar Aug 15 '25 18:08 vjanelle