File sync with arm64 build on Apple Silicon causes "too many open files" error

Open noizwaves opened this issue 4 years ago • 11 comments

What happened?

When running the arm64 build of devspace on an M1 Apple MacBook with our application (~13,000 files), the file sync fails to start with a "too many open files" error. Raising the open file limit does not fix the issue: when the limit is raised substantially, devspace continues to consume file descriptors until it errors again.

This error presents itself in our logs as:

[0:sync] Start syncing
[done] √ Sync started on /Users/username/workspace/app <-> . (Pod: spacename/app-web-76699667dd-6qr8m)
[0:sync] Error: Sync Error on /Users/username/workspace/app: error while traversing /Users/username/workspace/app/gems/some_gem/db/migrate: too many open files

Note: this behavior does not seem to be present when running the arm64 build under Rosetta on an M1 Apple MacBook. This is our current workaround to this issue.

What did you expect to happen instead?

When running the arm64 build, the file sync should consume file descriptors in the same way it does with the amd64 build under Rosetta.

How can we reproduce the bug? (as minimally and precisely as possible)

  1. Obtain an Apple Silicon powered Apple computer
  2. Install the arm64 build of DevSpace 5.12
  3. Configure devspace for a project that has ~10,000 files to sync
  4. Run devspace dev

Local Environment:

  • DevSpace Version: devspace version 5.12.2
  • Operating System: mac
  • Deployment method: helm

Kubernetes Cluster:

  • Cloud Provider: aws
  • Kubernetes Version: [use kubectl version]

Anything else we need to know?

Below are some screenshots of the output from watch -n 0.1 "lsof -n | grep devspace | awk '{ print \$2 \" \" \$1; }' | sort -rn | uniq -c | sort -rn" as we debugged this issue:

  • apple silicon, arm64 was taken after the first occurrence of a too many open files error: apple silicon, arm64
  • apple silicon, amd64 with Rosetta was taken after the file sync had successfully started: apple silicon, amd64 with Rosetta
  • intel, amd64 was taken after the file sync had successfully started: intel, amd64

Additionally, when raising the file limits using ulimit -n, sysctl kern.maxfilesperproc and sysctl kern.maxfiles, we were able to get devspace to open ~100k files on the computer. It doesn't appear to be an issue with devspace being starved of open file handles.
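
For anyone repeating this check, the per-process limit that a Go program actually sees can be read from inside the process. The following is a minimal sketch using the standard syscall package; the helper name openFileLimit is hypothetical and nothing here is taken from the devspace source.

```go
package main

import (
	"fmt"
	"syscall"
)

// openFileLimit returns the soft and hard RLIMIT_NOFILE values
// for the current process. Hypothetical helper for illustration.
func openFileLimit() (soft, hard uint64, err error) {
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return uint64(rl.Cur), uint64(rl.Max), nil
}

func main() {
	soft, hard, err := openFileLimit()
	if err != nil {
		fmt.Println("getrlimit failed:", err)
		return
	}
	fmt.Printf("open files: soft=%d hard=%d\n", soft, hard)
}
```

If the soft limit printed here is already large and the error still appears, that is consistent with the observation above that the process keeps consuming descriptors rather than being starved of them.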

/kind bug

noizwaves avatar May 15 '21 22:05 noizwaves

cc-ing @bbuchalter and @w0de who were involved in the debugging

noizwaves avatar May 15 '21 22:05 noizwaves

@noizwaves thanks for creating this issue! To be honest, I'm still a little confused about what could cause this, but after some digging it seems to be related to Go itself. One thing we could try is compiling the arm64 binary with CGO, which has apparently helped in some cases with strange errors like this.

FabianKramm avatar May 16 '21 08:05 FabianKramm

If it's worth mentioning, devspace is definitely not the only project having an issue with this. I was actually looking for alternatives to Skaffold, as I encountered the exact same "too many files open" error on my M1 MacBook this week. I believe they also use Go, so that sounds like the right direction to look for issues!

uncvrd avatar Jun 13 '21 03:06 uncvrd

Reporting back a few days later to confirm the same scenario as the original post. I have a monorepo with several projects. When I run npm install locally but tell the sync to ignore node_modules, it doesn't appear to do so, as I receive a too many open files error. When I download the x86_64 binary (/usr/local/bin/devspace: Mach-O 64-bit executable x86_64), it works as expected.

uncvrd avatar Jun 16 '21 07:06 uncvrd

Still happens on my M1. Any updates? The Rosetta workaround doesn't work for me; it only causes the k8s cluster to crash with an error: Error gathering target pods for log streaming: pods by image name: list pods: the server was unable to return a response in the time allotted, but may still be processing the request (get pods)

evyros avatar Aug 01 '21 00:08 evyros

@FabianKramm @LukasGentele do you know if this issue has been resolved in Golang yet? I'd love to switch our M1 users over to the native binary and simplify our deployment of devspace.

noizwaves avatar Nov 29 '21 22:11 noizwaves

@noizwaves we are still investigating this issue, but binaries that were natively compiled on an M1 Mac seem to work fine, so it looks like a problem with cross-compiling binaries from amd64 to arm64 in our release pipeline. We'll probably end up compiling the darwin arm64 binary manually on an M1 Mac and uploading it separately, as GitHub Actions currently does not support arm macOS platforms.

FabianKramm avatar Nov 30 '21 12:11 FabianKramm

@noizwaves @evyros @uncvrd Good news, we actually found the underlying problem and we can fix it. We also know why it works compiling DevSpace locally, but not with the released binary: the problem is that when using CGO_ENABLED=1 (which is true locally, but not if you are cross-compiling it in a pipeline) the sync event library we are using (https://github.com/rjeczalik/notify) is using FSEvents as file watching technology on darwin, while when using CGO_ENABLED=0 it is using kqueue. Apparently their kqueue implementation is broken on arm64, but not on amd64, which is why it works when self compiling on an arm64 system (uses FSEvents, which works), but not with the released binary (uses kqueue, broken).

We found out that we can enable cgo and cross-compile arm64 from amd64 in our pipeline to build DevSpace when using the macos-11 platform, which should fix this problem for all upcoming DevSpace versions starting with v5.18.0-beta.1, which already should not experience this bug anymore.
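
A minimal Go sketch of the selection described above, for checking which case a given binary falls into. The function names watcherBackend and cgoEnabled are hypothetical, and the mapping simply restates this comment (cgo on darwin means FSEvents, otherwise kqueue); it is not taken from the rjeczalik/notify source. It assumes Go 1.18 or newer, where the build settings are recorded in the binary.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// watcherBackend restates the behavior described in this thread:
// on darwin, a cgo build uses FSEvents and a non-cgo build uses kqueue.
// Assumption for illustration only, not read from the notify library.
func watcherBackend(goos string, cgo bool) string {
	if goos != "darwin" {
		return "n/a"
	}
	if cgo {
		return "FSEvents"
	}
	return "kqueue"
}

// cgoEnabled reads the CGO_ENABLED build setting embedded in the
// running binary. known is false if the setting is not recorded.
func cgoEnabled() (enabled, known bool) {
	info, ok := debug.ReadBuildInfo()
	if !ok {
		return false, false
	}
	for _, s := range info.Settings {
		if s.Key == "CGO_ENABLED" {
			return s.Value == "1", true
		}
	}
	return false, false
}

func main() {
	cgo, known := cgoEnabled()
	if !known {
		fmt.Println("CGO_ENABLED is not recorded in this binary")
		return
	}
	fmt.Printf("GOOS=%s GOARCH=%s CGO_ENABLED=%v backend=%s\n",
		runtime.GOOS, runtime.GOARCH, cgo, watcherBackend(runtime.GOOS, cgo))
}
```

Building this once with CGO_ENABLED=1 and once with CGO_ENABLED=0 should show the two cases side by side.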

FabianKramm avatar Nov 30 '21 15:11 FabianKramm

Awesome, many thanks for digging into the issue @FabianKramm !

noizwaves avatar Nov 30 '21 16:11 noizwaves

This still seems to be an issue with Kubernetes running locally in Docker.

  • Kind cluster
  • Kubernetes v1.20
  • macOS 12

Tried versions of devspace CLI

  • 5.17 (arm)
  • 5.17 (amd)
  • 5.18-beta.1 (arm)
  • 5.18-beta.4 (arm)
  • 5.18-beta.4 (amd64)

plisy avatar Dec 14 '21 14:12 plisy

Building it with CGO_ENABLED=1 on an M1 is still the only way around it.

brurucy avatar Mar 21 '22 13:03 brurucy