devspace
File sync with arm64 build on Apple Silicon causes "too many open files" error
What happened?
When running the arm64 build of devspace on an M1 Apple MacBook with our application (~13,000 files), the file sync fails to start with a "too many open files" error. Raising the open file limit does not rectify the issue: even when the limit is raised substantially, devspace continues to consume file descriptors until it errors again.
This error presents itself in our logs as:
[0:sync] Start syncing
[done] √ Sync started on /Users/username/workspace/app <-> . (Pod: spacename/app-web-76699667dd-6qr8m)
[0:sync] Error: Sync Error on /Users/username/workspace/app: error while traversing /Users/username/workspace/app/gems/some_gem/db/migrate: too many open files
Note: this behavior does not seem to be present when running the amd64 build under Rosetta on an M1 Apple MacBook. This is our current workaround for this issue.
What did you expect to happen instead?
When running the arm64 build, the file sync should consume file descriptors in the same way it does with the amd64 build under Rosetta.
How can we reproduce the bug? (as minimally and precisely as possible)
- Obtain an Apple Silicon powered Apple computer
- Install the arm64 build of DevSpace 5.12
- Configure devspace for a project that has ~10,000 files to sync
- Run devspace dev
Local Environment:
- DevSpace Version: devspace version 5.12.2
- Operating System: mac
- Deployment method: helm
Kubernetes Cluster:
- Cloud Provider: aws
- Kubernetes Version: [use kubectl version]
Anything else we need to know?
Below are some screenshots of the output from watch -n 0.1 "lsof -n | grep devspace | awk '{ print \$2 \" \" \$1; }' | sort -rn | uniq -c | sort -rn" as we debugged this issue:
apple silicon, arm64 was taken after the first occurrence of a too many open files error:
apple silicon, amd64 with Rosetta was taken after the file sync had successfully started:
intel, amd64 was taken after the file sync had successfully started:
Additionally, when raising the file limits using ulimit -n, sysctl kern.maxfilesperproc and sysctl kern.maxfiles, we were able to get devspace to open ~100k files on the computer. It doesn't appear to be an issue with devspace being starved of open file handles.
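For anyone repeating this check, the limit that actually applies to a process can also be read from inside a Go program. The following is a minimal sketch, not DevSpace code; the helper name openFileLimit is made up for illustration, and it assumes a Unix-like system (macOS or Linux) where RLIMIT_NOFILE is available.

```go
package main

import (
	"fmt"
	"syscall"
)

// openFileLimit (hypothetical helper) returns the soft and hard
// RLIMIT_NOFILE values that apply to the current process.
func openFileLimit() (soft, hard uint64, err error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	// The field types differ between platforms, so convert explicitly.
	return uint64(lim.Cur), uint64(lim.Max), nil
}

func main() {
	soft, hard, err := openFileLimit()
	if err != nil {
		fmt.Println("could not read RLIMIT_NOFILE:", err)
		return
	}
	// Note: newer Go runtimes may raise the soft limit at startup, so this
	// value can be higher than what the shell reports.
	fmt.Printf("open file limit: soft=%d hard=%d\n", soft, hard)
}
```

If the descriptor count seen in lsof keeps climbing toward whatever limit is printed here, that points at a leak rather than at a limit that is simply too low.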
/kind bug
cc-ing @bbuchalter and @w0de who were involved in the debugging
@noizwaves thanks for creating this issue! To be honest, I'm still a little confused about what could cause this, but after some digging it seems to be related to Go itself. One thing we could try is to compile the arm64 binary with CGO, which has apparently helped in some cases with strange errors like this.
If it's worth mentioning, devspaces is definitely not the only project having an issue with this. I was actually looking for alternatives to Skaffold as I encountered the same exact "too many files open" error on my M1 macbook this week. I believe they also use golang so that sounds like the right direction to look for issues!
Reporting back a few days later to confirm the same scenario as the original post. I have a monorepo with several projects. When I npm install locally but tell sync to ignore the node_modules, it doesn't appear to do so, as I receive a too many open files error. When I download the x86_64 binary (/usr/local/bin/devspace: Mach-O 64-bit executable x86_64), it works as expected.
Still happens on my M1. Any updates?
Rosetta workaround doesn't work for me - it only causes k8s cluster to crash with an error:
Error gathering target pods for log streaming: pods by image name: list pods: the server was unable to return a response in the time allotted, but may still be processing the request (get pods)
@FabianKramm @LukasGentele do you know if this issue has been resolved in Golang yet? I'd love to switch our M1 users over to the native binary and simplify our deployment of devspace.
@noizwaves we are still investigating this issue, but it seems that binaries compiled natively on an M1 Mac work fine, and that the problem lies in cross-compiling binaries from amd64 to arm64 in our release pipeline. We'll probably end up compiling the darwin arm64 binary manually on an M1 Mac locally and then uploading it separately, as GitHub Actions currently does not support arm macOS platforms.
@noizwaves @evyros @uncvrd Good news: we found the underlying problem and we can fix it. We also know why it works when compiling DevSpace locally but not with the released binary. The sync event library we are using (https://github.com/rjeczalik/notify) picks its file-watching backend on darwin based on cgo: with CGO_ENABLED=1 (which is true locally, but not if you are cross-compiling in a pipeline) it uses FSEvents, while with CGO_ENABLED=0 it uses kqueue. Apparently their kqueue implementation is broken on arm64, but not on amd64, which is why it works when self-compiling on an arm64 system (uses FSEvents, which works), but not with the released binary (uses kqueue, broken).
We found out that we can enable cgo and cross-compile arm64 from amd64 in our pipeline to build DevSpace when using the macos-11 platform, which should fix this problem for all upcoming DevSpace versions starting with v5.18.0-beta.1, which already should not experience this bug anymore.
Awesome, many thanks for digging into the issue @FabianKramm !
This seems to be still an issue with Kubernetes running locally in docker.
- Kind cluster
- Kubernetes v1.20
- macOS 12
Tried versions of devspace CLI
- 5.17 (arm)
- 5.17 (amd)
- 5.18-beta.1 (arm)
- 5.18-beta.4 (arm)
- 5.18-beta.4 (amd64)
Building it with CGO_ENABLED=1 on an M1 is still the only way around it.