source-controller

Problem with /tmp file

Open monsad opened this issue 8 months ago • 5 comments

Describe the bug: Flux fills up /tmp

To Reproduce: Flux copies the repository at certain time intervals but does not delete files from the tmp directory, causing it to become full.

Expected behavior: Fluxcd should probably clean-up after itself?

Logs: "no space left on device"

Flux version: 2.4.0

monsad avatar May 07 '25 11:05 monsad

> Fluxcd should probably clean-up after itself?

It does, so probably something else is blocking it on the node. Look at the source-controller logs for any errors.

stefanprodan avatar May 07 '25 11:05 stefanprodan

No, I checked the source-controller logs and found only 'info' level logs, no errors. So it looks like /tmp is not being cleaned up.

monsad avatar May 12 '25 07:05 monsad

We noticed the same issue. We run the source controller in Kubernetes with /tmp mounted on an emptyDir volume (see spec in this chart).

As far as I can tell, every time we run flux reconcile source git mygitrepo, a new folder is created in /tmp that looks like /tmp/gitrepository-flux-system-mygitrepo-someid. This folder exists for a brief period (I assume some checksums are compared) and then it gets removed.

The problem is that this removal doesn't always happen. I can't consistently reproduce it but I see some leftover folders that were never deleted. In our case it eventually leads to increased memory usage and the container gets OOMKilled.

I checked the logs but all I have is "info" level messages and no indication of a problem whatsoever.

$ flux version
flux: v2.5.1
distribution: flux-2.5.1
helm-controller: v1.2.0
image-automation-controller: v0.40.0
image-reflector-controller: v0.34.0
kustomize-controller: v1.5.1
notification-controller: v1.5.0
source-controller: v1.5.0

acondrat avatar May 22 '25 09:05 acondrat

I can't explain how the tmp cleanup would fail without the error being logged:

https://github.com/fluxcd/source-controller/blob/4aa31dcc21fa570122d91678ab6352d050481374/internal/controller/gitrepository_controller.go#L279-L293

No matter what happens during the reconciliation, we remove the tmp dir and if it fails, we log an error.

stefanprodan avatar May 24 '25 07:05 stefanprodan

I suspect the cleanup fails if the container gets OOMKilled in the middle of it. After this happens a few times and more garbage accumulates in tmp, the container constantly gets OOMKilled on startup. The workaround for us was to delete the pod so both /tmp and /data (emptyDir) are wiped. We also allocated more memory and it no longer seems to happen.

acondrat avatar Jun 03 '25 12:06 acondrat