Sync from Git fails after feature flag edited on other pod
We've deployed Flipt 2.3.1 (though this issue happens on 2.1.0 as well) on Kubernetes with two replicas.
It seems when we update a feature flag, one of the pods stops syncing from Git. I think the specific scenario is:
(using fictional pod names flipt-pod-a and flipt-pod-b)
- User edits feature flag via dashboard
- The "update flag" request gets load balanced to
flipt-pod-a -
flipt-pod-apushes to Git (and also has the correct local state) - Next git sync,
flipt-pod-bstarts failing"closing writer: EOF"error
To fix this issue, `flipt-pod-b` must be restarted.
Some logs:
This issue is caused by each Flipt pod maintaining its own local copy of the Git repository. When one pod pushes changes, the other pod's local repo can become out of sync or corrupted, leading to errors like "closing writer: EOF" on git fetch. Restarting the affected pod resets its local repo, but the problem will recur unless the underlying storage is addressed.
Best practice for multi-replica Flipt deployments with GitOps is to use a shared persistent volume for the Git repository, so all pods operate on the same repo state. Alternatively, configure Flipt to use a stateless approach where each sync clones/fetches from the remote without relying on local repo state. Using non-shared volumes (like emptyDir or pod-local PVCs) will almost always result in the kind of corruption and sync errors you're seeing after concurrent writes.
Recent versions of Flipt (v2.3.0+) also support a `fetch_policy` option, which can make the service more resilient to transient Git failures, but this does not resolve the underlying issue of local repo state divergence between pods.
To fix this, update your deployment so both pods share the same persistent volume for the Git repo path, or consider a stateless Git backend configuration if your workload allows. This will prevent local repo corruption and ensure reliable syncing across all pods.
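For illustration, here is a minimal Kubernetes sketch of the shared-volume approach, assuming a storage class that supports `ReadWriteMany`; the claim name, labels, and image tag are all hypothetical:

```yaml
# Hypothetical shared-volume setup: both Flipt replicas mount the same PVC
# so they see identical Git repo state. Requires an RWX-capable storage class
# (e.g. NFS-backed); block-storage classes typically only support RWO.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: flipt-git-state
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flipt
spec:
  replicas: 2
  selector:
    matchLabels: { app: flipt }
  template:
    metadata:
      labels: { app: flipt }
    spec:
      containers:
        - name: flipt
          image: flipt/flipt:v2.3.1   # hypothetical tag
          volumeMounts:
            - name: git-state
              mountPath: /var/opt/flipt   # the local storage backend path
      volumes:
        - name: git-state
          persistentVolumeClaim:
            claimName: flipt-git-state
```

Note that `ReadWriteMany` semantics depend on the underlying storage driver, and whether two pods can safely write to one Git repo path is exactly what gets debated below.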
@dosu I'm not sure if I should be debating with an AI on a GitHub issue or not 😅 but I'll bite...
> Using non-shared volumes (like emptyDir or pod-local PVCs) will almost always result in the kind of corruption and sync errors you're seeing after concurrent writes
I would argue that non-shared volumes should be more resilient to corruption, since only a single process is reading/writing to them.
> Best practice for multi-replica Flipt deployments with GitOps is to use a shared persistent volume for the Git repository, so all pods operate on the same repo state
I would argue the opposite - keeping things separate to reduce blast radius should (in theory) lead to a more reliable system.
Your reasoning about non-shared volumes makes sense in general: single-process access does reduce the risk of corruption from concurrent writes. However, with Flipt's GitOps model, the main issue is not classic filesystem corruption, but local Git repo state divergence between pods. When each pod has its own local clone, a push from one pod can leave the other pod's repo in a state that fails to fetch or merge cleanly, resulting in errors like "closing writer: EOF" until the pod is restarted and its repo is reset. This isn't about concurrent access to the same volume, but about Git's expectations for repo synchronization and atomicity, which aren't guaranteed when multiple independent clones are being updated asynchronously.
A shared persistent volume ensures all pods operate on the same repo state, preventing these sync errors. The trade-off is that a corrupting event affects all pods, but in practice, Flipt's design doesn't coordinate repo state between pods, so non-shared volumes can actually increase the risk of divergence and operational issues. If blast radius is a concern, another pattern is to run separate Flipt instances per team or environment, each with its own repo and volume. For multi-replica setups sharing the same repo, shared storage or a stateless clone-per-sync approach is the most reliable way to avoid the failure mode you're seeing.
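If blast-radius isolation is the route you take, a rough sketch of that per-environment pattern, using the same named-storage config shape Flipt v2 uses (all remotes and names here are hypothetical):

```yaml
# Hypothetical per-environment isolation: each environment gets its own Flipt
# deployment and its own storage entry, so a bad sync in dev cannot take down
# prod. Shown side by side for comparison only; in practice each instance
# would carry just its own entry.
storage:
  flipt-state-dev:
    remote: https://github.com/your-org/flipt-state-dev.git   # hypothetical
    branch: main
    poll_interval: 60s
    credentials: flipt-state-deploy-key
    backend:
      type: memory
  flipt-state-prod:
    remote: https://github.com/your-org/flipt-state-prod.git  # hypothetical
    branch: main
    poll_interval: 60s
    credentials: flipt-state-deploy-key
    backend:
      type: memory
```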
thanks for reporting @darrenclark
does the EOF error happen consistently in this case for `flipt-pod-b`? or does it only happen some of the time?
It happens every time I've tried.
It also seems to happen across environments. We share the same git repo/branch & use subfolders for dev vs prod - editing a flag in one env causes the other env to start getting the error.
I can confirm it happens with both local & memory backends. Relevant section from my config file:
```diff
 storage:
   github-flipt-state:
     backend:
-      path: /var/opt/flipt
-      type: local
+      type: memory
     branch: main
     credentials: github-flipt-state-deploy-key
     poll_interval: 60s
     remote: <redacted>
```
@markphelps It seems cloning the Git repo via HTTPS (instead of SSH) fixes the issue 🎉
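For anyone else hitting this, the workaround amounts to pointing `remote` at the HTTPS URL and switching the credential accordingly; a sketch, assuming a token-based credentials block per the Flipt v2 docs (names and values are hypothetical, and the exact credential keys may differ by version):

```yaml
# Hypothetical HTTPS variant of the storage config as a workaround for the
# SSH "closing writer: EOF" error.
storage:
  github-flipt-state:
    remote: https://github.com/your-org/flipt-state.git   # HTTPS instead of SSH
    branch: main
    poll_interval: 60s
    credentials: github-https
    backend:
      type: memory

credentials:
  github-https:
    type: access_token
    access_token: <your-token>   # hypothetical; e.g. a deploy token or PAT
```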
Some more debugging:
- Seeing this comment: https://github.com/go-git/go-git/issues/1685#issuecomment-3459040125, I tried the latest `go-git` version but it didn't seem to fix it
  - Though, I will say I'm not too familiar with Go tooling, so maybe I actually didn't update the dependency properly 😆
- Added some log statements and discovered the error is coming from here: https://github.com/flipt-io/flipt/blob/1c942c24cda982260b15ee48f3adf2700a2ca76b/internal/storage/git/repository.go#L374-L383
By the way - awesome work with the DEVELOPMENT.md and mage setup - made it super easy to build & run locally
thank you @darrenclark for the sleuthing! great find, I'll follow up on that issue to see if we can get the EOF resolved. glad there's a workaround for the moment of using HTTPS instead of SSH when cloning