Git Submodules Issue
Checklist:
- [x] I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
- [x] I've included steps to reproduce the bug.
- [x] I've pasted the output of argocd version.
Describe the bug
I discovered this when trying to use argocd-image-updater with a project repository that uses git submodules. Although that tool made the issue apparent, it's not an issue with that tool specifically.
To give some background on what I was trying to accomplish:
I have main project repos (I call them MonoPoly repos) that act sort of as monorepos and include the different project pieces as git submodules: my-app for the app's main source repo, image-builder for the repo that handles building the app's image, and deployment, which contains the deployment manifests, e.g.:
githublabcentral.svc/my-team/projects/my-app:
--
my-app/ (submodule --> githublabcentral.svc/my-team/apps/my-app)
image-builder/ (submodule --> githublabcentral.svc/my-team/image-builders/my-app)
deployment/ (submodule --> githublabcentral.svc/my-team/deployment-manifests/my-app)
.gitmodules
All these repos will have at least a main master branch and a staging branch, which get deployed via argo (along with other generated branches such as staging/my-new-feature).
Note that the .gitmodules file uses the special branch = . setting, which tells git that each submodule should track the branch with the same name as the parent's current branch. This just means the staging branch on projects/my-app has all its submodules pointed at their own staging branches. This isn't part of the issue, though; the same problem would happen if you manually configured the branch setting in the .gitmodules file on each branch.
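For readers who haven't used that setting, here is a minimal sketch of how a branch = . entry can be written into a .gitmodules file. The submodule name (deployment) comes from the layout above; the https:// scheme on the URL is an assumption, since the real remote URL isn't given. As far as I know, git only consults this setting for git submodule update --remote, not for a plain git submodule update.

```shell
#!/bin/sh
set -eu

# Scratch directory so this sketch never touches a real repository.
work=$(mktemp -d)
cd "$work"

# Write a submodule entry into .gitmodules. `git config -f` edits the named
# file directly, so no repository is needed for this illustration.
git config -f .gitmodules submodule.deployment.path deployment
# Assumed URL scheme; the issue only gives host and path.
git config -f .gitmodules submodule.deployment.url https://githublabcentral.svc/my-team/deployment-manifests/my-app
# "." means: use the same branch name as the parent repo's current branch.
git config -f .gitmodules submodule.deployment.branch .

cat .gitmodules
branch_setting=$(git config -f .gitmodules --get submodule.deployment.branch)
```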
Now that I've provided "what I'm trying to do" as background information, even though the issue doesn't apply specifically to that scenario, let me continue with more details...
The error is thrown when you add an application that points to a repository that uses kustomize* and git submodules, that application uses parameter overrides (e.g. kustomize.images, whether added by argocd-image-updater, under the Parameters tab for an application in the UI, or by editing the manifest), and you've already deployed another application using that same repo, even in a different Project or namespace and even on a different branch:
Failed to load target state: failed to generate manifest for source 1 of 1: rpc error: code = Unknown desc = failed to initialize repository resources: rpc error: code = Internal desc = Failed to checkout FETCH_HEAD: `git submodule update --init --recursive` failed exit status 1: error: Your local changes to the following files would be overwritten by checkout: kustomization.yaml Please commit your changes or stash them before you switch branches. Aborting fatal: Unable to checkout 'abcdefabcdefabcdef12341234abcdefabcdef1' in submodule path 'deployment'
It seems like argo uses something like the repo itself to decide where it clones/caches the files on deployment.
In most cases this isn't an issue, as git won't complain when checking out a new repo/branch in the same directory and overwriting files if there are no local changes. However, when a parameter is overridden, argo seems to modify the kustomization.yaml or something else in that cache directory, which causes git to complain as shown above when you try to add another app pointing to the same repo, regardless of the Project, branch or namespace.
For example, assume the main repo and the submodule repo each have only a master branch. Deploy my-app without auto-sync enabled, set a parameter on that app, update the parent repo so the submodule points at a different commit, and then add a second app using that repo. The error happens even if the second app is in a different project and namespace.
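The git side of that sequence can be approximated outside Argo CD. The following is a minimal sketch, not Argo CD's actual code: the repository names (deployment-src, parent-src, cache) and the g helper are made up for illustration, the cache directory only stands in for whatever working copy the repo-server keeps, and the fetch / checkout / submodule update sequence is inferred from the error message rather than from the Argo CD source.

```shell
#!/bin/sh
set -eu
export LC_ALL=C   # keep git's messages in English for this demo

work=$(mktemp -d)
cd "$work"

# Hypothetical helper: allow local file-based submodules (blocked by default
# in newer git releases) and set a throwaway commit identity.
g() {
  git -c protocol.file.allow=always -c init.defaultBranch=master \
      -c user.name=demo -c user.email=demo@example.com "$@"
}

# 1. A "deployment" repo with a kustomization.yaml, and a parent repo that
#    embeds it as a submodule.
g init -q deployment-src
printf 'resources: []\n' > deployment-src/kustomization.yaml
g -C deployment-src add kustomization.yaml
g -C deployment-src commit -q -m 'initial manifests'
g init -q parent-src
g -C parent-src submodule --quiet add "$work/deployment-src" deployment
g -C parent-src commit -q -m 'add deployment submodule'

# 2. A cached checkout (stand-in for the repo-server's working copy).
g clone -q parent-src cache
g -C cache submodule --quiet update --init --recursive

# 3. Simulate a parameter override rewriting kustomization.yaml in the cache.
printf 'resources: []\nimages:\n- name: my-app\n  newTag: jammy-20240125\n' \
  > cache/deployment/kustomization.yaml

# 4. Upstream changes the same file, and the parent records the new commit.
printf 'resources: []\nnamePrefix: staging-\n' > deployment-src/kustomization.yaml
g -C deployment-src commit -q -am 'change manifests'
g -C parent-src/deployment pull -q --ff-only
g -C parent-src commit -q -am 'bump deployment submodule'

# 5. Refresh the cache: fetch, check out FETCH_HEAD, update submodules.
g -C cache fetch -q origin HEAD
g -C cache checkout -q --force FETCH_HEAD
status=0
output=$(g -C cache submodule update --init --recursive 2>&1) || status=$?
printf '%s\n' "$output"
echo "exit status: $status"
```

In this sketch the last git command aborts because the submodule's working tree still carries the uncommitted kustomization.yaml edit from step 3. Note that checkout --force on the parent repo does not clean the submodule's working tree.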
To Reproduce
Example repo: https://gitlab.com/shaped.ca/argocd-issue/my-app
Deploy an app using that repo with the name my-app, sync policy set to automatic, branch set to HEAD as is default, and the path set to deployment.
Then edit the app in the UI and change the image tag to jammy-20240125 and save.
Then, deploy a second app using that same repo, using the name my-app-testing with the same settings except the branch set to staging. Again, it's not the branch specifically that matters, just that the submodule the app references is at a different commit SHA, so that git submodule update --init --recursive actually does something when argo runs it. If the submodule, regardless of branch, refers to the same commit, git will see this and no-op.
You should already notice an issue in the UI, where it's no longer switching the bottom pane to kustomize, though sometimes this doesn't happen (presumably because I have 2 replicas for argocd-repo-server) and sometimes I can still add the app, though I will get the same error on the initial sync. On some occasions where I've been able to add it and get an error on the initial sync, after a while it does sync, but most times I can't even add it.
This even happened to me just now when trying to make the reproduction repo when I had deleted all apps referring to that repo; I had to restart the argocd-repo-server deployment to be able to create it again at all and in this case I don't think I had even set a parameter on the original app yet!
After further testing, I was unable to reproduce on a repo not using kustomize.
Expected behavior
It should work.
Screenshots
Version
argocd-server: v2.9.0+7e80f1e
Logs
argocd-server:
grpc.service=application.ApplicationService grpc.start_time="2024-02-14T13:43:49Z" grpc.time_ms=2123.75 span.kind=server system=grpc
time="2024-02-14T13:43:52Z" level=info msg="finished unary call with code InvalidArgument" error="rpc error: code = InvalidArgument desc = application spec for my-app-staging is invalid: InvalidSpecError: Unable to generate manifests in deployment: rpc error: code = Unknown desc = failed to initialize repository resources: rpc error: code = Internal desc = Failed to checkout FETCH_HEAD: `git submodule update --init --recursive` failed exit status 1: error: Your local changes to the following files would be overwritten by checkout:\n\tkustomization.yaml\nPlease commit your changes or stash them before you switch branches.\nAborting\nfatal: Unable to checkout '<commitSHA>' in submodule path 'deployment'" grpc.code=InvalidArgument grpc.method=Create grpc.service=application.ApplicationService grpc.start_time="2024-02-14T13:43:50Z" grpc.time_ms=2362.708 span.kind=server system=grpc
There are no applicable entries in the argocd-repo-server logs.
After further testing, it seems like perhaps it's only happening if the kustomization is patching the image under patches.
Which, of course, is not really the proper way to do that!
When using the proper images: field in the kustomization.yaml, I can't seem to reproduce, even with other values being patched..?
Definitely a bit odd; I'm not sure why that one case causes the cache to have that issue with a submodule. Unfortunately I don't know argocd internals that well to know.
In reality, I could have always just pointed straight to the deployment repo instead of the main repo with the deployment submodule but it was making me scratch my head a bit. Otherwise, I was only using the patch method because I was modifying other values on the container spec and was just testing, I had always intended to use the images: override instead.
I'll leave this here, I suppose it's still technically an issue. If I figure out any more details or run into this without patching the image, I'll post back but otherwise I don't think this is a high priority issue; even without me discovering the patch weirdness, as I said I could've pointed directly to the submodule repo anyway as a workaround..
Weird; I just hit it again without patching images..!
No idea what's going on now.. Definitely a caching issue though!
edit: wonder if that was partly because of left over caches?
tried doing a hard refresh too whenever it seemed appropriate; one issue is not being able to do that unless the app exists..?
This is 100% causing me issues.
I can't even re-add an app with one of these repos after deleting all others related without getting the error from git saying it'll overwrite changes.
Is there a possibility to at least get some direction on this? No one else using submodules? Like, where the cache is or a way to clear it or anything?
I've tried restarting the argocd-repo-server pods and argocd-server too, without much luck, but sometimes it randomly fixes itself?
https://github.com/argoproj/argo-cd/issues/9645#issuecomment-1996618678
Might be related; added some other comments there.
This doesn't just affect sub modules though:
ALSO - I've had this happen with SEPARATE APPS in SEPARATE PROJECTS on SEPARATE NAMESPACES that point to the SAME REPO but DIFFERENT branches; so the cache folder is defined by the REPO itself, not APP or PROJECT, NAMESPACE, BRANCH?! Which means this could even be an issue for anyone using multiple branches on the same repo, submodules or not!
So, I currently have an Application deployed that's only using one branch even and have run into a similar issue.
In fact, that Application is currently targeting the deployment submodule repo directly but still has a branch specified, master in this case.
Previously I only ran into it when trying to deploy multiple Applications targeting different branches in the same repo.
The error from git in this case mentioned "before you switch branches", so it seems like the issue also crops up when using overrides w/kustomize and having any branch specified other than HEAD, even if that branch is the default branch!
Any eyes on this? This completely breaks using argocd-image-updater or using any overrides, even when manually creating the override file in a repo (like argocd-image-updater does when in git mode).
As mentioned in my last post, this doesn't just affect when using submodules or apps deployed with multiple branches but even any branch besides HEAD including main or master or whatever your default is.
Edit: Some discussion on Slack today re: this, https://cloud-native.slack.com/archives/C01TSERG0KZ/p1712776528870569
Seems like potentially adding -f/--force to the related git commands might be a solution, barring anything I'm not aware of otherwise internally in Argo.. If so, it'd be a relatively simple/quick fix I'd think?
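To illustrate what -f/--force changes at the git level, here is a minimal sketch using throwaway local repos. It is not a patch for Argo CD: the repository names and the g helper are made up, the cache directory is only a stand-in for the repo-server's working copy, and whether discarding local changes this way is safe inside Argo CD (which presumably needs to re-apply the override afterwards) is an open question.

```shell
#!/bin/sh
set -eu
export LC_ALL=C

work=$(mktemp -d)
cd "$work"

# Hypothetical helper: allow local file-based submodules and set a throwaway
# commit identity for this demo only.
g() {
  git -c protocol.file.allow=always -c init.defaultBranch=master \
      -c user.name=demo -c user.email=demo@example.com "$@"
}

# Setup: a parent repo with a "deployment" submodule, plus a cached clone.
g init -q deployment-src
printf 'resources: []\n' > deployment-src/kustomization.yaml
g -C deployment-src add kustomization.yaml
g -C deployment-src commit -q -m 'initial manifests'
g init -q parent-src
g -C parent-src submodule --quiet add "$work/deployment-src" deployment
g -C parent-src commit -q -m 'add deployment submodule'
g clone -q parent-src cache
g -C cache submodule --quiet update --init --recursive

# Dirty the cached submodule, then move the submodule pointer upstream.
printf 'resources: []\nimages: []\n' > cache/deployment/kustomization.yaml
printf 'resources: []\nnamePrefix: staging-\n' > deployment-src/kustomization.yaml
g -C deployment-src commit -q -am 'change manifests'
g -C parent-src/deployment pull -q --ff-only
g -C parent-src commit -q -am 'bump deployment submodule'

g -C cache fetch -q origin HEAD
g -C cache checkout -q --force FETCH_HEAD

# With --force, the submodule checkout throws away local changes instead of
# aborting with "Your local changes ... would be overwritten by checkout".
g -C cache submodule --quiet update --init --recursive --force

# Empty output here means the submodule working tree is clean again.
dirty=$(g -C cache/deployment status --porcelain)
cat cache/deployment/kustomization.yaml
```

The trade-off is that the local edit to kustomization.yaml is silently lost, so anything relying on it would have to regenerate it after the update.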