Image update automation not committing the resolved version to git

Open sanjvij opened this issue 3 years ago • 30 comments

Hi team,

Thanks for your help so far. I am stuck implementing a use case in which the image update automation is not applying its changes to git.

When I run the command below, I can see that flux was able to detect a new version in the registry but never committed it to git.

(base) sanj@Sanjs-Air app-cluster % flux get image policy staging
NAME    	READY	MESSAGE                                                                	LATEST IMAGE
staging 	True 	Latest image tag for 'sanjvij01/getting-started' resolved to: v3.0.2	sanjvij01/getting-started:v3.0.2

(base) sanj@Sanjs-Air app-cluster % kubectl get deployment/getting-started-image -n staging -oyaml | grep 'image'
  name: getting-started-image
  selfLink: /apis/apps/v1/namespaces/staging/deployments/getting-started-image
  - image: sanjvij01/getting-started:v3.0.1
  imagePullPolicy: IfNotPresent
  message: ReplicaSet "getting-started-image-554964548d" has successfully progressed.

My image update automation file looks like the one below. I have a feeling I have done something wrong, in which case feel free to point it out.

apiVersion: image.toolkit.fluxcd.io/v1alpha1
kind: ImageUpdateAutomation
metadata:
  name: flux-system
  namespace: flux-system
spec:
  checkout:
    branch: master
    gitRepositoryRef:
      name: flux-staging
  commit:
    authorEmail: [email protected]
    authorName: sanjvij
    messageTemplate: '{{range .Updated.Images}}{{println .}}{{end}}'
  interval: 1m0s
  push:
    branch: master
  update:
    path: ./
    strategy: Setters

Let me know if you need me to provide any further info.

sanjvij avatar Apr 15 '21 13:04 sanjvij

I'm running into a similar issue. Is there a way to debug the image-automation-controller to get more info on why we're getting "no updates made"?

I've got two clusters:

  • 1.19 in EKS, looks like it's running v0.11.0 (PROD)
  • 1.20 as a K3s cluster, running v0.12.3. (TEST)

I may try to downgrade my TEST cluster to v0.11.0 to see if that lets things work.

rayterrill avatar Apr 24 '21 00:04 rayterrill

Downgraded cluster to v0.11.0, still not able to get image-automation-controller working correctly, and not sure where to start looking to understand why it isn't making an update.

rayterrill avatar Apr 29 '21 17:04 rayterrill

Update - Dug into the manifests for the flux components - looks like we can put the image-automation-controller in debug mode:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/instance: flux-system
    app.kubernetes.io/version: v0.11.0
    control-plane: controller
  name: image-automation-controller
  namespace: flux-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: image-automation-controller
  template:
    metadata:
      annotations:
        prometheus.io/port: "8080"
        prometheus.io/scrape: "true"
      labels:
        app: image-automation-controller
    spec:
      containers:
      - args:
        - --events-addr=http://notification-controller/
        - --watch-all-namespaces=true
        - --log-level=debug
        - --log-encoding=json
        - --enable-leader-election

rayterrill avatar Apr 29 '21 17:04 rayterrill

Getting:

no changes made in working directory; no commit

rayterrill avatar Apr 29 '21 17:04 rayterrill

Played around with the yaml for my definition - maybe something was goofed up with the imagepolicy comment? I eventually got it working but it would really be nice to have some additional mechanisms to figure out why this wasn't working.

rayterrill avatar Apr 29 '21 21:04 rayterrill

Hi @rayterrill, I just came across a similar issue today. Could it be the case that this issue is connected to a directory name, where the yaml file containing the imagePolicy reference is located? E.g. for me the update works when the imagePolicy reference is located in: deploy/overlays/int/trololo/patch.yaml on the other hand it is not working if its located in here: deploy/overlays/int/user-ui-config-service/patch.yaml

BR Robert

bobrossthepainter avatar May 04 '21 12:05 bobrossthepainter

I just found out that the problem is indeed related to the directory where the file holding the imagePolicy ref is located. Within a parent directory, the directory that sorts last alphabetically is ignored by the imagePolicy resolving algorithm.

My fix was adding another dir zzz with a .gitkeep file inside and it magically worked. :D

bobrossthepainter avatar May 04 '21 14:05 bobrossthepainter

Potentially, yes. I did end up moving my stuff around in directories - so maybe that helped resolve it.

Really wish there was a way for DEBUG mode to give enough detail to determine why it's not working so we can self-resolve these kinds of issues.

rayterrill avatar May 04 '21 14:05 rayterrill

@rayterrill What sort of debug output would have helped you?

squaremo avatar Jun 02 '21 14:06 squaremo

Ideally some way to understand why it didn't work - something like "nothing to update" or even better something like "image would be updated but setters didn't match anything" - something to indicate the reconciliation loop found work to do but something in the config prevented it from being written. I believe my problem was indeed that my folder structure and setters definition were not aligned (backed into this by moving things around until I discovered that was the problem).

rayterrill avatar Jun 02 '21 19:06 rayterrill

Please correct me if I'm wrong.

The GitRepository, ImageRepository, ImagePolicy, and ImageUpdateAutomation must be in the same namespace. Then you need to add the image policy marker, e.g. {"$imagepolicy": "<policy-namespace>:<policy-name>"}. This means that when you have multiple namespaces, the options are:

  1. Create the GitRepository, ImageRepository, ImagePolicy, and ImageUpdateAutomation in the flux-system namespace. The image policy marker then points to flux-system:<policy_name>.
  2. Create a GitRepository in every namespace that needs an ImageUpdateAutomation.

I don't know whether this is by design. I have already tested the first approach and it works well. These are the facts I found:

  • An ImageUpdateAutomation can only refer to a GitRepository in the same namespace (1) (2)
  • An ImageUpdateAutomation only gets ImagePolicy objects from the same namespace (1)
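
A minimal sketch, in Python, of listing which policy each marker in a manifest points at, so that markers referring to a namespace other than the automation's can be spotted. It assumes the marker syntax quoted above plus the optional `:tag` / `:name` suffix. The function names `find_policy_markers` and `markers_outside_namespace` are hypothetical, and this is not how the controller itself parses files.

```python
import json
import re

# Matches a trailing YAML comment holding a JSON object with a "$imagepolicy" key.
MARKER_RE = re.compile(r'#\s*(\{\s*"\$imagepolicy"\s*:\s*"[^"]*"\s*\})\s*$')


def find_policy_markers(manifest_text):
    """Return (line_number, namespace, policy_name, field) for each marker found."""
    found = []
    for line_number, line in enumerate(manifest_text.splitlines(), start=1):
        match = MARKER_RE.search(line)
        if not match:
            continue
        reference = json.loads(match.group(1))["$imagepolicy"]
        parts = reference.split(":")
        if len(parts) < 2:
            # Without "<namespace>:<name>" the marker cannot be tied to a policy.
            found.append((line_number, None, reference, None))
            continue
        namespace, name = parts[0], parts[1]
        field = parts[2] if len(parts) > 2 else None  # optional "tag" or "name" suffix
        found.append((line_number, namespace, name, field))
    return found


def markers_outside_namespace(manifest_text, automation_namespace):
    """List markers whose policy namespace differs from the automation's namespace."""
    return [m for m in find_policy_markers(manifest_text) if m[1] != automation_namespace]
```

Running `markers_outside_namespace(text, "flux-system")` over each manifest in the repository gives a quick list of markers that an automation in flux-system would, per the observation above, not pick up.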

melodiez14 avatar Jun 19 '21 04:06 melodiez14

Ideally some way to understand why it didn't work - something like "nothing to update" or even better something like "image would be updated but setters didn't match anything" - something to indicate the reconciliation loop found work to do but something in the config prevented it from being written.

The controller argument --log-level=debug now results in lots of tracing output: #190. (This will be moved to --log-level=trace at some point soon).

squaremo avatar Jul 14 '21 16:07 squaremo

I started to discuss this in #180 - linking these threads so they are more discoverable and can perhaps be closed together, with a docs improvement.

kingdonb avatar Aug 20 '21 08:08 kingdonb

Hi @rayterrill, I just came across a similar issue today. Could it be the case that this issue is connected to a directory name, where the yaml file containing the imagePolicy reference is located? E.g. for me the update works when the imagePolicy reference is located in: deploy/overlays/int/trololo/patch.yaml on the other hand it is not working if its located in here: deploy/overlays/int/user-ui-config-service/patch.yaml

BR Robert

That was the problem for me. It worked after moving the files into the right directory.

narenderramireddy avatar Dec 02 '21 18:12 narenderramireddy

Hello, so what directory do you recommend to put these files in? I have them like clusters/eks/apps/app_name/app_name-registry.yaml It doesn't seem to work after a while.

However if I install the image automation controller again using: flux install --components-extra=image-reflector-controller,image-automation-controller

It starts working right away, but again stops after a while, so I am not sure it is solely a directory issue.

raress96 avatar Dec 08 '21 14:12 raress96

This most recent comment may be describing the same issue as

  • https://github.com/fluxcd/image-automation-controller/issues/286

These may be duplicate issues, or you may be reporting the other issue... we are investigating it from there, if it's the same.

We've heard reports from a number of folks that image automation stops working after a while, and the curative action suggested that seems to be resolving it is a restart of image-automation-controller. That would likely be accomplished by reinstalling with flux install --components-extra... as you mentioned @raress96 – are you still experiencing this?

kingdonb avatar Jan 06 '22 17:01 kingdonb

@kingdonb Hey, it seems to work for now, but we didn't have many images built lately because of the holidays. Not sure if it's going to stop working after a while. Will also follow the other issue for updates.

raress96 avatar Jan 07 '22 06:01 raress96

@raress96 #209 and #282 are probably better issues to follow. There are a lot of reports of this issue and it has been tricky from what I understand to reproduce reliably. It appears to happen when there is a connectivity or availability issue with GitHub (and then the issue persists until the controller restarts, from what I've heard based on the reports we got.)

kingdonb avatar Jan 07 '22 15:01 kingdonb

@kingdonb It reproduced again for me.

What I did was set up a new app with a new ImageRepository and a deployment whose image field was urlwhatever...:1 # {"$imagepolicy": "flux-system:new-app"}. Version 1 of that image didn't actually exist, so I pushed this and no app was created. After an image was pushed, it had the tag/version 3, and the image in the deployment was successfully updated to urlwhatever...:3 # {"$imagepolicy": "flux-system:new-app"}.

But then version 4 of the app was pushed to the image repository, and the image was not updated, and I had to run flux install --components-extra=image-reflector-controller,image-automation-controller, after which the image was again updated.

Pretty weird. I will also follow those other issues and maybe my feedback helps you debug this.

raress96 avatar Jan 07 '22 15:01 raress96

Hey, forgot to mention one thing that might be important: if I have version 3 in the k8s files, and I manually change the version to 4 in a Deployment, the image automation controller puts back version 3, so it seems to still be running but not fetching the correct version maybe.

raress96 avatar Jan 12 '22 12:01 raress96

Facing same issue. Using flux stack 0.30.2 on EKS v1.21.5-eks-bc4871b,

When I describe the imageupdateautomation object, it reports "no updates made", whereas imagepolicy and imagerepository are working fine. [image at the end]

all image* objects are in same namespace

$ flux get images all --all-namespaces
NAMESPACE      	NAME                       	LAST SCAN                	SUSPENDED	READY	MESSAGE                       
dev-xxxx	imagerepository/xxxx	2022-05-07T03:20:26+05:30	False    	True 	successful scan, found 8 tags	

NAMESPACE      	NAME                       	LATEST IMAGE                                             	READY	MESSAGE                                                                                       
dev-xxxx	imagepolicy/xxxx-dev	docker.io/xxxx/xxxx:edge-561db2e-1651841847	True 	Latest image tag for 'docker.io/xxxx/xxxx' resolved to: edge-561db2e-1651841847	

NAMESPACE  	NAME                           	LAST RUN                 	SUSPENDED	READY	MESSAGE         
flux-system	imageupdateautomation/k8s-infra	2022-05-07T03:20:22+05:30	False    	True 	no updates made	
image

pratikbin avatar May 06 '22 21:05 pratikbin

Same issue here:

FluxCD version: 0.30.2

Output of flux check

► checking prerequisites
✔ Kubernetes 1.20.12+vmware.1 >=1.20.6-0
► checking controllers
✔ helm-controller: deployment ready
► ghcr.io/fluxcd/helm-controller:v0.18.2
✔ image-automation-controller: deployment ready
► ghcr.io/fluxcd/image-automation-controller:v0.21.3
✔ image-reflector-controller: deployment ready
► ghcr.io/fluxcd/image-reflector-controller:v0.17.1
✔ kustomize-controller: deployment ready
► ghcr.io/fluxcd/kustomize-controller:v0.22.3
✔ notification-controller: deployment ready
► ghcr.io/fluxcd/notification-controller:v0.23.2
✔ source-controller: deployment ready
► ghcr.io/fluxcd/source-controller:v0.22.5
✔ all checks passed

Output of flux version

flux: v0.30.2
helm-controller: v0.18.2
image-automation-controller: v0.21.3
image-reflector-controller: v0.17.1
kustomize-controller: v0.22.3
notification-controller: v0.23.2
source-controller: v0.22.5

ImageUpdateAutomation is not working; the image-automation-controller's logs say:

{
   "level":"error",
   "ts":"2022-05-17T08:57:04.660Z",
   "logger":"controller.imageupdateautomation",
   "msg":"Reconciler error",
   "reconciler group":"image.toolkit.fluxcd.io",
   "reconciler kind":"ImageUpdateAutomation",
   "name":"redacted",
   "namespace":"flux-system",
   "error":"unable to clone 'ssh://git@redacted/redacted/redacted': SSH could not read data: Error waiting on socket"
}

It looks like it cannot write back the changes. Thanks for your help.
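
For anyone sifting through the controller output for entries like the one above, here is a small Python sketch that pulls the error-level entries out of JSON logs. It assumes the controller runs with --log-encoding=json and emits one JSON object per line, with the field names shown in the entry above. The function name `reconcile_errors` is hypothetical.

```python
import json


def reconcile_errors(log_lines):
    """Yield (timestamp, object name, error text) for each error-level JSON log line."""
    for raw in log_lines:
        raw = raw.strip()
        if not raw.startswith("{"):
            continue  # skip anything that is not a JSON object
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        if entry.get("level") == "error":
            yield entry.get("ts"), entry.get("name"), entry.get("error")
```

The function accepts any iterable of lines, for example an open file holding saved controller logs.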

kallaics avatar May 17 '22 09:05 kallaics

@kallaics have you tried enabling managed transport yet? This is something we added recently that focuses on improving Git connections.

You just need to get your controller pod to have the environment variable EXPERIMENTAL_GIT_TRANSPORT=true, and that should suffice to enable it. From the next release this will no longer be required as it will be enabled by default.

More information: https://fluxcd.io/docs/components/source/gitrepositories/#experimental-managed-transport-for-libgit2-git-implementation

pjbgf avatar May 17 '22 10:05 pjbgf

My question removed.

I will report back once new images come in.

kallaics avatar May 17 '22 12:05 kallaics

The first update caused an issue:

{ 
  "level":"error",
  "ts":"2022-05-17T12:28:19.441Z",
  "logger":"controller.imageupdateautomation",
  "msg":"Reconciler error",
  "reconciler group":"image.toolkit.fluxcd.io",
  "reconciler kind":"ImageUpdateAutomation",
  "name":"redacted",
  "namespace":"flux-system",
  "error":"unable to clone 'ssh://git@redacted/redacted/redacted': ssh: unexpected packet in response to channel open: <nil>"
}

Additional information: the URL doesn't end in .git, and the endpoint is GitLab. @pjbgf Do you have any idea, maybe? Thanks for your time and help.

kallaics avatar May 17 '22 12:05 kallaics

@kallaics I think the problem you are experiencing is slightly different than the one reported on this thread. So I created a new issue for it: https://github.com/fluxcd/image-automation-controller/issues/365

pjbgf avatar May 20 '22 08:05 pjbgf

My failed deployment had exactly the same symptoms ("no changes made in working directory; no commit", new image detected but the PR to deploy/update the image tag was not created). The reason: I usually work on Linux boxes, but this time I was working on a $§"&%$ Windows machine (the %@§ "Company Policy") and the yml file was saved as UTF-16 by the so-called "text editor", which made git diff treat the yml file as binary (which it obviously wasn't) and ignore changes to it. Maybe FluxCD uses git diff to detect changes (IDK) and was ignoring the changes to that yml. Changing the encoding to UTF-8 did the trick. Just in case, also check that the file uses LF instead of CRLF.
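
A rough Python sketch of that check, for anyone who wants to test their manifests before digging further. It flags a UTF-16 byte order mark, NUL bytes (which Git's binary detection keys on), invalid UTF-8, and CRLF line endings. The function name `encoding_problems` is hypothetical, and whether the controller trips over each of these is an assumption based on the report above, not something confirmed in this thread.

```python
import codecs


def encoding_problems(data):
    """Return a list of reasons why raw file bytes may not be treated as plain UTF-8 text."""
    problems = []
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        problems.append("UTF-16 byte order mark")
    elif b"\x00" in data:
        # UTF-16 text without a BOM is still full of NUL bytes, which Git reads as binary.
        problems.append("NUL bytes (likely UTF-16 or binary)")
    else:
        try:
            data.decode("utf-8")
        except UnicodeDecodeError:
            problems.append("not valid UTF-8")
    if b"\r\n" in data:
        problems.append("CRLF line endings")
    return problems
```

Call it with the raw bytes of a manifest, e.g. `encoding_problems(open(path, "rb").read())`; an empty list means none of these conditions were found.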

nagarciah avatar May 30 '23 13:05 nagarciah

Just in case it helps anyone else, in my case it was because the ImageUpdateAutomation had the wrong update path - duh! Simple one but easy to gloss over since there are no errors per se
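
One way to catch a wrong update path early is to check whether any file under it carries a marker at all. The Python sketch below walks a local checkout and lists the YAML files containing an {"$imagepolicy": ...} comment. The function name `files_with_markers` is hypothetical, and limiting the scan to .yaml/.yml files is an assumption made for this sketch.

```python
import os
import re

# Loose match for the start of an image policy marker comment.
MARKER_RE = re.compile(r'#\s*\{\s*"\$imagepolicy"\s*:')


def files_with_markers(update_path):
    """Return sorted relative paths of YAML files under update_path that contain a marker."""
    hits = []
    for directory, _subdirs, filenames in os.walk(update_path):
        for filename in filenames:
            if not filename.endswith((".yaml", ".yml")):
                continue
            full_path = os.path.join(directory, filename)
            # errors="replace" keeps the scan going on files that are not valid UTF-8.
            with open(full_path, encoding="utf-8", errors="replace") as handle:
                if MARKER_RE.search(handle.read()):
                    hits.append(os.path.relpath(full_path, update_path))
    return sorted(hits)
```

If this returns an empty list for the directory that spec.update.path resolves to in your checkout, the automation has nothing it could rewrite there, which would match the "no updates made" message.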

enricozammitlon avatar Nov 21 '23 03:11 enricozammitlon

Please correct me if I'm wrong.

The GitRepository, ImageRepository, ImagePolicy, and ImageUpdateAutomation must be in the same namespace. Then you need to add the image policy marker, e.g. {"$imagepolicy": "<policy-namespace>:<policy-name>"}. This means that when you have multiple namespaces, the options are:

  1. Create the GitRepository, ImageRepository, ImagePolicy, and ImageUpdateAutomation in the flux-system namespace. The image policy marker then points to flux-system:<policy_name>.
  2. Create a GitRepository in every namespace that needs an ImageUpdateAutomation.

I don't know whether this is by design. I have already tested the first approach and it works well. These are the facts I found:

  • An ImageUpdateAutomation can only refer to a GitRepository in the same namespace (1) (2)
  • An ImageUpdateAutomation only gets ImagePolicy objects from the same namespace (1)

ale900522 avatar Jan 04 '24 11:01 ale900522

Please check this issue; I have a problem with image update automation: https://github.com/fluxcd/image-automation-controller/issues/621#issue-2065471221

ale900522 avatar Jan 04 '24 11:01 ale900522