image-automation-controller
HelmRelease file not updated
@stefanprodan As you know, I have been using your tools for a long time. My general experience is that the tools are rock solid.
Today a colleague reported that an image was not being updated. I inspected our cluster. The Image Automation controller had been running for 4 days. It had made updates as recently as last night. There were no error messages in the logs. In other words, from the perspective of the Image Automation controller, everything was running fine.
I looked for an ImagePolicy for the image in question. I found one. The policy had been updated to the new image hours earlier. The annotation to instruct the Image Automation controller was attached to a HelmRelease. That file had not been modified with the new image tag. However, that file had been updated by the image automation controller in the last 4 days.
```yaml
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: mock-metadata-service
spec:
  interval: 5m
  chart:
    spec:
      chart: .
      version: "0.0.4"
      sourceRef:
        kind: GitRepository
        name: mock-metadata-service
        namespace: shared
      interval: 1m
  values:
    name: mock-metadata-service
    imagePullSecrets:
      - name: ecr-credentials-sync
    images:
      tag: 21.0811.1547.38-70e9909c-master # {"$imagepolicy": "shared:mock-metadata-service:tag"}
```
I restarted the Image Automation controller, and it quickly updated the GitLab repo to reflect the correct image tag.
So, I am left to conclude that either an error occurred that was not properly logged or there is another bug.
Could this be related to https://github.com/fluxcd/image-automation-controller/issues/209? Have you seen any timeout messages in the logs?
@stefanprodan I think that this just happened again. Following is the data that I was able to collect.
I see no timeout in the logs.
Here is the complete log: image-automation-log.txt
Here is the current date: date.txt
Here is the image policy: imagepolicy.txt
Here is the source of the HelmRelease:
```yaml
kind: HelmRelease
metadata:
  name: mock-metadata-service
spec:
  interval: 5m
  chart:
    spec:
      chart: .
      version: "0.0.4"
      sourceRef:
        kind: GitRepository
        name: mock-metadata-service
        namespace: shared
      interval: 1m
  values:
    name: mock-metadata-service
    imagePullSecrets:
      - name: ecr-credentials-sync
    images:
      tag: 21.0810.1833.45-01134759-master # {"$imagepolicy": "shared:mock-metadata-service:tag"}
```
Here is the in-cluster representation of the HelmRelease:
What would happen if the controller ran on a node that was very low on memory? Could that cause this failure?
:wave: @derrickburns
If you set --log-level=debug on the controller deployment, the controller (in recent versions) will record much more about why it does or doesn't make any update. That might reveal if there's some subtle, or mistaken, reason it declines to commit the change you expected.
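For anyone unsure where to set that flag, here is a minimal sketch. It assumes Flux was installed with `flux bootstrap`, so the cluster repo contains a `flux-system/kustomization.yaml` next to `gotk-components.yaml` and `gotk-sync.yaml`. The file layout and container index `0` are assumptions about a default install, so adjust them to match your repo.

```yaml
# flux-system/kustomization.yaml (assumed default bootstrap layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - target:
      kind: Deployment
      name: image-automation-controller
    patch: |
      # Append the debug flag to the controller's existing args.
      # Assumes the controller is the first (index 0) container.
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --log-level=debug
```

Because the patch appends to the argument list, the default `--log-level=info` stays in place ahead of it. This sketch relies on the later flag taking precedence; if that does not hold in your version, replace the existing argument instead of appending.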
Hi, we are experiencing similar problems. After working seamlessly for quite some time, applications stop being updated automatically by Flux. The new image is detected, but no changes are committed to the workload repo and the application is not updated on the cluster.
After a forced restart of the image-automation-controller, the changes are committed and pushed to the repo and the application is updated.
We have found nothing in the logs that indicates the problem or its cause.
Version v0.21.0 of the image-automation-controller introduces an experimental transport that fixes an issue in which the controller stops working in some specific scenarios.
The experimental transport is opt-in: enable it by setting the environment variable EXPERIMENTAL_GIT_TRANSPORT to true in the controller's Deployment.
This will require a redeploy of all components, so I would recommend doing it via flux bootstrap with flux CLI version v0.28.0, which will be released tomorrow.
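As a rough illustration of the opt-in, the patch below sets the variable on the controller Deployment through the bootstrap `kustomization.yaml`. The file layout and the container name `manager` are assumptions based on a default Flux install, and the variable applies to the controller version discussed here, so check the release notes for the version you run.

```yaml
# flux-system/kustomization.yaml (assumed default bootstrap layout)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  - target:
      kind: Deployment
      name: image-automation-controller
    patch: |
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: image-automation-controller
      spec:
        template:
          spec:
            containers:
              # "manager" is the assumed container name in the default manifests.
              - name: manager
                env:
                  # Opt in to the experimental Git transport.
                  - name: EXPERIMENTAL_GIT_TRANSPORT
                    value: "true"
```

After committing this change, the controller pod should be recreated with the new environment variable once Flux reconciles the `flux-system` Kustomization.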
Can you test it again with the experimental transport enabled and let us know how you get on please?
Closing this issue due to inactivity, but happy to reopen it if the problem recurs while using the latest version of the image-automation-controller.