eksctl
Race condition when updating log retention policy
While using the new "logRetentionInDays" field added in eksctl 0.73.0, we intermittently see the following error during cluster creation:
[✔] configured CloudWatch logging for cluster "test-cluster" in "us-west-2" (enabled types: audit, authenticator, scheduler & disabled types: api, controllerManager)
[!] 1 error(s) occurred and cluster hasn't been created properly, you may wish to check CloudFormation console
[ℹ] to cleanup resources, run 'eksctl delete cluster --region=us-west-2 --name=test-cluster'
[✖] error updating log retention settings: ResourceNotFoundException: The specified log group does not exist.
The reproduction rate isn't very high (around 10%), but it is high enough to be a problem. It looks like there is a race condition at play here.
Interesting find. We do wait for the UpdateClusterConfig operation to complete before issuing a call to logs:PutRetentionPolicy, but it looks like that does not guarantee the log group for the control plane has been created yet. We'll investigate this and get back to you soon.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.
Not stale, this still needs to be resolved.
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Not stale, this still needs to be resolved.
@cPu1 what is required to resolve this? Are we waiting for anything from AWS?
Running into this problem as well with v0.95.0.