TerraformLayer "Error getting last Result"
Hi!
All my TerraformLayer objects have lastResult: Error getting last Result.
In web UI I can view logs but state is "OutOfSync" even if there is no changes in TF plan.
I'm using burrito image tag 10ebe5bfcd2ec9b899485865067dc871ded84ba7 on all burrito-system components.
Also, I use a MinIO datastore.
There's a line in log that says failed to get plan for layer qpc-bastion: could not get plan, there's an issue with the storage backend but it appears early in layer reconciliation...
When runner pod succeded I can see logs on MinIO storage:
Did you get same behaviour recently?
Here's the logs:
Sep 24, 2024 @ 12:00:22.973;controllers;level=info msg="getting Linked TerraformRepository to layer qpc-bastion"
Sep 24, 2024 @ 12:00:22.973;controllers;level=info msg="layer qpc-bastion has an outdated plan, creating a new run"
Sep 24, 2024 @ 12:00:22.973;controllers;level=info msg="starting reconciliation for layer burrito-qpc-config-terraform/qpc-bastion ..."
Sep 24, 2024 @ 12:00:22.989;controllers;level=debug msg="Failed to get last Result" logger=events object="{\"kind\":\"TerraformLayer\",\"namespace\":\"burrito-qpc-config-terraform\",\"name\":\"qpc-bastion\",\"uid\":\"ae7ce0bc-b3ea-48aa-a894-b1dd903f15d6\",\"apiVersion\":\"config.terraform.padok.cloud/v1alpha1\",\"resourceVersion\":\"1056649421\"}" reason=Reconciliation type=Normal
Sep 24, 2024 @ 12:00:22.989;controllers;level=error msg="failed to get plan for layer qpc-bastion: could not get plan, there's an issue with the storage backend"
Sep 24, 2024 @ 12:00:22.989;datastore;level=info bytes_in= bytes_out=24 error="<nil>" latency=6.667784ms method=GET remote_ip=10.34.12.119 service_account="system:serviceaccount:burrito-system:burrito-controllers" status=400 uri="/api/plans?attempt=&format=short&layer=qpc-bastion&namespace=burrito-qpc-config-terraform&run="
Sep 24, 2024 @ 12:00:22.999;controllers;level=debug msg="Created TerraformRun for Plan action" logger=events object="{\"kind\":\"TerraformLayer\",\"namespace\":\"burrito-qpc-config-terraform\",\"name\":\"qpc-bastion\",\"uid\":\"ae7ce0bc-b3ea-48aa-a894-b1dd903f15d6\",\"apiVersion\":\"config.terraform.padok.cloud/v1alpha1\",\"resourceVersion\":\"1056649421\"}" reason=Reconciliation type=Normal
Sep 24, 2024 @ 12:00:22.999;controllers;level=info msg="getting linked TerraformLayer to run qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:22.999;controllers;level=info msg="getting linked TerraformRepository to run qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:22.999;controllers;level=info msg="run qpc-bastion-plan-zsf8p is in initial state"
Sep 24, 2024 @ 12:00:22.999;controllers;level=info msg="starting reconciliation for run burrito-qpc-config-terraform/qpc-bastion-plan-zsf8p ..."
Sep 24, 2024 @ 12:00:23.012;controllers;level=info msg="finished reconciliation cycle for layer burrito-qpc-config-terraform/qpc-bastion"
Sep 24, 2024 @ 12:00:23.012;controllers;level=info msg="no runs to delete for layer qpc-bastion"
Sep 24, 2024 @ 12:00:23.027;controllers;level=info msg="CA certificate is available in namespace burrito-qpc-config-terraform"
Sep 24, 2024 @ 12:00:23.052;controllers;level=debug msg="Successfully created pod qpc-bastion-plan-742x5 for initial run" logger=events object="{\"kind\":\"TerraformRun\",\"namespace\":\"burrito-qpc-config-terraform\",\"name\":\"qpc-bastion-plan-zsf8p\",\"uid\":\"88d4c110-8167-4f9c-a467-af79ac708e94\",\"apiVersion\":\"config.terraform.padok.cloud/v1alpha1\",\"resourceVersion\":\"1056649425\"}" reason=Run type=Normal
Sep 24, 2024 @ 12:00:23.052;controllers;level=info msg="pod qpc-bastion-plan-742x5 not found, ignoring..."
Sep 24, 2024 @ 12:00:23.070;controllers;level=info msg="finished reconciliation cycle for run burrito-qpc-config-terraform/qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:24.072;controllers;level=info msg="getting linked TerraformLayer to run qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:24.072;controllers;level=info msg="getting linked TerraformRepository to run qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:24.072;controllers;level=info msg="pod qpc-bastion-plan-742x5 is not in a terminal state, ignoring..."
Sep 24, 2024 @ 12:00:24.072;controllers;level=info msg="run qpc-bastion-plan-zsf8p is running"
Sep 24, 2024 @ 12:00:24.072;controllers;level=info msg="starting reconciliation for run burrito-qpc-config-terraform/qpc-bastion-plan-zsf8p ..."
Sep 24, 2024 @ 12:00:24.086;controllers;level=info msg="finished reconciliation cycle for run burrito-qpc-config-terraform/qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:55.168;datastore;level=info bytes_in=1283 bytes_out=0 error="<nil>" latency=43.292773ms method=PUT remote_ip=10.34.9.28 service_account="system:serviceaccount:burrito-qpc-config-terraform:burrito-runner" status=200 uri="/api/plans?attempt=0&format=pretty&layer=qpc-bastion&namespace=burrito-qpc-config-terraform&run=qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:55.227;datastore;level=info bytes_in=360847 bytes_out=0 error="<nil>" latency=46.030994ms method=PUT remote_ip=10.34.9.28 service_account="system:serviceaccount:burrito-qpc-config-terraform:burrito-runner" status=200 uri="/api/plans?attempt=0&format=json&layer=qpc-bastion&namespace=burrito-qpc-config-terraform&run=qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:55.245;datastore;level=info bytes_in=43 bytes_out=0 error="<nil>" latency=16.921951ms method=PUT remote_ip=10.34.9.28 service_account="system:serviceaccount:burrito-qpc-config-terraform:burrito-runner" status=200 uri="/api/plans?attempt=0&format=short&layer=qpc-bastion&namespace=burrito-qpc-config-terraform&run=qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:55.261;datastore;level=info bytes_in=45342 bytes_out=0 error="<nil>" latency=14.946959ms method=PUT remote_ip=10.34.9.28 service_account="system:serviceaccount:burrito-qpc-config-terraform:burrito-runner" status=200 uri="/api/plans?attempt=0&format=bin&layer=qpc-bastion&namespace=burrito-qpc-config-terraform&run=qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:55.273;controllers;level=info msg="starting reconciliation for layer burrito-qpc-config-terraform/qpc-bastion ..."
Sep 24, 2024 @ 12:00:55.273;controllers;level=info msg="terraform layer qpc-bastion is locked, skipping reconciliation."
Sep 24, 2024 @ 12:01:23.012;controllers;level=info msg="starting reconciliation for layer burrito-qpc-config-terraform/qpc-bastion ..."
Sep 24, 2024 @ 12:01:23.012;controllers;level=info msg="terraform layer qpc-bastion is locked, skipping reconciliation."
Sep 24, 2024 @ 12:01:24.087;controllers;level=info msg="getting linked TerraformLayer to run qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:01:24.087;controllers;level=info msg="getting linked TerraformRepository to run qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:01:24.087;controllers;level=info msg="run qpc-bastion-plan-zsf8p has succeeded"
Sep 24, 2024 @ 12:01:24.087;controllers;level=info msg="starting reconciliation for run burrito-qpc-config-terraform/qpc-bastion-plan-zsf8p ..."
Sep 24, 2024 @ 12:01:24.126;datastore;level=info bytes_in=25778 bytes_out=0 error="<nil>" latency=7.108905ms method=PUT remote_ip=10.34.12.119 service_account="system:serviceaccount:burrito-system:burrito-controllers" status=200 uri="/api/logs?attempt=0&layer=qpc-bastion&namespace=burrito-qpc-config-terraform&run=qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:01:24.140;controllers;level=info msg="finished reconciliation cycle for run burrito-qpc-config-terraform/qpc-bastion-plan-zsf8p"
Hi! The could not get plan, there's an issue with the storage backend error logged by the controller during reconciliation means that the burrito-datastore pod encountered an error while retrieving the latest short plan on minio.
Can you look for logs in the burrito-datastore pod ?
To understand what file was attempted to be retrieved, you also can look for the following properties on the corresponding TerraformLayer CRD:
namespacename-Status.LastRun.Name
If a file exists at layers/<namespace>/<name>/<Status.LastRun.Name>/<any-attempt-number>/short.diff, then the error should not be caused by a missing file
Hi Lucas!
All the logs of every Burrito components are in the first comment.
The first call to datastore state this:
status=400 uri="/api/plans?attempt=&format=short&layer=qpc-bastion&namespace=burrito-qpc-config-terraform&run="
But I don't have any logs explaining the "issue with the storage backend".
It looks like the controller did not have a run to look for in the datastore. The run query parameter in the request should not be an emptystring but rather the name of the last run for the layer (e.g: <layer-name>-plan-xxxxx)
Can you look for Status.LastRun.Name property in your layer resources ? If it is empty, there might by an issue with the flow updating the LastRun properties
HI @LucasMrqes !
I finally have time to troubleshoot a lit more this problem. I also upgrade to 0.5.0 version.
When using datastore mock storage, I don't have any issue.
When I'm usin S3 storage, I got this message now:
controllers time="2024-10-29T10:07:43Z" level=debug msg="Failed to get last Result" logger=events object="{\"kind\":\"TerraformLayer\",\"namespace\":\"burrito-qpc-config-terraform\",\"name\":\"qpc-bastion\",\"uid\":\"421db6b5-883c-4877-8173-b62d84e4eb11\",\"apiVersion\":\"config.terraform.padok.cloud/v1alpha1\",\"resourceVersion\":\"1109090860\"}" reason=Reconciliation type=Normal
controllers time="2024-10-29T10:07:43Z" level=error msg="failed to get plan for layer qpc-bastion: no plan for this attempt
datastore time="2024-10-29T10:07:43Z" level=info bytes_in= bytes_out=24 error="<nil>" latency=32.280928ms method=GET remote_ip=10.34.12.207 service_account="system:serviceaccount:burrito-system:burrito-controllers" status=404 uri="/api/plans?attempt=&format=short&layer=qpc-bastion&namespace=burrito-qpc-config-terraform&run=qpc-bastion-plan-f68jh"
But all files are well pushed to minio bucket:
Here's the TerraformLayer:
apiVersion: config.terraform.padok.cloud/v1alpha1
kind: TerraformLayer
metadata:
name: qpc-bastion
namespace: burrito-qpc-config-terraform
spec:
branch: master
path: bastion
repository:
name: qpc-config-terraform
namespace: burrito-qpc-config-terraform
uid: 421db6b5-883c-4877-8173-b62d84e4eb11
status:
[...]
lastResult: Error getting last Result
lastRun:
action: plan
date: '2024-10-29T05:04:53Z'
name: qpc-bastion-plan-f68jh
latestRuns:
- action: plan
date: '2024-10-29T05:04:53Z'
name: qpc-bastion-plan-f68jh
And the TerraformRun:
apiVersion: config.terraform.padok.cloud/v1alpha1
kind: TerraformRun
metadata:
creationTimestamp: '2024-10-29T05:04:53Z'
generateName: qpc-bastion-plan-
generation: 1
labels:
burrito/managed-by: qpc-bastion
name: qpc-bastion-plan-f68jh
namespace: burrito-qpc-config-terraform
ownerReferences:
- apiVersion: config.terraform.padok.cloud/v1alpha1
kind: TerraformLayer
name: qpc-bastion
uid: 421db6b5-883c-4877-8173-b62d84e4eb11
spec:
action: plan
artifact: {}
layer:
name: qpc-bastion
namespace: burrito-qpc-config-terraform
status:
attempts:
- logsUploaded: true
number: 0
podName: qpc-bastion-plan-6496p
conditions:
- lastTransitionTime: '2024-10-29T05:05:54Z'
message: This run has a status
observedGeneration: 1
reason: HasStatus
status: 'True'
type: HasStatus
- lastTransitionTime: '2024-10-29T05:05:54Z'
message: This run has not reached the retry limit (1)
observedGeneration: 1
reason: HasNotReachedRetryLimit
status: 'False'
type: HasReachedRetryLimit
- lastTransitionTime: '2024-10-29T05:05:54Z'
message: This run has succeeded
observedGeneration: 1
reason: HasSucceeded
status: 'True'
type: HasSucceeded
- lastTransitionTime: '2024-10-29T05:05:54Z'
message: This run is not running
observedGeneration: 1
reason: NotRunning
status: 'False'
type: IsRunning
- lastTransitionTime: '2024-10-29T05:05:54Z'
message: No failure has been detected yet
observedGeneration: 1
reason: NoFailureYet
status: 'False'
type: IsInFailureGracePeriod
lastRun: Tue Oct 29 05:04:53 UTC 2024
retries: 0
runnerPod: qpc-bastion-plan-6496p
state: Succeeded
So, I believe the problem is when attempt is not specified when calling the datastore.
Does using datastore.GetLatestPlan in https://github.com/padok-team/burrito/blob/main/internal/controllers/terraformlayer/controller.go#L120 will fix this problem?
I'll test this on my fork.
But I don't know why this problem doesn't appaers on mock storage...
Or maybe it's a regression when using UsePathStyle while listing objects.
Maybe need to modify this function https://github.com/padok-team/burrito/blob/main/internal/datastore/storage/s3/s3.go#L89
I don't think that using datastore.GetLatestPlan instead of using datastore.GetPlan would change anything. Since I am unable to reproduce the issue using s3 or mock storage, it might be specific to using minio (hence a regression with UsePathStyle). I will try to reproduce the issue with a Minio storage
OK, here's my findings:
When using AWS SDK for S3 ListObjectsV2, the Prefix must:
- not begins with a
/ - ends with a
/
But curently, the used prefix is something like /layers/<namespace>/<layer>/<run>.
So we need to transform it like layers/<namespace>/<layer>/<run>/ to get some list results.
I'll create a PR with these fixes.
I don't think it's related to using UsePathStyle. Do you know anyone who is using AWS S3 successfully?
I am using S3 successfully with version 53eae917625b4d2df47004e56143b2f78f2d6e61 (basically the same as you were mentioning in your first message), but the issue that you mention might only cause errors with minio. Can you confirm that removing the / works ?
Yes removing the leading / AND adding one at end works.
If the last / isn't provided, only the "prefix" is returned by S3 and not the child folders (so the different "attempts").
I'm successfully using Minio with S3 storage type and the PR that I wrote.
Sorry to re-open this issue...
We have another platform to install Burrito on: IBM Cloud. The block storage option is called COS and their API is S3 compatible.
But unfortunately I got different behavior when listing items :disappointed:
So I tested several options on both MinIO and IBM COS.
Different options:
- A:
prefixTest = prefix - B:
prefixTest = strings.TrimPrefix(prefix, "/") - C:
prefixTest = fmt.Sprintf("%s/", prefix) - D:
prefixTest = fmt.Sprintf("%s/", strings.TrimPrefix(prefix, "/"))
Code (inserted on https://github.com/padok-team/burrito/blob/main/internal/datastore/storage/s3/s3.go#L88) :
log.Warnf("List with prefix %s", prefixTest)
resultTest, err := a.Client.ListObjectsV2(context.TODO(), &storage.ListObjectsV2Input{
Bucket: &a.Config.Bucket,
Prefix: aws.String(prefixTest),
Delimiter: aws.String("/"),
})
if err == nil {
log.Warnf("CommonPrefixes size = %d", len(resultTest.CommonPrefixes))
for i, obj := range resultTest.CommonPrefixes {
log.Warnf("SOUDRY A - CommonPrefixes %d = %s", i, *obj.Prefix)
}
}
MinIO
A - List with prefix /layers/burrito-qpc-config-terraform/qpc-bastion/qpc-bastion-plan-kh8wr
A - CommonPrefixes size = 0
B - List with prefix layers/burrito-qpc-config-terraform/qpc-bastion/qpc-bastion-plan-kh8wr
B - CommonPrefixes size = 1
B - CommonPrefixes 0 = layers/burrito-qpc-config-terraform/qpc-bastion/qpc-bastion-plan-kh8wr/
C - List with prefix /layers/burrito-qpc-config-terraform/qpc-bastion/qpc-bastion-plan-kh8wr/
C - CommonPrefixes size = 0
D - List with prefix layers/burrito-qpc-config-terraform/qpc-bastion/qpc-bastion-plan-kh8wr/
D - CommonPrefixes size = 2
D - CommonPrefixes 0 = layers/burrito-qpc-config-terraform/qpc-bastion/qpc-bastion-plan-kh8wr/0/
D - CommonPrefixes 1 = layers/burrito-qpc-config-terraform/qpc-bastion/qpc-bastion-plan-kh8wr/1/
IBM Cloud COS
A - List with prefix /layers/burrito-qpc-config-terraform/qpc-teleport/qpc-teleport-plan-zcpt6
A - CommonPrefixes size = 1
A - CommonPrefixes 0 = /layers/burrito-qpc-config-terraform/qpc-teleport/qpc-teleport-plan-zcpt6/
B - List with prefix layers/burrito-qpc-config-terraform/qpc-teleport/qpc-teleport-plan-zcpt6
B - CommonPrefixes size = 0
C - List with prefix /layers/burrito-qpc-config-terraform/qpc-teleport/qpc-teleport-plan-zcpt6/
C - CommonPrefixes size = 1
C - CommonPrefixes 0 = /layers/burrito-qpc-config-terraform/qpc-teleport/qpc-teleport-plan-zcpt6/0/
D - List with prefix layers/burrito-qpc-config-terraform/qpc-teleport/qpc-teleport-plan-zcpt6/
D - CommonPrefixes size = 0
So from what I understand from these results, MinIO cut off the leading slash on object keys. And I don't know how to deal with it without adding another boolean config to handle leading slash...
Another question I have is why Burrito create an object key with a leading slash?
In every use of buckets we have in my company (GCS, IBM COS, MinIO, OVH Object Storage), the key does not start with a /.
Any thoughts?
@LucasMrqes I created a PR to fix my use case on S3 (tested on MinIO and IBM COS). https://github.com/padok-team/burrito/pull/378
Can you have a look on it please?
And also give an opinion on whether or not to keep the leading slash on keys.
Hi Seb, I think you're right; object key prefixes typically shouldn't contain a leading slash. (Perhaps @Alan-pad could confirm if there’s was a specific reason for adding it?)
If there's no valid reason for keeping the leading slash and everything works well across common object storage systems, I’d suggest we remove it entirely.
OK tell me, I'll update the MR without the feature flag. Or changing the default value?
I suggest we keep it simple and merge this PR that completely removes leading slashes: https://github.com/padok-team/burrito/pull/379 We’ll add a warning in the next release's changelog to copy data from ./ to .. before upgrading to not lose it.
I'm OK to keep it simple!
I added a review to the MR, le trailing slash is needed for S3 listing.