burrito icon indicating copy to clipboard operation
burrito copied to clipboard

TerraformLayer "Error getting last Result"

Open seboudry opened this issue 1 year ago • 1 comments

Hi!

All my TerraformLayer objects have lastResult: Error getting last Result.

In web UI I can view logs but state is "OutOfSync" even if there is no changes in TF plan.

I'm using burrito image tag 10ebe5bfcd2ec9b899485865067dc871ded84ba7 on all burrito-system components.

Also, I use a MinIO datastore.

There's a line in log that says failed to get plan for layer qpc-bastion: could not get plan, there's an issue with the storage backend but it appears early in layer reconciliation...

When runner pod succeded I can see logs on MinIO storage: image

Did you get same behaviour recently?

Here's the logs:

Sep 24, 2024 @ 12:00:22.973;controllers;level=info msg="getting Linked TerraformRepository to layer qpc-bastion"
Sep 24, 2024 @ 12:00:22.973;controllers;level=info msg="layer qpc-bastion has an outdated plan, creating a new run"
Sep 24, 2024 @ 12:00:22.973;controllers;level=info msg="starting reconciliation for layer burrito-qpc-config-terraform/qpc-bastion ..."
Sep 24, 2024 @ 12:00:22.989;controllers;level=debug msg="Failed to get last Result" logger=events object="{\"kind\":\"TerraformLayer\",\"namespace\":\"burrito-qpc-config-terraform\",\"name\":\"qpc-bastion\",\"uid\":\"ae7ce0bc-b3ea-48aa-a894-b1dd903f15d6\",\"apiVersion\":\"config.terraform.padok.cloud/v1alpha1\",\"resourceVersion\":\"1056649421\"}" reason=Reconciliation type=Normal
Sep 24, 2024 @ 12:00:22.989;controllers;level=error msg="failed to get plan for layer qpc-bastion: could not get plan, there's an issue with the storage backend"
Sep 24, 2024 @ 12:00:22.989;datastore;level=info bytes_in= bytes_out=24 error="<nil>" latency=6.667784ms method=GET remote_ip=10.34.12.119 service_account="system:serviceaccount:burrito-system:burrito-controllers" status=400 uri="/api/plans?attempt=&format=short&layer=qpc-bastion&namespace=burrito-qpc-config-terraform&run="
Sep 24, 2024 @ 12:00:22.999;controllers;level=debug msg="Created TerraformRun for Plan action" logger=events object="{\"kind\":\"TerraformLayer\",\"namespace\":\"burrito-qpc-config-terraform\",\"name\":\"qpc-bastion\",\"uid\":\"ae7ce0bc-b3ea-48aa-a894-b1dd903f15d6\",\"apiVersion\":\"config.terraform.padok.cloud/v1alpha1\",\"resourceVersion\":\"1056649421\"}" reason=Reconciliation type=Normal
Sep 24, 2024 @ 12:00:22.999;controllers;level=info msg="getting linked TerraformLayer to run qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:22.999;controllers;level=info msg="getting linked TerraformRepository to run qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:22.999;controllers;level=info msg="run qpc-bastion-plan-zsf8p is in initial state"
Sep 24, 2024 @ 12:00:22.999;controllers;level=info msg="starting reconciliation for run burrito-qpc-config-terraform/qpc-bastion-plan-zsf8p ..."
Sep 24, 2024 @ 12:00:23.012;controllers;level=info msg="finished reconciliation cycle for layer burrito-qpc-config-terraform/qpc-bastion"
Sep 24, 2024 @ 12:00:23.012;controllers;level=info msg="no runs to delete for layer qpc-bastion"
Sep 24, 2024 @ 12:00:23.027;controllers;level=info msg="CA certificate is available in namespace burrito-qpc-config-terraform"
Sep 24, 2024 @ 12:00:23.052;controllers;level=debug msg="Successfully created pod qpc-bastion-plan-742x5 for initial run" logger=events object="{\"kind\":\"TerraformRun\",\"namespace\":\"burrito-qpc-config-terraform\",\"name\":\"qpc-bastion-plan-zsf8p\",\"uid\":\"88d4c110-8167-4f9c-a467-af79ac708e94\",\"apiVersion\":\"config.terraform.padok.cloud/v1alpha1\",\"resourceVersion\":\"1056649425\"}" reason=Run type=Normal
Sep 24, 2024 @ 12:00:23.052;controllers;level=info msg="pod qpc-bastion-plan-742x5 not found, ignoring..."
Sep 24, 2024 @ 12:00:23.070;controllers;level=info msg="finished reconciliation cycle for run burrito-qpc-config-terraform/qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:24.072;controllers;level=info msg="getting linked TerraformLayer to run qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:24.072;controllers;level=info msg="getting linked TerraformRepository to run qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:24.072;controllers;level=info msg="pod qpc-bastion-plan-742x5 is not in a terminal state, ignoring..."
Sep 24, 2024 @ 12:00:24.072;controllers;level=info msg="run qpc-bastion-plan-zsf8p is running"
Sep 24, 2024 @ 12:00:24.072;controllers;level=info msg="starting reconciliation for run burrito-qpc-config-terraform/qpc-bastion-plan-zsf8p ..."
Sep 24, 2024 @ 12:00:24.086;controllers;level=info msg="finished reconciliation cycle for run burrito-qpc-config-terraform/qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:55.168;datastore;level=info bytes_in=1283 bytes_out=0 error="<nil>" latency=43.292773ms method=PUT remote_ip=10.34.9.28 service_account="system:serviceaccount:burrito-qpc-config-terraform:burrito-runner" status=200 uri="/api/plans?attempt=0&format=pretty&layer=qpc-bastion&namespace=burrito-qpc-config-terraform&run=qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:55.227;datastore;level=info bytes_in=360847 bytes_out=0 error="<nil>" latency=46.030994ms method=PUT remote_ip=10.34.9.28 service_account="system:serviceaccount:burrito-qpc-config-terraform:burrito-runner" status=200 uri="/api/plans?attempt=0&format=json&layer=qpc-bastion&namespace=burrito-qpc-config-terraform&run=qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:55.245;datastore;level=info bytes_in=43 bytes_out=0 error="<nil>" latency=16.921951ms method=PUT remote_ip=10.34.9.28 service_account="system:serviceaccount:burrito-qpc-config-terraform:burrito-runner" status=200 uri="/api/plans?attempt=0&format=short&layer=qpc-bastion&namespace=burrito-qpc-config-terraform&run=qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:55.261;datastore;level=info bytes_in=45342 bytes_out=0 error="<nil>" latency=14.946959ms method=PUT remote_ip=10.34.9.28 service_account="system:serviceaccount:burrito-qpc-config-terraform:burrito-runner" status=200 uri="/api/plans?attempt=0&format=bin&layer=qpc-bastion&namespace=burrito-qpc-config-terraform&run=qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:00:55.273;controllers;level=info msg="starting reconciliation for layer burrito-qpc-config-terraform/qpc-bastion ..."
Sep 24, 2024 @ 12:00:55.273;controllers;level=info msg="terraform layer qpc-bastion is locked, skipping reconciliation."
Sep 24, 2024 @ 12:01:23.012;controllers;level=info msg="starting reconciliation for layer burrito-qpc-config-terraform/qpc-bastion ..."
Sep 24, 2024 @ 12:01:23.012;controllers;level=info msg="terraform layer qpc-bastion is locked, skipping reconciliation."
Sep 24, 2024 @ 12:01:24.087;controllers;level=info msg="getting linked TerraformLayer to run qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:01:24.087;controllers;level=info msg="getting linked TerraformRepository to run qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:01:24.087;controllers;level=info msg="run qpc-bastion-plan-zsf8p has succeeded"
Sep 24, 2024 @ 12:01:24.087;controllers;level=info msg="starting reconciliation for run burrito-qpc-config-terraform/qpc-bastion-plan-zsf8p ..."
Sep 24, 2024 @ 12:01:24.126;datastore;level=info bytes_in=25778 bytes_out=0 error="<nil>" latency=7.108905ms method=PUT remote_ip=10.34.12.119 service_account="system:serviceaccount:burrito-system:burrito-controllers" status=200 uri="/api/logs?attempt=0&layer=qpc-bastion&namespace=burrito-qpc-config-terraform&run=qpc-bastion-plan-zsf8p"
Sep 24, 2024 @ 12:01:24.140;controllers;level=info msg="finished reconciliation cycle for run burrito-qpc-config-terraform/qpc-bastion-plan-zsf8p"

seboudry avatar Sep 23 '24 14:09 seboudry

Hi! The could not get plan, there's an issue with the storage backend error logged by the controller during reconciliation means that the burrito-datastore pod encountered an error while retrieving the latest short plan on minio. Can you look for logs in the burrito-datastore pod ?

To understand what file was attempted to be retrieved, you also can look for the following properties on the corresponding TerraformLayer CRD:

  • namespace
  • name - Status.LastRun.Name

If a file exists at layers/<namespace>/<name>/<Status.LastRun.Name>/<any-attempt-number>/short.diff, then the error should not be caused by a missing file

LucasMrqes avatar Oct 04 '24 12:10 LucasMrqes

Hi Lucas!

All the logs of every Burrito components are in the first comment.

The first call to datastore state this:

status=400 uri="/api/plans?attempt=&format=short&layer=qpc-bastion&namespace=burrito-qpc-config-terraform&run="

But I don't have any logs explaining the "issue with the storage backend".

seboudry avatar Oct 07 '24 12:10 seboudry

It looks like the controller did not have a run to look for in the datastore. The run query parameter in the request should not be an emptystring but rather the name of the last run for the layer (e.g: <layer-name>-plan-xxxxx) Can you look for Status.LastRun.Name property in your layer resources ? If it is empty, there might by an issue with the flow updating the LastRun properties

LucasMrqes avatar Oct 08 '24 12:10 LucasMrqes

HI @LucasMrqes !

I finally have time to troubleshoot a lit more this problem. I also upgrade to 0.5.0 version.

When using datastore mock storage, I don't have any issue.

When I'm usin S3 storage, I got this message now:

controllers time="2024-10-29T10:07:43Z" level=debug msg="Failed to get last Result" logger=events object="{\"kind\":\"TerraformLayer\",\"namespace\":\"burrito-qpc-config-terraform\",\"name\":\"qpc-bastion\",\"uid\":\"421db6b5-883c-4877-8173-b62d84e4eb11\",\"apiVersion\":\"config.terraform.padok.cloud/v1alpha1\",\"resourceVersion\":\"1109090860\"}" reason=Reconciliation type=Normal
controllers time="2024-10-29T10:07:43Z" level=error msg="failed to get plan for layer qpc-bastion: no plan for this attempt
datastore time="2024-10-29T10:07:43Z" level=info bytes_in= bytes_out=24 error="<nil>" latency=32.280928ms method=GET remote_ip=10.34.12.207 service_account="system:serviceaccount:burrito-system:burrito-controllers" status=404 uri="/api/plans?attempt=&format=short&layer=qpc-bastion&namespace=burrito-qpc-config-terraform&run=qpc-bastion-plan-f68jh"

But all files are well pushed to minio bucket: image

Here's the TerraformLayer:

apiVersion: config.terraform.padok.cloud/v1alpha1
kind: TerraformLayer
metadata:
  name: qpc-bastion
  namespace: burrito-qpc-config-terraform
spec:
  branch: master
  path: bastion
  repository:
    name: qpc-config-terraform
    namespace: burrito-qpc-config-terraform
  uid: 421db6b5-883c-4877-8173-b62d84e4eb11
status:
   [...]
  lastResult: Error getting last Result
  lastRun:
    action: plan
    date: '2024-10-29T05:04:53Z'
    name: qpc-bastion-plan-f68jh
  latestRuns:
    - action: plan
      date: '2024-10-29T05:04:53Z'
      name: qpc-bastion-plan-f68jh

And the TerraformRun:

apiVersion: config.terraform.padok.cloud/v1alpha1
kind: TerraformRun
metadata:
  creationTimestamp: '2024-10-29T05:04:53Z'
  generateName: qpc-bastion-plan-
  generation: 1
  labels:
    burrito/managed-by: qpc-bastion
  name: qpc-bastion-plan-f68jh
  namespace: burrito-qpc-config-terraform
  ownerReferences:
    - apiVersion: config.terraform.padok.cloud/v1alpha1
      kind: TerraformLayer
      name: qpc-bastion
      uid: 421db6b5-883c-4877-8173-b62d84e4eb11
spec:
  action: plan
  artifact: {}
  layer:
    name: qpc-bastion
    namespace: burrito-qpc-config-terraform
status:
  attempts:
    - logsUploaded: true
      number: 0
      podName: qpc-bastion-plan-6496p
  conditions:
    - lastTransitionTime: '2024-10-29T05:05:54Z'
      message: This run has a status
      observedGeneration: 1
      reason: HasStatus
      status: 'True'
      type: HasStatus
    - lastTransitionTime: '2024-10-29T05:05:54Z'
      message: This run has not reached the retry limit (1)
      observedGeneration: 1
      reason: HasNotReachedRetryLimit
      status: 'False'
      type: HasReachedRetryLimit
    - lastTransitionTime: '2024-10-29T05:05:54Z'
      message: This run has succeeded
      observedGeneration: 1
      reason: HasSucceeded
      status: 'True'
      type: HasSucceeded
    - lastTransitionTime: '2024-10-29T05:05:54Z'
      message: This run is not running
      observedGeneration: 1
      reason: NotRunning
      status: 'False'
      type: IsRunning
    - lastTransitionTime: '2024-10-29T05:05:54Z'
      message: No failure has been detected yet
      observedGeneration: 1
      reason: NoFailureYet
      status: 'False'
      type: IsInFailureGracePeriod
  lastRun: Tue Oct 29 05:04:53 UTC 2024
  retries: 0
  runnerPod: qpc-bastion-plan-6496p
  state: Succeeded

So, I believe the problem is when attempt is not specified when calling the datastore. Does using datastore.GetLatestPlan in https://github.com/padok-team/burrito/blob/main/internal/controllers/terraformlayer/controller.go#L120 will fix this problem? I'll test this on my fork.

But I don't know why this problem doesn't appaers on mock storage...

seboudry avatar Oct 29 '24 10:10 seboudry

Or maybe it's a regression when using UsePathStyle while listing objects. Maybe need to modify this function https://github.com/padok-team/burrito/blob/main/internal/datastore/storage/s3/s3.go#L89

seboudry avatar Oct 29 '24 12:10 seboudry

I don't think that using datastore.GetLatestPlan instead of using datastore.GetPlan would change anything. Since I am unable to reproduce the issue using s3 or mock storage, it might be specific to using minio (hence a regression with UsePathStyle). I will try to reproduce the issue with a Minio storage

LucasMrqes avatar Oct 29 '24 14:10 LucasMrqes

OK, here's my findings:

When using AWS SDK for S3 ListObjectsV2, the Prefix must:

  • not begins with a /
  • ends with a /

But curently, the used prefix is something like /layers/<namespace>/<layer>/<run>. So we need to transform it like layers/<namespace>/<layer>/<run>/ to get some list results.

I'll create a PR with these fixes.

I don't think it's related to using UsePathStyle. Do you know anyone who is using AWS S3 successfully?

seboudry avatar Oct 29 '24 15:10 seboudry

I am using S3 successfully with version 53eae917625b4d2df47004e56143b2f78f2d6e61 (basically the same as you were mentioning in your first message), but the issue that you mention might only cause errors with minio. Can you confirm that removing the / works ?

LucasMrqes avatar Oct 29 '24 17:10 LucasMrqes

Yes removing the leading / AND adding one at end works. If the last / isn't provided, only the "prefix" is returned by S3 and not the child folders (so the different "attempts"). I'm successfully using Minio with S3 storage type and the PR that I wrote.

seboudry avatar Oct 29 '24 18:10 seboudry

Sorry to re-open this issue...

We have another platform to install Burrito on: IBM Cloud. The block storage option is called COS and their API is S3 compatible.

But unfortunately I got different behavior when listing items :disappointed:

So I tested several options on both MinIO and IBM COS.

Different options:

  • A: prefixTest = prefix
  • B: prefixTest = strings.TrimPrefix(prefix, "/")
  • C: prefixTest = fmt.Sprintf("%s/", prefix)
  • D: prefixTest = fmt.Sprintf("%s/", strings.TrimPrefix(prefix, "/"))

Code (inserted on https://github.com/padok-team/burrito/blob/main/internal/datastore/storage/s3/s3.go#L88) :

log.Warnf("List with prefix %s", prefixTest)
resultTest, err := a.Client.ListObjectsV2(context.TODO(), &storage.ListObjectsV2Input{
  Bucket:    &a.Config.Bucket,
  Prefix:    aws.String(prefixTest),
  Delimiter: aws.String("/"),
})
if err == nil {
  log.Warnf("CommonPrefixes size = %d", len(resultTest.CommonPrefixes))
  for i, obj := range resultTest.CommonPrefixes {
    log.Warnf("SOUDRY A - CommonPrefixes %d = %s", i, *obj.Prefix)
  }
}

MinIO

A - List with prefix /layers/burrito-qpc-config-terraform/qpc-bastion/qpc-bastion-plan-kh8wr
A - CommonPrefixes size = 0

B - List with prefix layers/burrito-qpc-config-terraform/qpc-bastion/qpc-bastion-plan-kh8wr
B - CommonPrefixes size = 1
B - CommonPrefixes 0 = layers/burrito-qpc-config-terraform/qpc-bastion/qpc-bastion-plan-kh8wr/

C - List with prefix /layers/burrito-qpc-config-terraform/qpc-bastion/qpc-bastion-plan-kh8wr/
C - CommonPrefixes size = 0

D - List with prefix layers/burrito-qpc-config-terraform/qpc-bastion/qpc-bastion-plan-kh8wr/
D - CommonPrefixes size = 2
D - CommonPrefixes 0 = layers/burrito-qpc-config-terraform/qpc-bastion/qpc-bastion-plan-kh8wr/0/
D - CommonPrefixes 1 = layers/burrito-qpc-config-terraform/qpc-bastion/qpc-bastion-plan-kh8wr/1/

IBM Cloud COS

A - List with prefix /layers/burrito-qpc-config-terraform/qpc-teleport/qpc-teleport-plan-zcpt6
A - CommonPrefixes size = 1
A - CommonPrefixes 0 = /layers/burrito-qpc-config-terraform/qpc-teleport/qpc-teleport-plan-zcpt6/

B - List with prefix layers/burrito-qpc-config-terraform/qpc-teleport/qpc-teleport-plan-zcpt6
B - CommonPrefixes size = 0

C - List with prefix /layers/burrito-qpc-config-terraform/qpc-teleport/qpc-teleport-plan-zcpt6/
C - CommonPrefixes size = 1
C - CommonPrefixes 0 = /layers/burrito-qpc-config-terraform/qpc-teleport/qpc-teleport-plan-zcpt6/0/

D - List with prefix layers/burrito-qpc-config-terraform/qpc-teleport/qpc-teleport-plan-zcpt6/
D - CommonPrefixes size = 0

So from what I understand from these results, MinIO cut off the leading slash on object keys. And I don't know how to deal with it without adding another boolean config to handle leading slash...

Another question I have is why Burrito create an object key with a leading slash? In every use of buckets we have in my company (GCS, IBM COS, MinIO, OVH Object Storage), the key does not start with a /.

Any thoughts?

seboudry avatar Oct 31 '24 16:10 seboudry

@LucasMrqes I created a PR to fix my use case on S3 (tested on MinIO and IBM COS). https://github.com/padok-team/burrito/pull/378

Can you have a look on it please?

And also give an opinion on whether or not to keep the leading slash on keys.

seboudry avatar Nov 06 '24 16:11 seboudry

Hi Seb, I think you're right; object key prefixes typically shouldn't contain a leading slash. (Perhaps @Alan-pad could confirm if there’s was a specific reason for adding it?)

If there's no valid reason for keeping the leading slash and everything works well across common object storage systems, I’d suggest we remove it entirely.

LucasMrqes avatar Nov 06 '24 18:11 LucasMrqes

OK tell me, I'll update the MR without the feature flag. Or changing the default value?

seboudry avatar Nov 08 '24 14:11 seboudry

I suggest we keep it simple and merge this PR that completely removes leading slashes: https://github.com/padok-team/burrito/pull/379 We’ll add a warning in the next release's changelog to copy data from ./ to .. before upgrading to not lose it.

LucasMrqes avatar Nov 08 '24 16:11 LucasMrqes

I'm OK to keep it simple!

I added a review to the MR, le trailing slash is needed for S3 listing.

seboudry avatar Nov 12 '24 15:11 seboudry