terraform-provider-flux

Timeout error when creating `flux_bootstrap_git`

Open oliviermichaelis opened this issue 1 year ago • 4 comments

Creating flux_bootstrap_git fails for me without a helpful error message. The gotk-components.yaml file is created as expected in my repo, but the bootstrap process doesn't proceed any further. The provider created no resources in the Kubernetes cluster.

This is the provider configuration I'm using:

provider "flux" {
  kubernetes = {
    config_path = "redacted"
  }
  git = {
    url = "ssh://[email protected]/redacted/redacted.git"
    ssh = {
      private_key = tls_private_key.flux_deploy_key.private_key_pem
      username    = "git"
    }
  }
}

and my flux_bootstrap_git resource:

resource "flux_bootstrap_git" "this" {
  depends_on = [github_repository_deploy_key.infrastructure_flux]
  path       = "kubernetes"
}

Running TF_LOG=DEBUG terraform apply results in:

[...]
2023-07-22T00:43:46.582+0200 [INFO]  Starting apply for flux_bootstrap_git.this
2023-07-22T00:43:46.582+0200 [DEBUG] skipping FixUpBlockAttrs
2023-07-22T00:43:46.582+0200 [DEBUG] flux_bootstrap_git.this: applying the planned Create change
2023-07-22T00:43:46.583+0200 [DEBUG] provider.terraform-provider-flux_v1.0.1: Calling provider defined Resource Configure: tf_req_id=092e427d-b6cc-dd78-31de-b79b7dda5d96 tf_resource_type=flux_bootstrap_git @caller=github.com/hashicorp/[email protected]/internal/fwserver/server_createresource.go:54 @module=sdk.framework tf_provider_addr=registry.terraform.io/fluxcd/flux tf_rpc=ApplyResourceChange timestamp=2023-07-22T00:43:46.583+0200
2023-07-22T00:43:46.583+0200 [DEBUG] provider.terraform-provider-flux_v1.0.1: Called provider defined Resource Configure: @module=sdk.framework tf_req_id=092e427d-b6cc-dd78-31de-b79b7dda5d96 tf_resource_type=flux_bootstrap_git @caller=github.com/hashicorp/[email protected]/internal/fwserver/server_createresource.go:56 tf_provider_addr=registry.terraform.io/fluxcd/flux tf_rpc=ApplyResourceChange timestamp=2023-07-22T00:43:46.583+0200
2023-07-22T00:43:46.583+0200 [DEBUG] provider.terraform-provider-flux_v1.0.1: Calling provider defined Resource Create: tf_req_id=092e427d-b6cc-dd78-31de-b79b7dda5d96 @caller=github.com/hashicorp/[email protected]/internal/fwserver/server_createresource.go:100 @module=sdk.framework tf_provider_addr=registry.terraform.io/fluxcd/flux tf_resource_type=flux_bootstrap_git tf_rpc=ApplyResourceChange timestamp=2023-07-22T00:43:46.583+0200
flux_bootstrap_git.this: Still creating... [10s elapsed]
flux_bootstrap_git.this: Still creating... [20s elapsed]
flux_bootstrap_git.this: Still creating... [30s elapsed]
flux_bootstrap_git.this: Still creating... [40s elapsed]
flux_bootstrap_git.this: Still creating... [50s elapsed]
flux_bootstrap_git.this: Still creating... [1m0s elapsed]
2023-07-22T00:44:50.102+0200 [DEBUG] provider.terraform-provider-flux_v1.0.1: Called provider defined Resource Create: tf_provider_addr=registry.terraform.io/fluxcd/flux tf_rpc=ApplyResourceChange @caller=github.com/hashicorp/[email protected]/internal/fwserver/server_createresource.go:102 @module=sdk.framework tf_req_id=092e427d-b6cc-dd78-31de-b79b7dda5d96 tf_resource_type=flux_bootstrap_git timestamp=2023-07-22T00:44:50.102+0200
2023-07-22T00:44:50.102+0200 [ERROR] provider.terraform-provider-flux_v1.0.1: Response contains error diagnostic: tf_rpc=ApplyResourceChange diagnostic_summary="Bootstrap run error" tf_proto_version=6.3 tf_provider_addr=registry.terraform.io/fluxcd/flux @module=sdk.proto diagnostic_detail="timeout waiting for: []" diagnostic_severity=ERROR tf_req_id=092e427d-b6cc-dd78-31de-b79b7dda5d96 @caller=github.com/hashicorp/[email protected]/tfprotov6/internal/diag/diagnostics.go:58 tf_resource_type=flux_bootstrap_git timestamp=2023-07-22T00:44:50.102+0200
2023-07-22T00:44:50.104+0200 [DEBUG] State storage *remote.State declined to persist a state snapshot
2023-07-22T00:44:50.104+0200 [ERROR] vertex "flux_bootstrap_git.this" error: Bootstrap run error
2023-07-22T00:44:50.104+0200 [DEBUG] states/remote: state read serial is: 196; serial is: 196
2023-07-22T00:44:50.104+0200 [DEBUG] states/remote: state read lineage is: 8783a2a8-f702-5e76-66ee-8cb7e913c948; lineage is: 8783a2a8-f702-5e76-66ee-8cb7e913c948
╷
│ Error: Bootstrap run error
│ 
│   with flux_bootstrap_git.this,
│   on flux.tf line 68, in resource "flux_bootstrap_git" "this":
│   68: resource "flux_bootstrap_git" "this" {
│ 
│ timeout waiting for: []
╵
Releasing state lock. This may take a few moments...
2023-07-22T00:44:50.711+0200 [DEBUG] provider.stdio: received EOF, stopping recv loop: err="rpc error: code = Unavailable desc = error reading from server: EOF"
2023-07-22T00:44:50.722+0200 [DEBUG] provider: plugin process exited: path=.terraform/providers/registry.terraform.io/fluxcd/flux/1.0.1/linux_amd64/terraform-provider-flux_v1.0.1 pid=45352
2023-07-22T00:44:50.722+0200 [DEBUG] provider: plugin exited
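
To cut a debug log like this one down to the provider's actual error, the diagnostic fields can be filtered out. This is a small sketch, assuming Bash and grep; `extract_diagnostics` is a hypothetical helper name, and the field names are taken from the error line in the log above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a Terraform debug log on stdin and print only
# the summary/detail fields of provider error diagnostics.
extract_diagnostics() {
  grep -F 'Response contains error diagnostic' \
    | grep -oE 'diagnostic_(summary|detail)="[^"]*"'
}

# Usage (assumes terraform is installed; TF_LOG output goes to stderr,
# hence the 2>&1 redirect):
#   TF_LOG=DEBUG terraform apply 2>&1 | extract_diagnostics
```

For the log above this leaves just `diagnostic_summary="Bootstrap run error"` and `diagnostic_detail="timeout waiting for: []"`, which confirms that the provider itself reports nothing more specific.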

Any pointers how I could debug this further? Thanks! :)

oliviermichaelis avatar Jul 21 '23 22:07 oliviermichaelis

I'm having the same issue. Not sure how to proceed.

Example commit history of failed runs here

flux_bootstrap_git resource here.

I've tried with all the flux 1.0.x and RC versions and gotten errors. Previously (several months ago?) this worked with provider version 1.0.0-rc.1 but no longer does. I don't know what in my environment configuration has changed between then and now.

Let me know if there's any more information I can provide.

Error output:


module.bootstrap.flux_bootstrap_git.this: Still creating... [30s elapsed]
╷
│ Error: Bootstrap run error
│ 
│   with module.bootstrap.flux_bootstrap_git.this,
│   on ../../.modules/bootstrap-cluster/main.tf line 34, in resource "flux_bootstrap_git" "this":
│   34: resource "flux_bootstrap_git" "this" {
│ 
│ failed to push sync manifests: ssh: handshake failed: read tcp
│ 192.168.1.13:64677->140.82.114.4:22: read: operation timed out
╵

ionfury avatar Jul 23 '23 02:07 ionfury

Interestingly, I did get the same error when trying to bootstrap with the CLI:

$ flux bootstrap github --owner=redacted --repository=redacted --private=true --personal=true --path=kubernetes --branch=master
► connecting to github.com
► cloning branch "master" from Git repository "https://github.com/redacted/redacted.git"
✔ cloned repository
► generating component manifests
✔ generated component manifests
✔ committed sync manifests to "master" ("123")
► pushing component manifests to "https://github.com/redacted/redacted.git"
► installing components in "flux-system" namespace
✗ timeout waiting for: []

oliviermichaelis avatar Jul 23 '23 08:07 oliviermichaelis

Glancing through the code here, it looks like the provider just uses the Flux CLI. Maybe the issue would get more traction in that repo?

ionfury avatar Jul 23 '23 15:07 ionfury

I'm seeing the same problem using Tofu/Terraform. Actually, I get a lot of transient errors running flux_bootstrap_git. They vary and happen randomly. Usually running apply again works. I wonder if this may be a GitHub issue? Maybe there's a problem with using the key created with github_repository_deploy_key so quickly after creating it (some sort of propagation delay on their end)? Maybe an internal retry mechanism would be enough.
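
Until the provider has something like that built in, the "run apply again" workaround can be scripted. This is a minimal sketch, assuming Bash; the `retry` helper, the attempt count and the delay are illustrative choices and not part of Terraform or the provider.

```shell
#!/usr/bin/env bash
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS
# attempts have been made, sleeping DELAY_SECONDS between attempts.
retry() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Usage (three attempts and a 30s pause are arbitrary values):
#   retry 3 30 terraform apply -auto-approve
```

This only hides transient failures. If the same error comes back on every attempt, the cause is probably not a propagation delay.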

joekrill avatar Dec 05 '23 01:12 joekrill

Hello @oliviermichaelis 👋

I hope all is well with you. I'm just contacting you as part of our initiative to ensure the quality and responsiveness of our project's issue tracking. Thank you for your previous contributions by reporting issues you've encountered; your engagement is invaluable to the health and progress of our project.

I've reviewed the issue you reported and noted that it involves using supported resources, which is great. However, I also noticed that there has been little activity on this issue lately. We understand that circumstances change and priorities shift, which may affect the urgency or relevance of an issue.

We're implementing a policy regarding inactive issues to keep our project's issue tracker effective and up-to-date. If there's no further activity on this issue within the next 3 weeks, we will mark it inactive and close it. This approach helps us maintain focus on current and actively pursued concerns, ensuring resources are well allocated.

That said, we are still very much interested in resolving the issue you've encountered. If the problem persists or if you have any new information that could help us address it, please don't hesitate to update the issue or comment. Your input could be crucial in finding a resolution. Alternatively, if the issue no longer affects you or has been resolved, let us know so we can close it with the right resolution.

We're here to support and work with you to overcome any challenges you face using our project. Your continued feedback and participation are essential to our community's success.

Thank you once again for your contribution. I look forward to your update and am here to assist with any further questions or issues you might have.

Best regards,

Steve

swade1987 avatar Mar 29 '24 12:03 swade1987

@swade1987: I've reviewed the issue you reported and noted that it involves using supported resources, which is great. However, I also noticed that there has been little activity on this issue lately.

It's great that you reviewed it, so can you give us some sort of solution?

@swade1987: That said, we are still very much interested in resolving the issue you've encountered. If the problem persists or if you have any new information that could help us address it, please don't hesitate to update the issue or comment.

Some of us, poor folks, still have it :(

@swade1987: We're here to support and work with you to overcome any challenges you face using our project. Your continued feedback and participation are essential to our community's success.

Please support us, it's indeed challenging. I have the same problem as they do, following the official documentation. H e l p.

3835085601 avatar Apr 09 '24 12:04 3835085601

Please join the conversation in #flux-ecosystem on the CNCF Slack. It's not exactly clear to me what's wrong. It sounds like the provider is having difficulty reaching something, but `timeout waiting for: []` is really non-specific and will be hard to follow up on without a hands-on reproduction of the issue.

We're going to need some synchronous interaction with somebody who has a live instance of this in order to help them personally. Once we have seen a handful of cases or fewer, we can definitely address the circumstances that lead to this context-less error arising from the provider module. I think it should be possible to guarantee that the provider never exits without a message giving the reason, but since I haven't reviewed the provider codebase in detail so far, I can't really point the finger and say "there, that's the problem."
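
For anyone preparing such a report, a snapshot of the cluster state taken right after the timeout is a reasonable starting point. This is a sketch rather than an official procedure, assuming Bash with `kubectl` and the `flux` CLI on the PATH; `collect_flux_diagnostics` is a hypothetical helper name, and the `flux-system` default is taken from the CLI output earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print basic state that usually accompanies a
# bootstrap timeout report. The namespace defaults to "flux-system"
# (an assumption; pass another name as the first argument if needed).
collect_flux_diagnostics() {
  local ns=${1:-flux-system}
  local tool
  for tool in kubectl flux; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      return 1
    fi
  done
  echo "== flux check =="
  flux check || true
  echo "== deployments and pods in $ns =="
  kubectl -n "$ns" get deployments,pods || true
  echo "== recent events in $ns =="
  kubectl -n "$ns" get events --sort-by=.lastTimestamp || true
}

# Usage:
#   collect_flux_diagnostics > flux-diagnostics.txt 2>&1
```

An empty namespace would match the original report that no resources were created in the cluster, while pods stuck in a pending or failing state would point at the cluster instead of the Git side.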

kingdonb avatar Apr 09 '24 13:04 kingdonb

@swade1987: Alternatively, if the issue no longer affects you or has been resolved, let us know so we can close it with the right resolution.

I stopped using the provider because of it, so I cannot say whether the issue would be resolved for me. Maybe some of the other affected people can help with the reproduction? :)

oliviermichaelis avatar Apr 09 '24 14:04 oliviermichaelis

@oliviermichaelis so you stopped using the provider due to difficulties migrating from the existing resources to flux_bootstrap_git? What are you using instead, out of interest?

swade1987 avatar Apr 09 '24 14:04 swade1987

I don't remember exactly what I did, but I don't think it was a migration. I was just looking for the most fluxcd-native experience I could get for my scenario.

Right now, I'm using the kubectl provider to bootstrap fluxcd from YAML manifests, and Renovate to update those manifests.

oliviermichaelis avatar Apr 09 '24 14:04 oliviermichaelis

This may or may not be useful for you, but the way I have managed Flux in previous organisations can be found at https://github.com/swade1987/flux2-multi-cluster-bootstrap-repo/tree/main. Finally, are you OK with me closing this issue?

swade1987 avatar Apr 09 '24 15:04 swade1987

Yes, since I cannot provide feedback, I'm okay with closing this issue. I think other people can open new issues and provide debug information if they still experience problems.

oliviermichaelis avatar Apr 09 '24 15:04 oliviermichaelis