terraform-provider-google

Terraform not honouring OS IPv4 settings, using IPv6 dst to call *.googleapis.com

Open mhanline opened this issue 4 years ago • 19 comments

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

tf version
Terraform v0.12.28
+ provider.google v3.29.0
+ provider.google-beta v3.29.0

Affected Resource(s)

All resources, not specific to any one.

Terraform Configuration Files

While this happens intermittently and it's not specific to this config, it seems to happen with longer Terraform runs. You may need to apply / destroy 1-2 times before seeing this issue.

gist link to config

Debug Output

I see this output sporadically, and not on the same API call. Note the DST IP is an IPv6 address, but Cloud Shell does not enable IPv6 in the OS: Link to gist

Console output when issue occurs (Note the IPv6 address is being used):

Error: Error when reading or editing Project Service [project-id]/trafficdirector.googleapis.com: Get "https://cloudresourcemanager.googleapis.com/v1/projects/[project-id]?alt=json&prettyPrint=false": dial tcp [2404:6800:4003:c00::5f]:443: connect: cannot assign requested address
Error: Error retrieving available container cluster versions: Get "https://container.googleapis.com/v1beta1/projects/[project-id]/locations/asia-east1-c/serverConfig?alt=json&prettyPrint=false": dial tcp [2404:6800:4003:c04::5f]:443: connect: cannot assign requested address
Error: Error when reading or editing Project Service [project-id]/trafficdirector.googleapis.com: Get "https://cloudresourcemanager.googleapis.com/v1/projects/[project-id]?alt=json&prettyPrint=false": dial tcp [2404:6800:4003:c03::5f]:443: connect: cannot assign requested address

Expected Behavior

Terraform / Google provider should respect the OS network settings and use IPv4 addresses to call out to *.googleapis.com.

Actual Behavior

tf apply / tf destroy does not always successfully complete, and will return the errors above.

Steps to Reproduce

  1. Open Google Cloud Shell (no IPv6 stack)
  2. Run tf apply or tf destroy on the linked config
  3. Most times it will succeed, but about every second attempt it reports the above errors

Note, if I statically configure /etc/hosts to resolve to a specific IPv4 address - say 199.36.153.8, the above errors never occur.

Important Factoids

Authenticating using application default credentials, built into Cloud Shell.

Confirm IPv6 is not enabled on the OS:

myusername@cloudshell:~$ sudo sysctl -n net.ipv6.conf.all.disable_ipv6 && sysctl -n net.ipv6.conf.default.disable_ipv6
1
1
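
For anyone who wants to run the same check from code, here is a minimal Go sketch that reads the kernel flags behind those two sysctl keys from procfs. The paths are the standard Linux procfs locations for these keys; the helper names (parseDisableFlag, ipv6Disabled) are hypothetical and not part of Terraform or the provider.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseDisableFlag reports whether a disable_ipv6 value read from procfs
// means "IPv6 disabled" (the kernel writes "1" for disabled, "0" otherwise).
func parseDisableFlag(raw string) bool {
	return strings.TrimSpace(raw) == "1"
}

// ipv6Disabled reads a single disable_ipv6 flag file and interprets it.
func ipv6Disabled(path string) (bool, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return false, err
	}
	return parseDisableFlag(string(b)), nil
}

func main() {
	// Same two scopes as the sysctl commands above.
	for _, scope := range []string{"all", "default"} {
		path := "/proc/sys/net/ipv6/conf/" + scope + "/disable_ipv6"
		disabled, err := ipv6Disabled(path)
		if err != nil {
			fmt.Printf("%s: cannot read flag: %v\n", scope, err)
			continue
		}
		fmt.Printf("%s: IPv6 disabled = %v\n", scope, disabled)
	}
}
```

On a host without the IPv6 procfs tree the program only reports that the flag cannot be read, which is itself a hint that the stack is absent.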

References

Similar issue 1 (with Go) Similar issue 2 Workaround solution

  • b/160321706

mhanline avatar Jul 12 '20 04:07 mhanline

Here's what I know so far:

Based on https://github.com/golang/go/issues/25321 and https://github.com/hashicorp/terraform-provider-vsphere/issues/636, something that could fix it would be to compile with CGO enabled. The build script that I assume our release pipeline uses explicitly disables CGO. This was introduced in https://github.com/hashicorp/terraform/pull/7107 because it ensures the compiled binaries are statically linked (https://github.com/hashicorp/terraform/issues/6714). If I'm reading https://blog.madewithdrew.com/post/statically-linking-c-to-go/ right, then there should be a way to resolve this without having to explicitly disable CGO. It's also possible that things are different now than they were 4 years ago when the previous issues were brought up.

@megan07, is that indeed the build script that's used for the providers? If you don't mind, could you ask around to see if anyone at HashiCorp has any ideas on this? In the meantime, I'm marking it upstream since I think it'll be good to have open as a reference for people who run into this, but I don't expect there to be much we can do on the provider end.

danawillow avatar Jul 17 '20 22:07 danawillow

I'm running into a similar issue intermittently as well in GCP Cloud Shell.

$ ~/bin/terraform --version
Terraform v0.13.5
+ provider registry.terraform.io/hashicorp/google v3.49.0
+ provider registry.terraform.io/hashicorp/google-beta v3.49.0
+ provider registry.terraform.io/hashicorp/http v2.0.0
+ provider registry.terraform.io/hashicorp/null v3.0.0
+ provider registry.terraform.io/hashicorp/random v3.0.0
+ provider registry.terraform.io/hashicorp/template v2.2.0
+ provider registry.terraform.io/hashicorp/time v0.6.0

In my particular case, my script uploads a file to a Windows Server. I first get this error:

module.dc.null_resource.upload-scripts: Still creating... [4m50s elapsed]
Error: timeout - last error: unknown error Post "https://35.236.28.181:5986/wsman": dial tcp 35.236.28.181:5986: i/o timeout

I checked that the firewalls are open to the IP of the Cloud Shell instance. When I run terraform apply again, I run into these "cannot assign requested address" errors while refreshing state. For example, on my first run I get:

module.cac.module.cac-regional[0].random_shuffle.zone: Refreshing state... [id=-]

Error: Error when reading or editing ComputeNetwork "projects/[project-id]/global/networks/vpc-cas": Get "https://compute.googleapis.com/compute/v1/projects/[project-id]/global/networks/vpc-cas?alt=json": dial tcp [2607:f8b0:400e:c09::5f]:443: connect: cannot assign requested address

I then immediately run terraform apply again, and it fails in a different place.

google_compute_router_nat.nat: Refreshing state... [id=[project-id]/us-west2/router/nat]

Error: Error when reading or editing Storage Bucket "pcoip-scripts-7d731c": Get "https://storage.googleapis.com/storage/v1/b/pcoip-scripts-7d731c?alt=json&prettyPrint=false": dial tcp [2607:f8b0:400e:c07::80]:443: connect: cannot assign requested address

Finally, the third time it lets me type "yes" to apply the changes, but it fails again, timing out while trying to upload the files. We run this same script a few times a week, and most of the time there are no issues.

$ sysctl  net.ipv6.conf.all.disable_ipv6
net.ipv6.conf.all.disable_ipv6 = 1
$ sysctl  net.ipv6.conf.default.disable_ipv6
net.ipv6.conf.default.disable_ipv6 = 1

shermanyin avatar Jan 08 '21 23:01 shermanyin

@c2thorn @rileykarson some of the team members have been running into this lately. Any possibilities for a fix?

/cc @daniel-cit

bharathkkb avatar Apr 29 '21 22:04 bharathkkb

Same here, could you give steps to resolve this?

ocervell avatar May 06 '21 09:05 ocervell

Hi there :)

We experienced this as well in a relatively long terraform apply (5-6 mins), running from Cloud Shell. Thanks for your support!

Error: Error creating service account: Post "https://iam.googleapis.com/v1/projects/[REDACTED_PROJECT_ID]/serviceAccounts?alt=json&prettyPrint=false": dial tcp [REDACTED_IP_V6_ADDRESS]:443: connect: cannot assign requested address

/cc @jbrook

ferrarimarco avatar Jul 05 '21 11:07 ferrarimarco

Quick and dirty plug:

# Workaround https://github.com/hashicorp/terraform-provider-google/issues/6782
    sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1 net.ipv6.conf.default.disable_ipv6=1 net.ipv6.conf.lo.disable_ipv6=1 > /dev/null
    export APIS="googleapis.com www.googleapis.com storage.googleapis.com iam.googleapis.com container.googleapis.com cloudresourcemanager.googleapis.com"
    for name in $APIS
    do
      ipv4=$(getent ahostsv4 "$name" | head -n 1 | awk '{ print $1 }')
      grep -q "$name" /etc/hosts || ([ -n "$ipv4" ] && sudo sh -c "echo '$ipv4 $name' >> /etc/hosts")
    done
# Workaround end

isimluk avatar Jul 06 '21 08:07 isimluk

Get all GCP API endpoints:

gcloud services list --available --filter="name:googleapis.com" --format "value(NAME)"

lirlia avatar Sep 14 '22 07:09 lirlia

I can't believe this is still an issue 2+ years after the bug was opened.

rhyas avatar Oct 12 '22 19:10 rhyas

I have exactly the same issue when executing from a mac

Jubblin avatar Nov 04 '22 11:11 Jubblin

+1

sean9999 avatar Nov 09 '22 20:11 sean9999

Any update on this?

rubber-ant avatar Mar 23 '23 23:03 rubber-ant

2023: this is still an issue. I also encounter this problem.

jlenuffgsoi avatar May 26 '23 07:05 jlenuffgsoi

Another confirmation that this is still occurring. Very painful.

liamstevens avatar Sep 04 '23 06:09 liamstevens

Error: Error retrieving available secret manager secret versions: Get "https://secretmanager.googleapis.com/v1/projects/<PROJECT>/secrets/<SECRET_NAME>/versions/latest?alt=json": Post "https://oauth2.googleapis.com/token": dial tcp [2607:f8b0:400f:807::200a]:443: connect: no route to host

While on a different system than the OP, I recently encountered a similar issue (the error above) on macOS. For me, the resolution was changing the DNS server in the Wi-Fi network settings from my ISP's router to a public DNS resolver (like 1.1.1.1). After this change, terraform plan and terraform apply were successful. Maybe this will help other users too.

kevin-dimichel avatar Sep 26 '23 22:09 kevin-dimichel

The workaround suggested by @kevin-dimichel (change DNS to 1.0.0.1 and 1.1.1.1) fixed this for me.

rpjeff avatar Nov 14 '23 22:11 rpjeff

Can confirm this with Google Cloud Shell.

pspot2 avatar Apr 15 '24 15:04 pspot2

I've been looking into this and it looks like it should be possible for us to resolve on the provider side. We should be able to use nettest.SupportsIPv6 to detect whether the current environment supports IPv6 and then force the transport layer to use IPv4 if not. Something like adding the following after this line:

client.Transport = headerTransport

client.Transport.DialContext = func(ctx context.Context, network string, addr string) (net.Conn, error) {
	d := &net.Dialer{}
	if !nettest.SupportsIPv6() {
		return d.DialContext(ctx, "tcp4", addr)
	}
	return d.DialContext(ctx, network, addr)
}

However, I can't actually reproduce this bug on cloud shell, so I can't tell if the fix actually works. If anyone has a configuration that consistently and quickly causes this error in cloud shell, that would be extremely helpful!

EDIT: apparently the override isn't quite that simple, continuing to dig, but still - reproducible cases would be great. Alternative fix would be to force setting the GODEBUG=netdns=cgo when initializing the config, but that is definitely hackier than I would prefer (and may also not work.)
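
One reason the override is not that simple: http.Client.Transport is an http.RoundTripper interface, so it has no DialContext field to assign. The hook lives on the concrete *http.Transport. Below is a minimal, stdlib-only sketch of the idea under stated assumptions; it is not the provider's code. supportsIPv6 is a hypothetical stand-in for nettest.SupportsIPv6 (it only checks whether an IPv6 loopback socket can be bound), and ipv4Network and newTransport are illustrative names.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"time"
)

// supportsIPv6 is a heuristic: it reports whether an IPv6 loopback socket
// can be bound. Assumption: a host with IPv6 disabled fails this bind.
func supportsIPv6() bool {
	ln, err := net.Listen("tcp6", "[::1]:0")
	if err != nil {
		return false
	}
	ln.Close()
	return true
}

// ipv4Network maps the generic "tcp" network to the IPv4-only "tcp4".
// Any other network name is returned unchanged.
func ipv4Network(network string) string {
	if network == "tcp" {
		return "tcp4"
	}
	return network
}

// newTransport clones the default transport and, when forceIPv4 is true,
// pins every outgoing dial to IPv4.
func newTransport(forceIPv4 bool) *http.Transport {
	d := &net.Dialer{Timeout: 30 * time.Second, KeepAlive: 30 * time.Second}
	t := http.DefaultTransport.(*http.Transport).Clone()
	t.DialContext = func(ctx context.Context, network, addr string) (net.Conn, error) {
		if forceIPv4 {
			network = ipv4Network(network)
		}
		return d.DialContext(ctx, network, addr)
	}
	return t
}

func main() {
	v6 := supportsIPv6()
	client := &http.Client{Transport: newTransport(!v6)}
	_ = client // a real caller would wrap this transport with its header transport
	fmt.Printf("IPv6 usable: %v, forcing IPv4: %v\n", v6, !v6)
}
```

In the provider, the transport built this way would presumably need to sit underneath the existing header and retry wrappers rather than replace them, which is where the extra digging comes in. Whether the loopback check matches the Cloud Shell failure mode is also untested here.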

yaqs/47302089738551296

melinath avatar Apr 16 '24 17:04 melinath

I am facing a similar issue with api.cloudflare.com

der-ali avatar May 23 '24 15:05 der-ali

I had a similar issue and resolution to kevin-dimichel (above).

% terraform init

Initializing the backend...

Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 5.0"...
╷
│ Error: Failed to query available provider packages
│ 
│ Could not retrieve the list of available versions for provider hashicorp/aws: could not
│ query provider registry for registry.terraform.io/hashicorp/aws: the request failed after
│ 2 attempts, please try again later: Get
│ "https://registry.terraform.io/v1/providers/hashicorp/aws/versions": dial tcp
│ [2600:9000:2212:ee00:16:1aa3:1440:93a1]:443: connect: network is unreachable
╵

Version Info:

  • Terraform: 1.8.5
  • Debian 12.5
  • Linux 6.1.0-21-amd64

I resolved this by overriding the DNS servers for both IPv4/6 with the Quad9 servers.

nhairs avatar Jun 28 '24 03:06 nhairs