
Proposal: port forwarding via SSH tunnel

Open nicolai86 opened this issue 8 years ago • 67 comments


I'd like to start adding port forwarding via SSH tunnels to terraform.

This is useful when you want to use Terraform with systems that are only accessible via a jump host, i.e. company-internal systems.

Right now Terraform already ships with a number of providers that may need to talk to internal systems (e.g. PostgreSQL, MySQL, InfluxDB, …).

The status quo is to create an SSH tunnel beforehand or, in cases where the entire infrastructure is created from scratch, to split the Terraform configuration into multiple stages with glue code outside. E.g. one might set up a private cluster with a jump host, open an SSH tunnel via bash, and then run a different Terraform script that uses the newly created tunnel to access private systems, all wrapped in a single setup.sh script.
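Such a glue script typically looks something like this (a sketch only; the stage directories, output name, and jump host address are hypothetical):

```sh
#!/usr/bin/env bash
# setup.sh -- hypothetical two-stage wrapper: stage 1 creates the network
# and jump host, a tunnel is opened by hand, stage 2 talks through it.
set -euo pipefail

# Stage 1: create the VPC and the jump host.
(cd network && terraform apply)
JUMP_HOST=$(cd network && terraform output jump_host_ip)

# Open a background tunnel to the internal service through the jump host.
ssh -f -N -L 8500:consul.internal:8500 "private-user@${JUMP_HOST}"

# Stage 2: providers in this config point at localhost:8500.
(cd services && terraform apply)

# Tear the tunnel down again.
pkill -f "ssh -f -N -L 8500:consul.internal:8500" || true
```

This is exactly the kind of out-of-band orchestration the proposal would fold into Terraform itself.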

Assuming that the SSH tunnel is required for all resources of a given provider, I suggest adding connection settings to the terraform providers as well, like this:

provider "consul" {
    address = "localhost:80"
    datacenter = "nyc1"

    # run "ssh -L localhost:80:demo.consul.io:80" for any resources of this provider
    connection {
        user = "private-user"
        host = "private.jump-host.io"

        forward {
            remote_host = "demo.consul.io"
            remote_port = 80
            local_port = 80
        }
    }
}

# Access a key in Consul; consul is only available via SSH tunnel
resource "consul_keys" "app" {
    key {
        name = "ami"
        path = "service/app/launch_ami"
        default = "ami-1234"
    }
}
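For reference, the `forward` block above corresponds to the tunnel one would otherwise open by hand before running Terraform (`-N` keeps ssh from running a remote command, so the process exists only to carry the forward):

```sh
ssh -N -L 80:demo.consul.io:80 private-user@private.jump-host.io
```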

Looking forward to any feedback before I head off and add something like this to Terraform… ;)

Related: #4442, #4775

nicolai86 avatar Aug 22 '16 07:08 nicolai86

Hi @nicolai86! I'm certainly not opposed to this, though I'm not sure exactly what it would look like. Going to cc @phinze or @mitchellh here for a second opinion on this.

jen20 avatar Aug 22 '16 12:08 jen20

This sounds reasonable to me. My only comment so far would be to name it something like local_forward to align it with the actual type of forwarding being done (-L), and leave room in case we find a need for remote_forward (-R) later on.

jbardin avatar Aug 22 '16 16:08 jbardin

This is an interesting approach. I have some feedback, but really just exploring the idea:


Given that the connection doesn't really "belong to" the provider, I wonder if we should hoist it out to the top level, and add some interpolation variables for it like this:

provider "consul" {
    # expands to the local listen address and port that the "connection" created
    address = "${connection.consul_tunnel.local_address}"
    datacenter = "nyc1"
}

connection "consul_tunnel" {
    type = "ssh"
    user = "private-user"
    host = "private.jump-host.io"

    forward {
        remote_host = "demo.consul.io"
        remote_port = 80
        local_port = 80
    }
}

Presumably for real use the user would sometimes need to provide some credentials in the connection block (either a private key or a password), so the ability to interpolate from variables would be useful to avoid hard-coding those credentials in the config.


It could also be nice to make the local port optional and have Terraform just allocate any arbitrary open port and expose it via the interpolation variable, so the user doesn't need to think about what port is likely to be open on all machines where Terraform might be run.
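The "allocate any arbitrary open port" idea maps directly onto a standard OS facility: binding a listener to port 0 makes the kernel pick a free ephemeral port. A shell sketch (using python3 here purely as a convenient way to call bind()):

```shell
# Ask the kernel for a free TCP port by binding to port 0, then print it.
free_port=$(python3 -c 'import socket
s = socket.socket()
s.bind(("127.0.0.1", 0))        # port 0 means "pick any free port"
print(s.getsockname()[1])
s.close()')
echo "allocated local port: ${free_port}"
```

Terraform could do the equivalent internally and expose the chosen port via the interpolation variable.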


Wondering if maybe it would be more intuitive to invert the nesting, so that the forwarder is the primary object and it takes a connection as part of its configuration, similar to how resources and provisioners work:

port_forward "consul_tunnel" {
    remote_host = "demo.consul.io"
    remote_port = 80

    connection {
        # Now the connection block is the same as in other contexts, as long
        # as the selected connection type supports port forwarding.
        type = "ssh"
        user = "private-user"
        host = "private.jump-host.io"
    }
}

provider "consul" {
    address = "${port_forward.consul_tunnel.local_address}"
}

apparentlymart avatar Aug 22 '16 19:08 apparentlymart

I think exposing the port forwarding as a primitive is a good idea in terms of reuse between multiple resources, and it might also help with code reuse given that the connection attribute already exists on resources. I'm also hoping for a clean integration into the execution graph.

It seems that the general theme is "this is a worthwhile addition" and the questions are mostly minor details. Since I have no idea at all about the terraform core internals I'll take a deep dive and report back in a couple of days…

nicolai86 avatar Aug 22 '16 21:08 nicolai86

@nicolai86 I would suggest giving @phinze and/or @mitchellh a chance to respond since they know Terraform (and its roadmap) the best and are likely to give more detailed feedback. Of course, that doesn't mean you can't dig in and start learning about Terraform core. :grinning:

apparentlymart avatar Aug 22 '16 21:08 apparentlymart

Don't worry, I just want to start learning about the Terraform core internals. Did I sound like I was going to go off and build it already? 😅

nicolai86 avatar Aug 23 '16 04:08 nicolai86

Reflecting on this a while later...

At work we took the approach of running Terraform on a host within the network it's being deployed to, and running it with an automation tool.

This has been working out really well for us:

  • The problem being described here is moot, because there's no bastion wall between Terraform and the services it's trying to configure.
  • The hole we had to poke to allow our build system to trigger a deployment is small: it's just triggering a job in the automation tool we use via a very well-defined API. The only thing possible to do remotely (via a secure channel) is to tell the Terraform "deploy worker" machine to deploy the Terraform configuration at the HEAD of our git repo, and so we don't need to expose SSH access to these Terraform machines, bastion or otherwise.
  • It encourages other good practices around running Terraform in a very specific environment that's managed by configuration management, which prevents weird little issues caused by running Terraform on different OSes and different machines.
  • Terraform can obtain auth credentials it needs from a credential store within the environment, so the credentials never need to appear on any machine outside of the walls of the target network nor be known directly by any human operator. (We currently do this with a home-grown wrapper script that sets environment variables, rather than with Terraform itself.)

So with all of that said, while it'd be great to have a feature like what was proposed here in the long run so that Terraform can be flexible to run in a variety of different environments, in the short term I'd wholeheartedly recommend that folks consider this alternative approach which has worked out very well for us.

AFAIK such a setup is not possible with Atlas today, in which case I would also suggest that it would be a great feature to be able to use the Atlas UI to control "agents" running within a private network over a secure channel as an alternative to running Terraform on Hashicorp-run infrastructure, which would then enable the above configuration with Atlas as the orchestration tool.

apparentlymart avatar Sep 29 '16 01:09 apparentlymart

I think running Terraform on a server within the VPC is a nice workaround for this problem, but it has a bootstrapping issue: where does that server come from initially? Terraform. It means admitting that you have to split your infrastructure management and cannot stand the entire thing up with one run of Terraform.

I also have multiple VPCs that are managed from one Terraform source repository. Applying changes now involves connecting to multiple Terraform nodes and running the updates. And splitting the code out.

All of that is possible, and I can even automate it with Fabric or Bash, but I don't like adding more tools when Terraform is supposed to be the tool. Also, I'm layering scripted automation on top of my very nice declarative automation, which just makes me feel a little gross.

For me, I added the SSH tunnel step to a plan and apply shell wrapper for now.
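A minimal version of such a wrapper might look like this (hosts and ports are placeholders; the control socket plus `trap` ensure the tunnel is torn down even if terraform fails):

```sh
#!/usr/bin/env bash
set -euo pipefail

# Open the tunnel in the background: local 5432 -> RDS endpoint via bastion.
# -M/-S create a control socket so we can cleanly close this exact tunnel later.
ssh -f -N -M -S /tmp/tf-tunnel.sock \
    -L 5432:mydb.internal.example.com:5432 user@bastion.example.com

# Whatever happens below, close the tunnel on exit.
trap 'ssh -S /tmp/tf-tunnel.sock -O exit user@bastion.example.com' EXIT

# Forward all arguments to terraform, e.g. ./tf.sh plan or ./tf.sh apply.
terraform "$@"
```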

cakeface avatar Dec 21 '16 16:12 cakeface

Yes, it is the case that we had to bootstrap the environment from outside and that there is one Terraform config that requires custom effort to apply because it affects the deploy workers themselves. A temporary extra machine booted manually from the same AMI as the deploy workers addresses that problem, but I certainly won't claim that this is super convenient. It's just a compromise that we tolerate because we apply this configuration relatively infrequently compared to the others that deal with our applications themselves.

apparentlymart avatar Dec 22 '16 21:12 apparentlymart

Hi guys, just wanted to add my five cents and try to revive this topic. From my perspective, moving the tunnel out of the provider looks smart but has a severe disadvantage. With a remote-exec or a file provisioner, the SSH connection is closed afterwards, so there is nothing to clean up; it simply exits, even if Terraform crashes. That is not the case if you implement a tunnel this way:

port_forward "consul_tunnel" {
    remote_host = "demo.consul.io"
    remote_port = 80

    connection {
        # Now the connection block is the same as in other contexts, as long
        # as the selected connection type supports port forwarding.
        type = "ssh"
        user = "private-user"
        host = "private.jump-host.io"
    }
}

You need a destructor in the code that is guaranteed to be triggered. Following that logic, extending the existing connection block and adding it to certain providers or resources would be the safer route to go.

hingstarne avatar Mar 13 '17 19:03 hingstarne

Any progress on this? We have to open an SSH tunnel every time we run terraform as it manages our RDS instances that are private only.

PLaRoche avatar Mar 14 '17 14:03 PLaRoche

This is a major blocker for us as well.

matelang avatar Mar 29 '17 13:03 matelang

What we are thinking of as a workaround, though of course it won't help everyone, is to use a Kubernetes job to run terraform plan/apply.

As it runs in the cluster, it has access to the private resources, and it's easy for everyone to run (using a web interface for Kubernetes) without manually setting up tunnels, credentials for those, and so on. The idea is to use a remote tfstate on S3 (or something else).

I'll update if we have the time to go more on this path. But, of course, will only help people also running kubernetes clusters :)

rata avatar Mar 29 '17 14:03 rata

I mostly just want (right now) to be able to provision a VM with Docker and forward the docker.sock so that Terraform can deploy containers onto it without having to set up a TCP listener (because I won't want it later anyway).

automaticgiant avatar May 09 '17 18:05 automaticgiant

Any progress on this? It's almost a year now… The mentioned Terraform gurus were asked for an opinion but didn't reply. Is this issue abandoned?

Bastion hosts are quite common, and relying on external scripts to create an SSH tunnel before Terraform can operate sucks: it makes the whole process more complicated since there are more steps you must remember, makes your project far harder to maintain if you have multiple resources that need such a feature (Redis, MySQL, Elasticsearch, Consul, …), and can be very dangerous when you work with multiple environments (it's all too easy to run terraform apply against dev while your tunnel still points at the production database, and vice versa). I definitely can't see why this issue is considered so low priority.

fquffio avatar Jul 01 '17 05:07 fquffio

Hi @fquffio!

Before I respond I should explain that at the time of my last comments I was an outside open source contributor, but in the meantime I've become a HashiCorp employee working on Terraform.

It is not that this issue is considered low priority, but rather that there are many issues that are all considered important. There remains design work to do to figure out exactly how this will work, and then non-trivial implementation work to get it actually done.

Believe me that I really want to see this feature too, and we'll get there. We're working through the feature request backlog as fast as we can while also keeping up with bug fixes, etc. I understand the frustration and I can only ask for continued patience.

At this time, my hope is to move forward with a configuration structure somewhat like the following, taken from my comment above:

port_forward "consul_tunnel" {
    target_host = "demo.consul.io"
    target_port = 80

    connection {
        # Now the connection block is the same as in other contexts, as long
        # as the selected connection type supports port forwarding.
        type = "ssh"
        user = "private-user"
        host = "private.jump-host.io"
    }
}

provider "consul" {
    address = "${port_forward.consul_tunnel.local_address}"
}

It'll take a little more prototyping to figure out the details of this, such as how we can wire the connection creation and shutdown into the graph, whether the existing connection mechanism can be extended to support tunnels in this way, etc. We'll have more to say here when we are able to complete that prototyping.

apparentlymart avatar Jul 01 '17 18:07 apparentlymart

I'm also interested in this and suggest something along these lines: using a connection block inside the provider:

provider "consul" {
  address = "${aws_route53_record.elb_consul.fqdn}"
  datacenter = "dc1"

  connection {
    type = "tunnel"
    host = "${aws_instance.bastion_1.public_ip}"
    port = "8500"
    private_key = "${file("${var.local_ssh_key_path}")}"
    user = "${var.ssh_user}"
  } 
}

spanktar avatar Aug 04 '17 16:08 spanktar

While I like this approach and think it is sensible for the long term, I have to wonder whether it would not be easier to take the bastion support that exists today for aws_instance and other resources and add it to resources like postgres_database, etc., so that people can start using it today.

Either way, I'm a big +1 for supporting bastion hosts on more resources.

ekristen avatar Aug 15 '17 16:08 ekristen

+1. I think this same pattern could also work for supporting VPN access to resources. Having the SSH tunnel be a resource that depends on other resources (like the bastion instance, for example) would solve any ordering issues on the first run.

madmod avatar Aug 24 '17 23:08 madmod

@apparentlymart it is time to fix this. You have been dancing around the issue for too long. Either fix it or close it but you have kept us waiting for too long.

cloudvant avatar Aug 29 '17 02:08 cloudvant

@vmendoza That comment seems a little out of line for a free open source project. If you feel so strongly about it... dig in and write some code.

nbering avatar Aug 29 '17 02:08 nbering

I would also request that if/when this is implemented there be a remote_command portion. I specifically want to forward a port to a service that I want to launch as I make the connection.

ssh -L my_port:localhost:target_port host some_service_providing_access_on_target_port

kwerle avatar Sep 01 '17 21:09 kwerle

Hello everyone. I decided to try to tackle this myself by building a custom provider. And I'm happy to say that I'm quite pleased with the result. It works by declaring a data source (basically, what you want is a local port that you want to be forwarded somewhere via SSH).

While I am sure there are many things that can be improved, what is great about my solution is that it is usable right now.

I'd like to invite everyone who is having this issue to try it out. Here's the repository: https://github.com/stefansundin/terraform-provider-ssh

Please be careful and do not use it in production quite yet. If it breaks, you get to keep both pieces. :)

As always, suggestions for improvements are welcome! Thanks all!

stefansundin avatar Nov 25 '17 07:11 stefansundin

@stefansundin nice! but there is an issue: tunnel is not recreated on apply - stefansundin/terraform-provider-ssh#1

jaymecd avatar Nov 28 '17 11:11 jaymecd

Hey @apparentlymart. I've been trying to figure out the issue that @jaymecd reported, but I couldn't find any good solution. Any chance you could take a quick look and say whether or not it is even solvable (or impossible as of right now). There is more info here: https://github.com/stefansundin/terraform-provider-ssh/issues/1

Thanks!

stefansundin avatar Dec 04 '17 00:12 stefansundin

A simple solution to this and similar issues is to provide an option to use a local OpenSSH client binary instead of Go's native ssh implementation. This would allow us to use ProxyCommand to create whatever kind of tunneling we need. See #4523.

I think the developers of Docker Machine got this one right - they use the local 'ssh' binary if present and only fall back on the native Go crypto/ssh implementation when no binary is available (or is explicitly requested - see https://docs.docker.com/machine/reference/ssh/).

OpenSSH is ubiquitous and highly configurable - is there really any benefit in attempting to re-implement some of its features in Terraform?
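For comparison, with a local OpenSSH client the jump-host case needs no special Terraform support at all (host names below are placeholders):

```sh
# ProxyJump (-J, OpenSSH 7.3+) reaches an internal host through the bastion
# and forwards a local port to it in a single command:
ssh -N -J private-user@private.jump-host.io \
    -L 5432:localhost:5432 db-admin@mydb.internal.example.com

# On older clients the same thing is spelled with ProxyCommand:
ssh -N -o ProxyCommand='ssh -W %h:%p private-user@private.jump-host.io' \
    -L 5432:localhost:5432 db-admin@mydb.internal.example.com
```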

dangregorysony avatar Dec 07 '17 14:12 dangregorysony

Checking back in on this. Would love to see this someday!

Also, I can confirm the plugin works great as a stopgap measure until native support is added! Great work @stefansundin!

spanktar avatar Feb 23 '18 18:02 spanktar

If you want to use standard tools like curl with a SOCKS5h proxy via SSH tunnel, then you are in luck!

I've found a working solution for docker containers to access services via socks5h://. See my comment & diagram in issue: #17754

This works with the local-exec provisioner! An example use case was that I needed to bootstrap Vault Server CA certificates for use later in other Terraform resources. However, this Vault server was only accessible inside our secure VPC behind a bastion host. Additionally, it was in a private hosted Route53 zone that is only resolvable from within the VPC, so the socks5h protocol was important for DNS resolution!

For example:

# Note: 172.16.222.111 is the alias IP for the host laptop running terraform in docker container
# See diagram in issue comment for #17754 above for clarification!
resource "null_resource" "vault-web-ca" {
  triggers {
    id = "${uuid()}"
  }
  provisioner "local-exec" {
    command = <<EOF
      ALL_PROXY="socks5h://172.16.222.111:${var.socks_proxy_port}";
      HTTP_PROXY="$${ALL_PROXY}";
      HTTPS_PROXY="$${ALL_PROXY}";
      export ALL_PROXY HTTP_PROXY HTTPS_PROXY;
      echo '${data.aws_ssm_parameter.vault-ca-crt.value}' > /tmp/vault-ca.crt && \
      sync && \
      curl -s -k -o - https://vault-${var.env}.${local.private_dns_zone_name}/v1/ca/web/ca_chain > ${path.module}/generated/vault-web-ca-chain.crt && \
      curl -s --cacert /tmp/vault-ca.crt -o - https://vault-${var.env}.${local.private_dns_zone_name}/v1/ca/web/ca/pem > ${path.module}/generated/vault-web-ca.crt && \
      sync
EOF
  }
}

# Now we can read in these generated cert files and use them later in Terraform
data "local_file" "vault-web-ca-chain" {
  depends_on = ["null_resource.vault-web-ca"]
  filename = "${path.module}/generated/vault-web-ca-chain.crt"
}

data "local_file" "vault-web-ca" {
  depends_on = ["null_resource.vault-web-ca"]
  filename = "${path.module}/generated/vault-web-ca.crt"
}

The current problem is that Terraform itself does not support socks5h://. This is possibly due to an upstream bug in Golang regarding socks5h:// support in x/net/proxy (golang/go#13454). If this is ever fixed, perhaps Terraform providers and code that uses standard x/net/proxy library will just work!
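The SOCKS listener that the container reaches at the alias IP above can be opened with plain OpenSSH dynamic forwarding (a sketch; the bastion host name is a placeholder):

```sh
# Open a SOCKS5 proxy on the chosen port via the bastion (-D = dynamic forward).
ssh -f -N -D "${SOCKS_PROXY_PORT:-1080}" private-user@bastion.example.com

# socks5h:// (note the "h") resolves hostnames on the far side of the tunnel,
# so private Route53 zones work; plain socks5:// would resolve them locally
# and fail.
curl --proxy "socks5h://127.0.0.1:${SOCKS_PROXY_PORT:-1080}" \
     https://vault.internal.example.com/v1/sys/health
```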

trinitronx avatar Apr 20 '18 22:04 trinitronx

https://go-review.googlesource.com/c/net/+/156517/ was merged

gliptak avatar Mar 06 '19 19:03 gliptak

Here's my workaround for this, in case it helps. I'm using Terragrunt, so here's the terragrunt.hcl; a hook is enough to keep it in the workflow without changing habits.

Note: I tried various combinations of nohup, (fork), ((double fork)), and &; only screen did the trick.

include {
  path = find_in_parent_folders()
}

terraform {
  # source = "git::[email protected]:terraform-aws-modules/terraform-aws-rds.git//modules/db_instance?ref=v2.5.0"
  source = "."

  before_hook "open_tunnel_through_bastion" {
    commands     = ["plan", "apply", "show", "destroy"]
    execute      = ["screen", "-d", "-m", "ssh", "-L", "12345:${dependency.instance.outputs.this_db_instance_address}:${dependency.instance.outputs.this_db_instance_port}", dependency.bastion.outputs.hostname, "sleep", "60"]
  }
}

dependency "bastion" {
  config_path = "../../../bastion/"
  mock_outputs = {
    hostname = "localhost"
  }
}

dependency "instance" {
  config_path = "../../instance/"
  mock_outputs = {
    this_db_instance_address  = "localhost"
    this_db_instance_port     = 12345
    this_db_instance_username = "mockup_user"
  }
}

inputs = {
  host = "localhost"
  port = "12345"

  postgres_user     = dependency.instance.outputs.this_db_instance_username
  postgres_password = "REDACTED"

  db_name       = "REDACTED"
  db_password   = "REDACTED"
  db_extensions = ["uuid-ossp", "pgcrypto"]
}

Note: at some point (I don't remember exactly why) I had to move the command into a script and call that from execute.

rgarrigue avatar Oct 16 '19 07:10 rgarrigue