
Remotely deploying providers (in containers, over SSH, etc)

janosdebugs opened this issue 1 year ago • 9 comments

OpenTofu Version

N/A

Use Cases

Currently, providers always run locally with respect to the tofu binary. This limits the usability of OpenTofu in environments where privilege separation and managed credentials are desirable, such as when wanting to use an automatically mounted, short-lived service token provided to a Kubernetes pod. This proposal would let OpenTofu deploy the provider over SSH and/or in a container, allowing the provider to use credentials and network environments that are available locally at the remote site.

As a fringe benefit, deploying in containers would also enable using providers written in languages that don't compile to a single static binary, such as Python.

Furthermore, this solution would also enable self-hosting of providers (see opentofu/registry#132) for network-disconnected environments, and the enforcement of security policies on a container registry level, which has ubiquitous tooling available.

Attempted Solutions

Currently, the only solutions are to run OpenTofu on the remote site or via a cloud provider.

Proposal

In addition to running providers locally, OpenTofu should support deploying providers on a remote site via SSH or via an API. A deployer API could be created that serves as a dumb pipe to the remote site and deploys the provider by uploading the binary or by using a container API. OpenTofu could communicate with providers (or a provider wrapper) over stdio, which is the most commonly available transport in both SSH and containerized environments and requires no extra firewall configuration.

Providers could be purpose-built as containers and therefore benefit from container image caching. For providers that have not published a container image, a fallback image would be provided that downloads the provider from the registry on demand.

Note: I have previously implemented this concept in a workflow engine called Arcaflow, which is heavily inspired by Terraform (see here and here). While these libraries were built for a different purpose and I wouldn't use them as dependencies as a result, much of that work can serve as a blueprint.

What the configuration could look like

deployer "kubernetes" "my_kubernetes_cluster" {
  # Defaults to using the local kubeconfig, but should support all Kubernetes options
}

deployer "docker" "my_docker_engine" {
  # Docker-specific config here
}

terraform {
  required_providers {
    kubernetes = {
      deployer = deployer.kubernetes.my_kubernetes_cluster

      # This is an option specific to the deployer:
      image = "my_container_registry/opentofu-kubernetes-provider"
      
      # Alternatively, skip the "image" and specify the normal options to use the fallback container:
      source = "opentofu/terraform"
    }
  }
}

Deployers

SSH

This deployer is intended to run providers on a remote site via SSH. It may include uploading the provider binaries obtained from the registry, or it may require a remote wrapper binary / tofu being installed on the remote site.

Docker/Podman/Kubernetes

These deployers would bypass the registry and deploy using container images. The container images would need to be built specifically for this purpose and include either OpenTofu or a small wrapper binary. The default container could include a wrapper binary that downloads the provider from the registry as a fallback mechanism.

Caveats

From an implementation perspective, there are several issues to consider:

  1. Network connections across hosts can be unreliable. Requests may be lost in transit and the deployment may fail. However, this is also true between a provider and its cloud API, so it may not be a problem in practice.
  2. Kubernetes does not have a reliable "start of output" signal, so the first line of output may be lost. This may not be a problem if the provider doesn't send any data until OpenTofu queries it.
  3. The Docker API multiplexes stdout and stderr onto a single stream for non-TTY attachments. This is not complicated, but we need to be aware of it when implementing.
  4. We need to provide a wrapper library for provider authors to simplify creating containerized providers.

References

  • #339
  • opentofu/registry#132

janosdebugs avatar Jan 18 '24 08:01 janosdebugs

This is in line with the [long term] tag, but with something like this we should wait and gauge community interest, as it would introduce significant complexity.

On a related note, while provider <> cloud communication expects network issues and generally handles them, the opentofu <> provider communication framework currently assumes it is on localhost, and I believe it doesn't handle network issues.

cube2222 avatar Jan 18 '24 09:01 cube2222

I'm strongly in favor of anything that makes it easier to implement identity federation on k8s.

ImIOImI avatar Jan 18 '24 14:01 ImIOImI

Dear community, please react or comment to this issue if this enhancement would be useful to you. We prioritize development based on community need and your input will help shape our development timeline.

janosdebugs avatar Jan 18 '24 16:01 janosdebugs

I have tried the remote provider approach you mentioned and it worked.

I constructed a forward proxy and a reverse proxy, and ran the provider on the remote end, exposed via a domain name or IP.

The forward proxy is responsible for receiving RPC requests sent by tofu and forwarding them to the remote domain name or IP.

The reverse proxy is responsible for receiving the forwarded requests and sending them to the provider. When the provider finishes processing, the response is returned to tofu as before.

But this is just a demo. To apply it at scale in DevOps, it still has many shortcomings to address, such as session security, request authentication, and whether the provider is stateful.

Ericwww avatar Jan 19 '24 20:01 Ericwww

Thank you @Ericwww

janosdebugs avatar Jan 19 '24 21:01 janosdebugs

This is a great enhancement for safety, isolation, and ease of use. As we know, terraform and providers communicate via gRPC to perform resource operations, so it is doable to let providers (gRPC servers) run remotely and still work. But besides the gRPC calls, terraform also manages the lifecycle of providers in other ways, e.g.

  • using exec.Command() to launch a provider
  • using env variables and stdio to exchange the handshake message (gRPC server address, cert for mTLS, ...)

I think the small wrapper binary mentioned above should be able to take over these responsibilities to make it all work.
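For context on the handshake mentioned above: a go-plugin server prints a single pipe-delimited line on stdout at startup (core protocol version, app protocol version, network type, address, protocol, and optionally a server certificate). A wrapper relaying a remote provider's stdio would need to forward or rewrite this line. A rough parsing sketch, where the `Handshake` struct and its field names are my own, not go-plugin's:

```go
package main

import (
	"fmt"
	"strings"
)

// Handshake holds the fields of the line a go-plugin server prints on
// stdout at startup, e.g. "1|6|tcp|127.0.0.1:1234|grpc". The struct
// and field names here are illustrative; go-plugin keeps this internal.
type Handshake struct {
	CoreVersion, AppVersion, Network, Addr, Protocol string
}

func parseHandshake(line string) (Handshake, error) {
	parts := strings.Split(strings.TrimSpace(line), "|")
	if len(parts) < 5 {
		return Handshake{}, fmt.Errorf("malformed handshake line: %q", line)
	}
	return Handshake{parts[0], parts[1], parts[2], parts[3], parts[4]}, nil
}

func main() {
	h, err := parseHandshake("1|6|tcp|127.0.0.1:1234|grpc\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(h.Network, h.Addr, h.Protocol)
}
```

A remote wrapper would likely rewrite the `Network`/`Addr` fields so the address it reports is reachable through the pipe rather than on the remote host's loopback interface.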

Btw, the unreliable network may be a big problem: a failed gRPC call will make the whole terraform task fail, whereas a failed call from the provider to the cloud API probably will not (thanks to the retry mechanism).

chenzejun avatar Jan 29 '24 13:01 chenzejun

I've been doing a bit of digging on this issue and I'm inclined to close it as "not feasible". The main issue is that several thousand lines of code in OpenTofu assume that the provider will be available to run as a local binary. Specifically, the plugin management seems to be hard-wired into the Meta command and the context.

The closest I got was to create an override that lets a user specify a custom executable path to a plugin, which would let users point at an external executable that wraps the provider. However, the first API call is GetProviderSchema, which doesn't contain the name of the plugin being called. Alternatively, one could implement the provider and provisioner factories (see internal/providers/factory.go and internal/provisioners/factory.go) with a custom protocol, but all calling paths are hard-coded to use the GRPCProvider with a specific configuration from command/plugin.go. Even if we got it working by overriding things, OpenTofu would still try to download a plugin when tofu init is run, which is counterproductive if we want to support non-Go providers.
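To illustrate the factory seam mentioned above: internal/providers defines roughly `type Factory func() (providers.Interface, error)`, and a remote deployer would have to plug in there. The sketch below uses stand-in types (`Interface`, `localProvider`, `remoteProvider` are hypothetical, not OpenTofu's) to show the shape of that indirection:

```go
package main

import "fmt"

// Interface is a stand-in for providers.Interface; Factory mirrors the
// shape of OpenTofu's internal/providers.Factory. The remote variant
// below is hypothetical — exactly the indirection that is currently
// hard-coded to GRPCProvider in command/plugin.go.
type Interface interface {
	Kind() string
}

type Factory func() (Interface, error)

type localProvider struct{}
type remoteProvider struct{ transport string }

func (localProvider) Kind() string    { return "local" }
func (p remoteProvider) Kind() string { return "remote/" + p.transport }

func main() {
	// If the factory map were pluggable, a deployer block could swap
	// in a remote-backed factory per provider address.
	factories := map[string]Factory{
		"registry.opentofu.org/opentofu/random": func() (Interface, error) { return localProvider{}, nil },
		"registry.opentofu.org/opentofu/kubernetes": func() (Interface, error) {
			return remoteProvider{transport: "ssh"}, nil
		},
	}
	p, _ := factories["registry.opentofu.org/opentofu/kubernetes"]()
	fmt.Println(p.Kind())
}
```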

At any rate, without a large-scale refactoring of the command package this cannot be implemented cleanly.

janosdebugs avatar Feb 01 '24 09:02 janosdebugs

Hi @janosdebugs, I did some investigation on remote providers and I saw this issue https://github.com/hashicorp/go-plugin/issues/124, which mentions that "A perfectly reliable network is a requirement of this system". It looks like OpenTofu can only continue if go-plugin supports this first. Is there a plan to refactor go-plugin?

chenzejun avatar Feb 23 '24 01:02 chenzejun

@chenzejun the same situation arises if the plugin unexpectedly crashes: the "connection" is gone then too. I believe this is a solvable problem, but one that needs some thought put into it. Ultimately, OpenTofu doesn't need to rely on go-plugin on the calling side, only inside the container where the shim talks to the provider.

However, the main issue isn't how the network connection will work, that's a relatively simple problem. The main problem is that a large part of the OpenTofu code assumes that:

  1. The provider will always come from a registry.
  2. The provider will always be accessible on the local disk.

These assumptions are spread out across multiple packages that communicate with each other only through a number of other packages. There's a hack in place to make provider development possible, but it's just that: a hack. In other words, making this possible would require a large-scale refactor, which we are not ready to do at this time since it has a chance of breaking things.

janosdebugs avatar Feb 23 '24 07:02 janosdebugs