
Remote Connections

Open nicholasjackson opened this issue 4 years ago • 6 comments

Problem: When working on a microservice, developers often require the local version of the service to be connected to a full environment. While Shipyard provides some of this capability, spinning up large and complex applications is beyond the hardware capabilities of most development machines.

Generally the approach taken is that the service is deployed to the shared environment and tested externally.

While remote debuggers allow a connection from a local IDE like VSCode to a service running in a remote cluster, there are several problems:

  1. The application code running in the cluster must have debug symbols enabled
  2. The application code running in the cluster must match the source on the local machine
  3. After every change the service must be re-deployed to the remote cluster

Ideally it should be possible to route traffic to a locally running instance of a service from a remote cluster.

It should also be possible to route traffic, based on HTTP metadata and the like, specifically to the local service without impacting any other element of the shared environment.

Enabling testing

One core problem with testing is generally the inputs for the test cases. Test cases are often defined by known issues and functionality, but bugs frequently exist in application code because of functionality that was never defined; since test cases are directly tied to defined functionality, these errors slip through. To aid the discoverability of bugs, a developer should be able to run the latest code with production inputs. This feature would enable the shadowing of traffic from a production environment to a local dev or test environment.

Proposed solution:

  1. A new remote_network resource which allows traffic to be routed to remote destinations
  2. A new local_service resource which allows local applications to be part of the Shipyard stack

Functional overview

A central component (controller) within the remote network would enable tunnelling of connections between the remote dev machine and service traffic in the cluster.

The controller is API driven and runs as a single monolithic instance. When a dev environment connects to the controller it specifies which service it would like to masquerade as.

Specific to a Consul service mesh

For example, suppose we wish to send traffic destined for the remote products-api service to the local dev instance.

The controller will register a "fake" instance of the products-api service with Consul; the advertise address for this instance would be the address of the controller.
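Concretely, the "fake" registration could look like a standard Consul service definition. This is a sketch only; the addresses, IDs, and metadata below are illustrative and not part of the proposal:

```hcl
// Registered by the controller on behalf of the dev machine.
// The address points at the controller itself, which tunnels
// traffic back to the developer's local instance.
service {
  name    = "products-api"
  id      = "products-api-local-dev" // illustrative instance ID
  address = "10.5.0.2"               // the controller's own address
  port    = 9090

  meta = {
    instance = "local-dev" // lets L7 routing target only this instance
  }
}
```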

Since the Consul cluster would regard the "fake" instance as just another service instance, it will be able to take part in the service catalog and L7 routing.

The controller would automatically configure L7 routing for the service, such as enabling traffic splitting or HTTP metadata based routing. This ensures that the fake service can be isolated from the normal network traffic.
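For the HTTP-metadata case, the entries the controller writes might resemble Consul's standard service-resolver and service-router config entries. The subset name, metadata filter, and header values here are illustrative:

```hcl
// Define a subset covering only the controller's "fake" instance.
Kind = "service-resolver"
Name = "products-api"

Subsets = {
  "local-dev" = {
    Filter = "Service.Meta.instance == local-dev"
  }
}
```

```hcl
// Route only requests carrying DEBUG: nicstest to that subset;
// all other traffic continues to the normal instances.
Kind = "service-router"
Name = "products-api"

Routes = [
  {
    Match {
      HTTP {
        Header = [
          {
            Name  = "DEBUG"
            Exact = "nicstest"
          }
        ]
      }
    }

    Destination {
      ServiceSubset = "local-dev"
    }
  }
]
```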

The local client makes an outbound connection to the controller through the Consul gateway using a valid mTLS certificate. Upon successful connection a persistent TCP connection is opened; this circumvents the need for the local machine to be remotely accessible (public IP, etc.).

When traffic is sent to the controller, the controller will proxy it through the tunnel back to the dev environment.

To connect to the controller the dev environment will use a valid mesh mTLS certificate and use the remote gateway as an ingress.

Example Config

Below is an example of what the configuration might look like. It is assumed that the remote controller would be deployed separately.

// A remote_network defines a network which does not exist within the current machine
// this could be a service mesh such as Consul running on a remote cluster.
remote_network "consul-remote" {
  endpoint = "192.179.231.1"
}

// make remote services accessible locally
// the service would be accessible using the FQDN `product-api.consul-remote.shipyard`
// and the configured ports
remote_ingress "product-api" {
  target = "consul-remote" // remote network
  service = "product-api"

  port {
    local  = 4646
    remote = 4646
    host   = 14646
  }
}

// A local service defines an application component which is running on the local machine
// this can be used to proxy requests from the dev stack to local debug code
// if the network target is a remote_network then traffic from a remote cluster will be sent 
// to the local instance 
local_service "dev-products-http" {
  name    = "api-products" // service name is the registered service name
  network = "consul-remote"

  local_port = 8080

  // routing rules are optional, however all elements have a combinatorial effect,
  // i.e. they are combined.
  routing {
    http_header {
      key = "DEBUG"
      value = "nicstest"
    }
    
    // send 10 percent of remote traffic to the local instance
    traffic_split = 10
    
    // a copy of the traffic is sent to the local machine, the original request
    // also arrives at the normal destination, think TEE
    shadow_traffic = true
  }
}

nicholasjackson avatar Feb 05 '20 10:02 nicholasjackson

@nicholasjackson This (local_service) feature would be very powerful. Just this functionality would be enough to sell shipyard to devs.

I need some help imagining how this would work. Would this require a Consul service mesh to be operational?

If so, I think the local_service would also need a local Consul "cluster" (even if just a single node) running, which uses Consul mesh gateways to be part of the remote Consul cluster? If this is correct, we might have to think about how this would work if the local_service was behind CGNAT. (We don't have to solve the CGNAT problem, if it's a problem at all; rather, mention it as something that could be an obstacle.)

gc-ss avatar Jul 08 '21 12:07 gc-ss

We did partially implement this for Kubernetes resources in Shipyard; I use this feature for developing the new SMI Controller SDK.

https://shipyard.run/docs/resources/ingress#example-expose-a-local-application-as-a-kubernetes-service

It does not need a service mesh to make this work; we might have been overthinking the use cases when writing this. We did not implement the remote cluster connection, as I have not found the need for this functionality at the moment; for remote debugging I think a tool like Telepresence could be used.

nicholasjackson avatar Jul 08 '21 13:07 nicholasjackson

I use this feature for developing the new SMI Controller SDK.

Ah, did you mean the ability to expose a port through the connector? Agreed, that's pretty useful by itself.

It does not need any service mesh to make this work, we might have been overthinking the use cases when writing this

Well, when I see the usecase at https://github.com/shipyard-run/shipyard/issues/27#issue-560294890:

local_service "dev-products-http" {
  name    = "api-products" // service name is the registered service name
  network = "consul-remote"

  local_port = 8080

  // routing rules are optional, however all elements have a combinatorial effect,
  // i.e. they are combined.
  routing {
    http_header {
      key = "DEBUG"
      value = "nicstest"
    }
    
    // send 10 percent of remote traffic to the local instance
    traffic_split = 10
    
    // a copy of the traffic is sent to the local machine, the original request
    // also arrives at the normal destination, think TEE
    shadow_traffic = true
  }
}

… the traffic split and mirror/shadow features strongly suggest Envoy, and these are extremely powerful features for testing/developing/debugging microservices.

For example, with the traffic split, mirror/shadow feature I can concurrently test/develop/debug v3 of my Microservice without depending on or touching the v2 of my Microservice that's wired in.

With just the connector, I think, this cannot be achieved (I would have to choose between either v2 or v3 at one time)
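For reference, the v2/v3 scenario described above maps onto Consul's standard service-resolver and service-splitter config entries (these drive the underlying Envoy behaviour). The version metadata and weights here are illustrative:

```hcl
// Partition instances into v2 and v3 subsets by their registered metadata.
Kind = "service-resolver"
Name = "products-api"

Subsets = {
  v2 = {
    Filter = "Service.Meta.version == v2"
  }
  v3 = {
    Filter = "Service.Meta.version == v3"
  }
}
```

```hcl
// Send 10 percent of traffic to v3 while v2 keeps serving the rest.
Kind = "service-splitter"
Name = "products-api"

Splits = [
  {
    Weight        = 90
    ServiceSubset = "v2"
  },
  {
    Weight        = 10
    ServiceSubset = "v3"
  },
]
```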

gc-ss avatar Jul 08 '21 13:07 gc-ss

I agree with you on the Envoy features, however, if you are just running a local development stack this is not really a problem. You just run the version you are developing.

Running in a shared environment is a different problem, and I am not sure this is a space we want to get into. Since we wrote that RFC, Telepresence has been updated, and rather than duplicate the functionality that it provides I would rather folks just use that.

nicholasjackson avatar Jul 08 '21 14:07 nicholasjackson

Since we wrote that RFC, Telepresence has been updated, and rather than duplicate the functionality that it provides I would rather folks just use that

AH - I am unfamiliar with Telepresence, so need to check it out.

Would Telepresence work well outside a k8s shop? If I am a native Nomad shop, is Telepresence a useful tool?

gc-ss avatar Jul 08 '21 14:07 gc-ss

Ah, sadly not; it is predominantly a Kubernetes tool.

Maybe we do need to build that shipyard feature after all.

nicholasjackson avatar Jul 08 '21 14:07 nicholasjackson