MAJOR: service discovery: Add support for Nomad

Open mr-karan opened this issue 2 years ago • 13 comments

Adds support for service discovery via Nomad services.

Nomad recently added support for registering services for the tasks it runs. The list of services can be retrieved over its HTTP API using the Go SDK it provides. Functionally, this is very similar to the Consul SDK.

The models for Nomad are generated in client-native and I've sent a PR for it: https://github.com/haproxytech/client-native/pull/90

Also fixes an issue with go-generate as discussed here: https://github.com/haproxytech/dataplaneapi/issues/266

mr-karan avatar Sep 15 '22 07:09 mr-karan

To test this locally, the easiest way is to run a local nomad agent (instructions here: https://gist.github.com/mr-karan/b1bb4f65ae31d91985e6a64451b79f6e)

This config file can be used to connect dataplane to Nomad:

service_discovery {
  nomads = [
    {
      Address                    = "http://127.0.0.1"
      Description                = "Nomad test system"
      Enabled                    = true
      ID                         = "b40eb63b-2fb2-4996-b870-20f50ca173de"
      Name                       = "my-nomad-service"
      Namespace                  = "*"
      Port                       = 4646
      RetryTimeout               = 15
      SecretID                   = ""
      ServerSlotsBase            = 10
      ServerSlotsGrowthIncrement = 10
      ServerSlotsGrowthType      = "linear"
    },
  ]
}

A sample Nomad job can be run using nomad job run doggo.nomad

job "doggo" {
  datacenters = ["dc1"]
  type        = "service"
  namespace   = "doggo"

  group "app" {
    count = 2

    network {
      port "http" {
        to = 8080
      }
    }

    task "web" {
      driver = "docker"

      service {
        provider = "nomad"
        name     = "doggo-web"
        tags     = ["doggo", "web"]
        port     = "http"
      }

      config {
        image = "ghcr.io/mr-karan/doggo-api:latest"
        ports = ["http"]
      }

      resources {
        cpu    = 200
        memory = 200
      }
    }
  }
}

When dataplane is started, the following logs show that it discovers the services and triggers a reload after templating the HAProxy file:

time="2022-09-15T13:30:58+05:30" level=debug msg="discovery job reconciliation started" ID=b40eb63b-2fb2-4996-b870-20f50ca173de ServiceDiscovery=Nomad
time="2022-09-15T13:30:58+05:30" level=debug msg="discovery job reconciliation completed" ID=b40eb63b-2fb2-4996-b870-20f50ca173de ServiceDiscovery=Nomad
time="2022-09-15T13:31:13+05:30" level=debug msg="discovery job reconciliation started" ID=b40eb63b-2fb2-4996-b870-20f50ca173de ServiceDiscovery=Nomad
time="2022-09-15T13:31:13+05:30" level=debug msg="discovery job reconciliation completed" ID=b40eb63b-2fb2-4996-b870-20f50ca173de ServiceDiscovery=Nomad
time="2022-09-15T13:31:13+05:30" level=debug msg="Scheduling a new reload..." reload_id=2022-09-15-0
time="2022-09-15T13:31:18+05:30" level=debug msg="Reload started" reload_id=2022-09-15-0
time="2022-09-15T13:31:18+05:30" level=debug msg="Reload finished in 11.30711ms" reload_id=2022-09-15-0
time="2022-09-15T13:31:18+05:30" level=debug msg="Reload successful" reload_id=2022-09-15-0
time="2022-09-15T13:31:18+05:30" level=debug msg="Handling reload completed, waiting for new requests" reload_id=2022-09-15-0
time="2022-09-15T13:31:28+05:30" level=debug msg="discovery job reconciliation started" ID=b40eb63b-2fb2-4996-b870-20f50ca173de ServiceDiscovery=Nomad
time="2022-09-15T13:31:28+05:30" level=debug msg="discovery job reconciliation completed" ID=b40eb63b-2fb2-4996-b870-20f50ca173de ServiceDiscovery=Nomad

This is how the HAProxy file looks:

backend nomad-backend-my-nomad-service-doggo-doggo-web
  server SRV_FzVaH 192.168.29.76:31513 check weight 128
  server SRV_2pzcs 192.168.29.76:26146 check weight 128
  server SRV_3C09Z 127.0.0.1:80 check disabled weight 128
  server SRV_1UdFi 127.0.0.1:80 check disabled weight 128
  server SRV_Ck0VD 127.0.0.1:80 disabled weight 128
  server SRV_1QHV6 127.0.0.1:80 disabled weight 128
  server SRV_q0Uwh 127.0.0.1:80 disabled weight 128
  server SRV_YZUkQ 127.0.0.1:80 disabled weight 128
  server SRV_GCFNm 127.0.0.1:80 disabled weight 128
  server SRV_N1JdF 127.0.0.1:80 disabled weight 128
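
The backend above has ten server lines for two running instances: two live addresses and eight disabled placeholders at 127.0.0.1:80, which is consistent with ServerSlotsBase = 10. A rough sketch of how linear slot growth could be computed from ServerSlotsBase and ServerSlotsGrowthIncrement is shown below. The `linearSlots` helper is hypothetical and reflects an assumption about the behaviour, not dataplaneapi's actual code.

```go
package main

import "fmt"

// linearSlots returns how many server slots a backend would get for the
// given number of live instances: at least base, then growing in whole
// steps of increment once base is exceeded.
func linearSlots(instances, base, increment int) int {
	if instances <= base {
		return base
	}
	if increment <= 0 {
		// No usable increment: fall back to an exact fit.
		return instances
	}
	extra := instances - base
	steps := (extra + increment - 1) / increment // ceiling division
	return base + steps*increment
}

func main() {
	for _, n := range []int{2, 10, 11, 25} {
		fmt.Printf("%d instances -> %d slots\n", n, linearSlots(n, 10, 10))
	}
}
```

Under this assumption, the two doggo-web instances stay within the base of 10 slots, and an eleventh instance would grow the backend to 20.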

mr-karan avatar Sep 15 '22 08:09 mr-karan

Hi @mr-karan, thank you for the PR and sorry for the delay. I will review this and get back to you.

mjuraga avatar Sep 20 '22 08:09 mjuraga

I have just a small comment: can you split the go-generate fix into a separate commit to keep the commit history cleaner?

mjuraga avatar Sep 21 '22 08:09 mjuraga

@mjuraga Done!

mr-karan avatar Sep 21 '22 11:09 mr-karan

Hi, just a small comment: can we have CLEANUP: remove unused go-generate instead of chore? Then I'll merge it.

mjuraga avatar Sep 21 '22 12:09 mjuraga

@mjuraga fixed :)

mr-karan avatar Sep 21 '22 16:09 mr-karan

@mjuraga Hey, just checking in if there are any blockers for this to get merged :sweat_smile:

mr-karan avatar Sep 30 '22 05:09 mr-karan

Hi @mr-karan, sorry for the delay again. We are currently preparing version 2.7 of dataplaneapi and haven't had the chance to test this properly. That said, we don't want to merge a new feature like this so close to the release without proper testing.

On the other hand, from a first look it seems fine, and it will probably get merged right after the release, once we have taken a deeper look and tested it.

mjuraga avatar Sep 30 '22 10:09 mjuraga

Sure thing, understandable. Thanks for letting me know!

mr-karan avatar Sep 30 '22 11:09 mr-karan

Thank you for your patience. I will come back to this as soon as possible and let you know.

mjuraga avatar Sep 30 '22 11:09 mjuraga

@mjuraga Hi! Possible to share an update here? ^_^

mr-karan avatar Jun 22 '23 07:06 mr-karan

Hi @mr-karan, sorry for the very long delay, we've been busy with the latest release. I think your PR is OK, but we had to remove all usages of HashiCorp libraries, like this one: https://github.com/haproxytech/dataplaneapi/pull/267/files#diff-01f037ab922981e5205173ea883804d526580df1e51a7ac2e45028dd20d1dc27

Unfortunately, there is a conflict between our project's license and theirs, so we had to remove them. We can still consider your PR, but this import would need to be removed. For example, you can check how we did that for Consul service discovery support here: https://github.com/haproxytech/dataplaneapi/commit/4ad31367cc1a2a3a2d815d45a8303263dd772d21

Would you still be willing to work on this PR?

mjuraga avatar Jun 26 '23 08:06 mjuraga

@mjuraga Noted. Yeah, that makes sense! I'll add this to my backlog for now, but I'm eager to contribute! :sweat_smile:

mr-karan avatar Jun 27 '23 03:06 mr-karan