nomad icon indicating copy to clipboard operation
nomad copied to clipboard

API/Inspect loses ConsulConnect params

Open pserranoa opened this issue 4 years ago • 10 comments
trafficstars

Nomad version

1.0.6

Operating system and Environment details

NAME="Red Hat Enterprise Linux" VERSION="8.2 (Ootpa)" ID="rhel" ID_LIKE="fedora" VERSION_ID="8.2" PLATFORM_ID="platform:el8"

Issue

Some parameters of nomad job where missed after deploy a job with connect integration when you use the ui/api with json to make changes

Reproduction steps

deploy a job that expose something like this

connect {
          sidecar_service {
            proxy {
              expose {
                path {
                  path            = "/metrics"
                  protocol        = "http"
                  local_path_port = 8085
                  listener_port   = "api_expose_metrics"
                }
              }
            }
          }
        }

When the job is correctly running, go to definition of job, click on edit and plan without changes, you will receive a message with some changes to apply

+/- Job: "my_job"
+/- Task Group: "my_group" ( 1 in-place update )
+/- Service {
    AddressMode: "auto"
    EnableTagOverride: "false"
    Name: "openidm-my_job-service"
    PortLabel: "api"
    TaskName: "my_task"
    Tags {
      Tags: "metrics"
    }
    +/- ConsulConnect {
      Native: "false"
      +/- SidecarService {
        Port: ""
      +/- ConsulProxy {
        LocalServiceAddress: ""
        LocalServicePort: "0"
        }
      }
    }
  }
Task: "connect-proxy-my_job-service"
Task: my_task"

Expected Result

If you use the deinition at UI, click edit and make a plan you should receive a message like

Job: "my_job"
Task Group: "my_group" (1 in-place update)
  Task: "connect-proxy-my_job-service"
  Task: "my_task"

Scheduler dry-run:
- All tasks successfully allocated.

Actual Result

+/- Job: "my_job"
+/- Task Group: "my_group" ( 1 in-place update )
+/- Service {
    AddressMode: "auto"
    EnableTagOverride: "false"
    Name: "openidm-my_job-service"
    PortLabel: "api"
    TaskName: "my_task"
    Tags {
      Tags: "metrics"
    }
    +/- ConsulConnect {
      Native: "false"
      +/- SidecarService {
        Port: ""
      +/- ConsulProxy {
        LocalServiceAddress: ""
        LocalServicePort: "0"
        }
      }
    }
  }
Task: "connect-proxy-my_job-service"
Task: my_task"

pserranoa avatar Oct 13 '21 10:10 pserranoa

Thanks for logging @pserranoa, I am experiencing the exact same issue.

Versions where we are experiencing it:

OS: Ubuntu 18.04 nomad.version: 1.1.5 consul.version: 1.10.3 envoy.version: 1.18.4

CarelvanHeerden avatar Nov 05 '21 07:11 CarelvanHeerden

Loosing our prometheus metrics every time we make a change to the job config in the UI is annoying. After testing a bit I believe its related to the service check expose setting...

I've not included all HCL and JSON for brevity, the ... indicates excluded content. This is a sample our original job definition

job "job1" {
  group "job1" {
...
    service {
      name = "job1"
      port = 8080
      check {
        type = "http"
        path = "/ping"
        interval = "10s"
        timeout = "2s"
        expose = true
      }
      connect {
        sidecar_service {
          proxy {
            expose {
              path {
                path = "/metrics"
                protocol = "http"
                local_path_port = 8080
                listener_port = "expose"
              }
            }
          }
        }
      }
    }
...
  }
}

as it automatically generates the Services.Connect.SidecarService.Proxy.Expose in the JSON

...
          "Checks": [
            {
              "Name": "service: \"transactions\" check",
              "Type": "http",
              "Command": "",
              "Args": null,
              "Path": "/ping",
              "Protocol": "",
              "PortLabel": "svc_job1_ck_22267b",
...
            }
          ],
          "Connect": {
            "Native": false,
            "SidecarService": {
              "Tags": null,
              "Port": "",
              "Proxy": {
...
                "Expose": {
                  "Paths": [
                    {
                      "Path": "/metrics",
                      "Protocol": "http",
                      "LocalPathPort": 8080,
                      "ListenerPort": "expose"
                    },
                    {
                      "Path": "/ping",
                      "Protocol": "",
                      "LocalPathPort": 8080,
                      "ListenerPort": "svc_job1_ck_22267b"
                    }
                  ]
                },
...

When we make a change from the UI (in the JSON) then afterwards the result shows the /metics expose is missing:

...
                "Expose": {
                  "Paths": [
                    {
                      "Path": "/ping",
                      "Protocol": "",
                      "LocalPathPort": 8080,
                      "ListenerPort": "svc_job_ck_22267b"
                    }
                  ]
                },
...

If I disable the the expose on theServices.Checks.Expose = false then the exposed paths disappear, even though the json had the list (Services.Connect.SidecarService.Proxy.Expose)

...
                "Expose": null,
...

The work around is to update the job using the HCL that we used to originally to create the job. but this is annoying as we cant make changes to our job in the UI.

grahambrown11 avatar Nov 05 '21 08:11 grahambrown11

Hello @tgross and Happy New Year. Any progress on this issue?

CarelvanHeerden avatar Jan 06 '22 06:01 CarelvanHeerden

Hi @pserranoa and thanks for raising this issue as well as @grahambrown11 and @CarelvanHeerden for the additional context. I will raise this internally to see if we can get this roadmapped for a fix in the short-term.

jrasell avatar Jan 06 '22 11:01 jrasell

Hi @jrasell. Thanks for your support

pserranoa avatar Jan 07 '22 06:01 pserranoa

Hi @jrasell, any chance of progress being made on this?

martinffx avatar Aug 17 '22 10:08 martinffx

Hi @jrasell, any plans on fixing this? I think I can try and fix it if no movement from hashicorp directly. This seems the same as https://github.com/hashicorp/nomad/issues/9379

jorgemarey avatar Dec 12 '22 08:12 jorgemarey

Hi again,

Changing this (and all references to those fields) fixes the issue for me.

diff --git a/nomad/structs/services.go b/nomad/structs/services.go
index 19e78b08b7..db790979d9 100644
--- a/nomad/structs/services.go
+++ b/nomad/structs/services.go
@@ -1358,9 +1358,7 @@ type ConsulProxy struct {
 
 	// Expose configures the consul proxy.expose stanza to "open up" endpoints
 	// used by task-group level service checks using HTTP or gRPC protocols.
-	//
-	// Use json tag to match with field name in api/
-	Expose *ConsulExposeConfig `json:"ExposeConfig"`
+	ExposeConfig *ConsulExposeConfig
 
 	// Config is a proxy configuration. It is opaque to Nomad and passed
 	// directly to Consul.
@@ -1501,7 +1499,7 @@ func upstreamsEquals(a, b []ConsulUpstream) bool {
 // ConsulExposeConfig represents a Consul Connect expose jobspec stanza.
 type ConsulExposeConfig struct {
 	// Use json tag to match with field name in api/
-	Paths []ConsulExposePath `json:"Path"`
+	Path []ConsulExposePath
 }

jorgemarey avatar Dec 12 '22 11:12 jorgemarey

Hi @martinffx and @jorgemarey; this issue is on our backlog but has not yet been picked up. I can't currently give an estimate of when it will be started and my own current workload and stream means I don't have the time to look. If you do have a suggested fix, a PR would be grand and we can help review and work through any items.

jrasell avatar Dec 12 '22 11:12 jrasell

Hi @jrasell. I made a PR in https://github.com/hashicorp/nomad/pull/15541.

jorgemarey avatar Dec 13 '22 09:12 jorgemarey

+1 for this, we've run into this, lost a fair bit of time debugging it and it's a pretty big issue for us as our new metrics setup relies on it. Nomad surreptitiously dropping bits from an active job configuration seems like something that should get a bit more attention so we'd really appreciate some traction on this if possible as it's been sitting around for 1yr+ now

thisisjaid avatar Jan 26 '23 10:01 thisisjaid