Logs through Nomad UI show certificate issues (log-viewing itself works)
Nomad version
Nomad v1.9.0
BuildDate 2024-10-10T07:13:43Z
Revision 7ad36851ec02f875e0814775ecf1df0229f0a615
and
Nomad v1.8.3
BuildDate 2024-08-13T07:37:30Z
Revision 63b636e5cbaca312cf6ea63e040f445f05f00478
(but may not be limited to these)
Operating system and Environment details
Ubuntu 22.04.5 LTS
Nomad with
- ACL enabled
- TLS enabled on http + rpc
- vault enabled
- dev is single region
- staging and production are federated multi-region.
- VIP providing a single entry point to each environment's Nomad UI (and deployed Nomad jobs/services)
- Consul is used to generate config, and Traefik then routes accordingly; together with Let's Encrypt this gives a smooth TLS experience where developers, with some CI/CD templating, can easily deploy applications independently.
- i.e., with DNS wildcard *.nomad-development.company.com you can easily deploy https://application.nomad-development.company.com (not fit for high traffic/volume, but works fine for low-traffic apps)
Issue
When checking the Nomad UI to look at container logs, errors are reported due to certificate issues.
- my UI is behind a Traefik load balancer with SSL and Let's Encrypt certs at, for example, https://nomad-development.company.com
- however, when requesting logs, these for some reason go to:
https://10.xx.yy.zz:4646/v1/client/fs/logs/8fb55989-139c-5256-f812-d79353993c6c?follow=true&offset=50000&origin=end&task=athena-cleaner&type=stdout
and as the Nomad servers are using a private CA, as per Nomad's recommendations:
"This should be a private CA and not a public one like Let's Encrypt
as any certificate signed by this CA will be allowed to communicate with the cluster"
this shows these certificate errors on end-user devices.
note: log viewing still works as apparently that call goes back to the SSL traefik endpoint, example, https://nomad-development.company.com/v1/client/fs/logs/c519c888-6c46-6d8e-2f0c-f5a17be8afc7?follow=true&offset=50000&origin=end&task=google-cadvisor&type=stderr
this flow works
graph TD;
A[Browser]-- OK -->B[https:nomad-development.company.com = VIP+Traefik+LetsEncrypt];
B-- OK -->E[Nomad-Server-1:4646 -- Private CA];
B-- OK -->F[Nomad-Server-2:4646 -- Private CA];
B-- OK -->G[Nomad-Server-3:4646 -- Private CA];
but the UI seems to make a 'direct connection' for the erroneous calls, and that fails.
graph TD;
A[Browser];
A-- FAIL -->E[https.Nomad-Server-1:4646 -- Private CA];
A-- FAIL -->F[https.Nomad-Server-2:4646 -- Private CA];
A-- FAIL -->G[https.Nomad-Server-3:4646 -- Private CA];
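The behaviour described above (a direct call that fails, followed by a call through the Traefik endpoint that works) can be sketched as a try-direct-then-fall-back pattern. This is only an illustration of the observed flow, not the Nomad UI's actual code; the function names, the injected `fetch` callable, and the `10.0.0.5` address in the usage note are assumptions made for the example.

```python
from typing import Callable, Tuple
from urllib.parse import urlencode


def build_log_url(base: str, alloc_id: str, task: str, log_type: str = "stdout") -> str:
    """Build a /v1/client/fs/logs URL shaped like the ones quoted in this report."""
    query = urlencode({
        "follow": "true",
        "offset": 50000,
        "origin": "end",
        "task": task,
        "type": log_type,
    })
    return f"{base.rstrip('/')}/v1/client/fs/logs/{alloc_id}?{query}"


def fetch_logs_with_fallback(
    direct_base: str,
    proxy_base: str,
    alloc_id: str,
    task: str,
    fetch: Callable[[str], str],
) -> Tuple[str, str]:
    """Try the direct address first; on any failure, retry through the proxy.

    `fetch` is a hypothetical callable that returns the response body or
    raises on errors such as an untrusted certificate.
    Returns (url_that_worked, body).
    """
    direct_url = build_log_url(direct_base, alloc_id, task)
    try:
        return direct_url, fetch(direct_url)
    except Exception:
        # The direct attempt has already surfaced an error to the caller's
        # environment (in a browser: a console/network error) by this point.
        proxy_url = build_log_url(proxy_base, alloc_id, task)
        return proxy_url, fetch(proxy_url)
```

With a `fetch` that rejects a placeholder address such as `https://10.0.0.5:4646` and accepts `https://nomad-development.company.com`, the function returns the proxied URL and its body, which matches the "errors shown, but logs still load" symptom.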
Reproduction steps
- not specific, just noticed as
- also happens on all environments,
  - nomad-development.company.com (v1.9.0 on servers, v1.8.3 on clients)
  - nomad-staging.company.com (v1.8.3)
  - nomad-production.company.com (v1.8.3)
Expected Result
- no certificate errors
Actual Result
GET https://10.x.y.z:4646/v1/client/fs/logs/c519c888-6c46-6d8e-2f0c-f5a17be8afc7?follow=true&offset=50000&origin=end&task=google-cadvisor&type=stderr net::ERR_CERT_AUTHORITY_INVALID
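For anyone triaging a similar setup, a small helper like the one below can flag which request URLs copied from the browser's network tab bypass the proxy. It is a rough sketch using only the Python standard library; the function names and the `10.0.0.5` placeholder address are illustrative assumptions, not anything provided by Nomad.

```python
import ipaddress
from urllib.parse import urlsplit


def bypasses_proxy(request_url: str, ui_origin: str) -> bool:
    """True when the request targets a different host than the UI origin."""
    return urlsplit(request_url).hostname != urlsplit(ui_origin).hostname


def targets_private_ip(request_url: str) -> bool:
    """True when the request host is a private IP literal (e.g. 10.x.x.x)."""
    host = urlsplit(request_url).hostname or ""
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        return False  # the host is a DNS name, not an IP literal


if __name__ == "__main__":
    ui = "https://nomad-development.company.com"
    # Placeholder address standing in for the masked 10.x.y.z in the report.
    direct = "https://10.0.0.5:4646/v1/client/fs/logs/c519c888-6c46-6d8e-2f0c-f5a17be8afc7?type=stderr"
    print(bypasses_proxy(direct, ui), targets_private_ip(direct))
```

A request for which both checks are true is going straight from the browser to a Nomad agent, so it will be validated against the private CA rather than the Let's Encrypt certificate.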
I am not sure whether some extra config is needed in such a case that is not readily available yet?
For example, Consul and Vault have a ui_url that can be set in the agent's ui block:
consul {
ui_url = "https://consul.nomad-development.company.com/ui"
}
vault {
ui_url = "https://vault.nomad-development.company.com/ui"
}
perhaps such a property also is needed in my case, where effectively, nomad-ui sits behind a proxy?
(or some other config I may have overlooked? response rewriting is not exactly the direction I would prefer)
Hi @dmclf, thanks for raising this issue. There is no further agent config for this, the way there is for consul/vault ui_urls. I suspect you've already given https://developer.hashicorp.com/nomad/tutorials/manage-clusters/reverse-proxy-ui a look since you've arrived at a nice environment behind Traefik, but that guide doesn't have any certificate-specific advice anyway.
This is to say: first time I've heard of this particular issue, but not the first time I've seen issues raised around the proxied UI (for example). This could use some further investigation and I will try to set some time to dig in soon.
Hi @philrenaud, I guess example issue 6413 sounds a bit like my first version with Fabio, which worked fine, but indeed has its limitations.
I can elaborate more on the environment I set up, but that won't help this specific ticket (though it may be nice to know how people set things up, or help others with similar setups).