zos
FQDN gateway with TLS passthrough failed to start
After creating an FQDN gateway with the tls_passthrough option set to true, with a backend VM running a Caddy server, the deployment succeeded, but I could not access the site through the domain. The browser showed the following error:
This site can't provide a secure connection
Caddy logged these errors:
2022/09/07 11:25:52.346 ERROR tls.issuance.acme.acme_client challenge failed {"identifier": "http://fqdncaddy.gridtesting.xyz", "challenge_type": "http-01", "problem": {"type": "urn:ietf:params:acme:error:unauthorized", "title": "", "detail": "185.69.167.81: Invalid response from http://http://fqdncaddy.gridtesting.xyz/.well-known/acme-challenge/DXPVMC20AIcQsD43Gvg58SpyV4N8J7V0MReiT6Qw-L4: 404", "instance": "", "subproblems": []}}
2022/09/07 11:25:52.348 ERROR tls.issuance.acme.acme_client validating authorization {"identifier": "http://fqdncaddy.gridtesting.xyz", "problem": {"type": "urn:ietf:params:acme:error:unauthorized", "title": "", "detail": "185.69.167.81: Invalid response from http://http://fqdncaddy.gridtesting.xyz/.well-known/acme-challenge/DXPVMC20AIcQsD43Gvg58SpyV4N8J7V0MReiT6Qw-L4: 404", "instance": "", "subproblems": []}, "order": "https://acme-v02.api.letsencrypt.org/acme/order/721377777/123320026447", "attempt": 1, "max_attempts": 3}
2022/09/07 11:25:53.669 INFO tls.issuance.acme.acme_client trying to solve challenge {"identifier": "http://fqdncaddy.gridtesting.xyz", "challenge_type": "tls-alpn-01", "ca": "https://acme-v02.api.letsencrypt.org/directory"}
2022/09/07 11:26:04.515 ERROR tls.issuance.acme.acme_client challenge failed {"identifier": "http://fqdncaddy.gridtesting.xyz", "challenge_type": "tls-alpn-01", "problem": {"type": "urn:ietf:params:acme:error:connection", "title": "", "detail": "185.69.167.81: Timeout during read (your server may be slow or overloaded)", "instance": "", "subproblems": []}}
2022/09/07 11:26:04.517 ERROR tls.issuance.acme.acme_client validating authorization {"identifier": "http://fqdncaddy.gridtesting.xyz", "problem": {"type": "urn:ietf:params:acme:error:connection", "title": "", "detail": "185.69.167.81: Timeout during read (your server may be slow or overloaded)", "instance": "", "subproblems": []}, "order": "https://acme-v02.api.letsencrypt.org/acme/order/721377777/123320225777", "attempt": 2, "max_attempts": 3}
2022/09/07 11:26:04.517 ERROR tls.obtain could not get certificate from issuer {"identifier": "http://fqdncaddy.gridtesting.xyz", "issuer": "acme-v02.api.letsencrypt.org-directory", "error": "HTTP 400 urn:ietf:params:acme:error:connection - 185.69.167.81: Timeout during read (your server may be slow or overloaded)"}
The VM and the gateway were both deployed on node 3 on QAnet, and the created domain pointed at the IP of the gateway.


I think either Caddy is not the service listening on ports 80 and 443, or it is misconfigured. The IP and the address can both be reached, but this looks wrong.

This is the configuration used:
```typescript
gw.node_id = 3;
gw.fqdn = "fqdncaddy.gridtesting.xyz";
gw.tls_passthrough = true;
gw.backends = ["https://185.69.167.88"];
```
It was deployed with grid3_client_ts using the fqdn_gateway.ts script, which can be found here.
I think it's the same issue as #1591
I'm not sure how requests arriving on port 80 would be forwarded to the VM, since https is the only allowed backend scheme when tls_passthrough = true.
May I ask what the point of using the gateway is when you already have a public IP assigned to your backend? The idea of gateways is to expose hidden workloads that are only reachable over Yggdrasil, for example.
In your specific use case, I think the best (and most cost-efficient) solution is to point your domain directly at your VM's public IP, since the gateway is not needed in this setup.
In my case, I used the gateway with a public IP only to verify that the tls_passthrough option works. As you mentioned, this might not be a valid or cost-efficient setup in a real scenario, but I was only using it to verify a test case.
I tested the same case again on devnet. It still didn't work, and Caddy logged the following errors:
2022/10/26 16:11:13.210 ERROR http.acme_client challenge failed {"identifier": "tls.fqdn.test.gridtesting.xyz", "challenge_type": "http-01", "problem": {"type": "urn:ietf:params:acme:error:unauthorized", "title": "", "detail": "185.206.122.31: Invalid response from http://tls.fqdn.test.gridtesting.xyz/.well-known/acme-challenge/NluvRrXQxtQ8Sj2rT9OP_Z33OajOV5gsmXaHr5uoZyM: 404", "instance": "", "subproblems": []}}
2022/10/26 16:11:13.213 ERROR http.acme_client validating authorization {"identifier": "tls.fqdn.test.gridtesting.xyz", "problem": {"type": "urn:ietf:params:acme:error:unauthorized", "title": "", "detail": "185.206.122.31: Invalid response from http://tls.fqdn.test.gridtesting.xyz/.well-known/acme-challenge/NluvRrXQxtQ8Sj2rT9OP_Z33OajOV5gsmXaHr5uoZyM: 404", "instance": "", "subproblems": []}, "order": "https://acme-v02.api.letsencrypt.org/acme/order/795941097/138168287667", "attempt": 1, "max_attempts": 3}
2022/10/26 16:11:14.565 INFO http.acme_client trying to solve challenge {"identifier": "tls.fqdn.test.gridtesting.xyz", "challenge_type": "tls-alpn-01", "ca": "https://acme-v02.api.letsencrypt.org/directory"}
2022/10/26 16:11:25.027 ERROR http.acme_client challenge failed {"identifier": "tls.fqdn.test.gridtesting.xyz", "challenge_type": "tls-alpn-01", "problem": {"type": "urn:ietf:params:acme:error:connection", "title": "", "detail": "185.206.122.31: Timeout during read (your server may be slow or overloaded)", "instance": "", "subproblems": []}}
2022/10/26 16:11:25.029 ERROR http.acme_client validating authorization {"identifier": "tls.fqdn.test.gridtesting.xyz", "problem": {"type": "urn:ietf:params:acme:error:connection", "title": "", "detail": "185.206.122.31: Timeout during read (your server may be slow or overloaded)", "instance": "", "subproblems": []}, "order": "https://acme-v02.api.letsencrypt.org/acme/order/795941097/138168557237", "attempt": 2, "max_attempts": 3}
2022/10/26 16:11:25.031 ERROR tls.obtain could not get certificate from issuer {"identifier": "tls.fqdn.test.gridtesting.xyz", "issuer": "acme-v02.api.letsencrypt.org-directory", "error": "HTTP 400 urn:ietf:params:acme:error:connection - 185.206.122.31: Timeout during read (your server may be slow or overloaded)"}
- Config used:

```typescript
gw.name = "applyFQDN";
gw.node_id = 14;
gw.fqdn = "tls.fqdn.test.gridtesting.xyz";
gw.tls_passthrough = true;
gw.backends = ["https://[302:9e63:7d43:b742:c38:d2b6:67c7:e07b]"];
```
- Steps followed:
- Deployed a VM and installed Caddy on it.
- Created a domain that points at the IP of node 14 on devnet.
- Ran a python3 HTTP server on the VM and a Caddy reverse proxy using this command:

```
caddy reverse-proxy --from https://tls.fqdn.test.gridtesting.xyz --to :8000
```

- Deployed an FQDN gateway with the config mentioned above.
I'm on it.
First, I wanted to quickly verify whether the tls_passthrough option works, using the tools I'm more familiar with.
1 - Deployed a VM from the playground on node 45:
```json
{
  "version": 0,
  "contractId": 14187,
  "nodeId": 45,
  "name": "VM043dfa70",
  "created": 1668599374,
  "status": "ok",
  "message": "",
  "flist": "https://hub.grid.tf/tf-official-vms/ubuntu-22.04.flist",
  "publicIP": {
    "ip": "162.205.240.230/25",
    "ip6": "",
    "gateway": "162.205.240.254"
  },
  "planetary": "300:9bd8:6f95:d606:704c:8403:b323:b225",
  "interfaces": [
    {
      "network": "NW74874c45",
      "ip": "10.20.2.2"
    }
  ],
  "capacity": {
    "cpu": 4,
    "memory": 8192
  },
  "mounts": [
    {
      "name": "DISKda57d841",
      "mountPoint": "/",
      "size": 53687091200,
      "state": "ok",
      "message": ""
    }
  ],
  "env": {
    "SSH_KEY": "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQC9MI7fh4xEOOEKL7PvLvXmSeRWesToj6E26bbDASvlZnyzlSKFLuYRpnVjkr8JcuWKZP6RQn8+2aRs6Owyx7Tx+9kmEh7WI5fol0JNDn1D0gjp4XtGnqnON7d0d5oFI+EjQQwgCZwvg0PnV/2DYoH4GJ6KPCclPz4a6eXrblCLA2CHTzghDgyj2x5B4vB3rtoI/GAYYNqxB7REngOG6hct8vdtSndeY1sxuRoBnophf7MPHklRQ6EG2GxQVzAOsBgGHWSJPsXQkxbs8am0C9uEDL+BJuSyFbc/fSRKptU1UmS18kdEjRgGNoQD7D+Maxh1EbmudYqKW92TVgdxXWTQv1b1+3dG5+9g+hIWkbKZCBcfMe4nA5H7qerLvoFWLl6dKhayt1xx5mv8XhXCpEC22/XHxhRBHBaWwSSI+QPOCvs4cdrn4sQU+EXsy7+T7FIXPeWiC2jhFd6j8WIHAv6/rRPsiwV1dobzZOrCxTOnrqPB+756t7ANxuktsVlAZaM= sameh@sameh-inspiron-3576"
  },
  "entrypoint": "/init.sh",
  "metadata": "{\"type\":\"vm\",\"name\":\"VM043dfa70\",\"projectName\":\"Fullvm\"}",
  "description": "",
  "corex": false
}
```
2 - Configured a DNS record for the domain that points to the gateway IP address of node 45.

3 - Deployed the gateway using Terraform. Here is the config:
```hcl
terraform {
  required_providers {
    grid = {
      source = "threefoldtech/grid"
    }
  }
}

provider "grid" {
}

resource "grid_fqdn_proxy" "p1" {
  node            = 45
  name            = "workloadname"
  fqdn            = "tlstest.grid.tf"
  backends        = [format("https://162.205.240.230:8080")]
  tls_passthrough = true
}

output "fqdn" {
  value = grid_fqdn_proxy.p1.fqdn
}
```
4 - Finally, I generated a certificate for the domain "tlstest.grid.tf" using mkcert, launched a static HTTPS server on the backend port 8080, and successfully accessed https://tlstest.grid.tf/hello.txt (the browser may show a warning because the mkcert certificate is not publicly trusted).
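A minimal sketch of such a backend in TypeScript (Node.js), for anyone reproducing step 4 without the original tool. It assumes mkcert's default output file names for `mkcert tlstest.grid.tf`, a hypothetical `public/` directory holding `hello.txt`, and port 8080 to match the backend `https://162.205.240.230:8080`; the helper names are illustrative only. Because tls_passthrough relays the encrypted stream untouched, the backend itself must terminate TLS, which is why this uses `https.createServer` rather than a plain HTTP server.

```typescript
import * as fs from "fs";
import * as https from "https";
import * as path from "path";

// Assumption: files written by `mkcert tlstest.grid.tf` (mkcert's default naming).
const CERT_FILE = "tlstest.grid.tf.pem";
const KEY_FILE = "tlstest.grid.tf-key.pem";
// Assumption: hypothetical directory containing hello.txt; the port must match
// the gateway backend https://162.205.240.230:8080.
const ROOT = path.resolve("public");
const PORT = 8080;

// Map a request URL to a file under root; return null for malformed or escaping paths.
function resolveStaticPath(root: string, url: string): string | null {
  let pathname: string;
  try {
    pathname = decodeURIComponent(url.split("?")[0]);
  } catch {
    return null; // malformed percent-encoding
  }
  const full = path.resolve(root, "." + pathname);
  return full === root || full.startsWith(root + path.sep) ? full : null;
}

function startServer(): https.Server {
  // TLS is terminated here, on the backend, because the gateway passes it through.
  const server = https.createServer(
    { cert: fs.readFileSync(CERT_FILE), key: fs.readFileSync(KEY_FILE) },
    (req, res) => {
      const file = resolveStaticPath(ROOT, req.url ?? "/");
      if (file === null) {
        res.writeHead(400);
        res.end("bad path\n");
        return;
      }
      fs.readFile(file, (err, data) => {
        if (err) {
          res.writeHead(404);
          res.end("not found\n");
          return;
        }
        res.writeHead(200);
        res.end(data);
      });
    },
  );
  server.listen(PORT, () => console.log(`serving ${ROOT} over HTTPS on port ${PORT}`));
  return server;
}

// Start only when the certificate files exist, so the module can be loaded without them.
if (fs.existsSync(CERT_FILE) && fs.existsSync(KEY_FILE)) {
  startServer();
}
```

The path check keeps requests such as `/../etc/passwd` from escaping the served directory, which matters once the server is reachable through a public domain.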

I'll try the same setup as @mohamedamer453 to see what could go wrong, but I suspect a Caddy configuration issue, although I'm not familiar with Caddy.
I tried to use both Caddy on the backend and the TS grid client to deploy the gateway, but I hit a strange issue with the TS client and couldn't deploy with it.
So I used Terraform again to deploy the gateway. Here is the Terraform file:
```hcl
terraform {
  required_providers {
    grid = {
      source = "threefoldtech/grid"
    }
  }
}

provider "grid" {
}

resource "grid_fqdn_proxy" "p1" {
  node            = 45
  name            = "workloadname"
  fqdn            = "tlstest.grid.tf"
  backends        = [format("https://162.205.240.230:443")]
  tls_passthrough = true
}

output "fqdn" {
  value = grid_fqdn_proxy.p1.fqdn
}
```
I used this Caddyfile, from the Caddy quick start:
```
tlstest.grid.tf
respond "Hello, privacy!"
```
I started Caddy and it just worked: the screenshot shows that Caddy provisioned a TLS certificate and served a static site over HTTPS.


Caddy requires ports 80 and 443 to be open externally to obtain a publicly trusted TLS certificate. Check here for how the ACME challenges work.
My reading of the errors posted by @mohamedamer453 is that both the HTTP challenge (which requires port 80) and the TLS-ALPN challenge (which requires port 443 to be externally accessible) are failing, which suggests that the gateway may be configured incorrectly.
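The port requirements of the ACME challenges can be summarized as a small lookup. A minimal TypeScript sketch, assuming the ports fixed by the ACME specifications (HTTP-01 on port 80 per RFC 8555, TLS-ALPN-01 on port 443 per RFC 8737) and treating DNS-01 as usable only when a DNS provider is configured; `viableChallenges` is a hypothetical helper, not part of Caddy.

```typescript
// ACME challenge types that Caddy tries by default, plus DNS-01 for comparison.
type Challenge = "http-01" | "tls-alpn-01" | "dns-01";

// Inbound port the CA connects to on the domain (null = validated via a DNS TXT record).
const REQUIRED_PORT: Record<Challenge, number | null> = {
  "http-01": 80, // RFC 8555, section 8.3
  "tls-alpn-01": 443, // RFC 8737
  "dns-01": null, // RFC 8555, section 8.4
};

// Hypothetical helper: which challenges can succeed, given the ports on which
// the CA's traffic actually reaches the ACME client (Caddy) through the gateway.
function viableChallenges(portsReachingBackend: Set<number>, dnsProviderConfigured = false): Challenge[] {
  return (Object.keys(REQUIRED_PORT) as Challenge[]).filter((c) => {
    const port = REQUIRED_PORT[c];
    return port === null ? dnsProviderConfigured : portsReachingBackend.has(port);
  });
}

// Backend without a port: neither port reaches Caddy, so nothing is viable.
console.log(viableChallenges(new Set<number>()));
// Backend "https://<ip>:443" with tls_passthrough: only 443 reaches Caddy.
console.log(viableChallenges(new Set([443])));
```

Under this model, the original configuration leaves no viable challenge, while the configuration with an explicit `:443` leaves only TLS-ALPN-01, which is consistent with the single HTTP-challenge error reported at the end of this thread.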
From the gateway docs:
The tls_passthrough parameter determines whether the tls termination happens on the gateway or in the backends. When it's true, the backends must be in the form https://ip:port, and the backends must be https-enabled servers.
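When TLS terminates on the backend, the gateway never sees the decrypted HTTP request, so a passthrough proxy generally selects the backend from the Server Name Indication (SNI) host name, which the client sends unencrypted in the TLS ClientHello. The sketch below shows that general technique in TypeScript; it is an illustration, not the zos gateway's actual implementation, and `extractSni` is a hypothetical name. It also shows why the backend must be an HTTPS server: the proxy only relays the raw TLS bytes, so whatever listens at the backend address has to speak TLS itself.

```typescript
// Sketch: read the SNI host name from a TLS ClientHello, the way a TLS-passthrough
// proxy can route a connection without decrypting it. Returns null when the bytes are
// not a ClientHello, are truncated, or carry no server_name extension.
function extractSni(buf: Uint8Array): string | null {
  let p = 0;
  const need = (n: number): void => {
    if (p + n > buf.length) throw new RangeError("truncated ClientHello");
  };
  const u8 = (): number => {
    need(1);
    return buf[p++];
  };
  const u16 = (): number => {
    need(2);
    const v = (buf[p] << 8) | buf[p + 1];
    p += 2;
    return v;
  };
  const skip = (n: number): void => {
    need(n);
    p += n;
  };

  try {
    if (u8() !== 0x16) return null; // TLS record type 22: handshake
    skip(4); // record version (2) + record length (2)
    if (u8() !== 0x01) return null; // handshake type 1: ClientHello
    skip(3); // handshake length
    skip(2 + 32); // client_version + random
    skip(u8()); // session_id
    skip(u16()); // cipher_suites
    skip(u8()); // compression_methods
    if (p === buf.length) return null; // no extensions block at all
    const extLen = u16();
    const extEnd = p + extLen;
    while (p + 4 <= extEnd) {
      const type = u16();
      const len = u16();
      if (type !== 0x0000) {
        skip(len); // not server_name: skip its body
        continue;
      }
      // server_name body: list length, then (name_type, name_length, name) entries
      const listLen = u16();
      const listEnd = p + listLen;
      while (p + 3 <= listEnd) {
        const nameType = u8();
        const nameLen = u16();
        need(nameLen);
        const name = buf.subarray(p, p + nameLen);
        p += nameLen;
        // host_name entry (ASCII; internationalized names are sent as punycode)
        if (nameType === 0) return Array.from(name, (b) => String.fromCharCode(b)).join("");
      }
      return null;
    }
    return null;
  } catch {
    return null; // truncated input
  }
}
```

In such a proxy, the returned name would be looked up in the FQDN table (for example `tlstest.grid.tf` mapped to `162.205.240.230:443`), and the unmodified bytes, ClientHello included, would be relayed to that address.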
@mohamedamer453 can you add the port number explicitly in the script (https://[ip-address]:[port]) and try again? I tried to mimic your config using just the IP address: although the gateway deployment succeeded, traffic wasn't forwarded to port 443. Please report back.
@muhamadazmy I checked the validation code, and it seems we don't validate whether the port is included. I think we should validate this, at least when tls_passthrough is true.
https://github.com/threefoldtech/zos/blob/main/pkg/gridtypes/zos/gw_fqdn.go
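A minimal sketch of the proposed check, written in TypeScript to match the grid3_client_ts snippets in this thread. It enforces the documented form `https://ip:port` and rejects backends without an explicit port when tls_passthrough is true. The function name and messages are hypothetical; the real check would belong in the Go code of gw_fqdn.go and may look different. A regular expression is used instead of the WHATWG `URL` class because `new URL("https://1.2.3.4:443").port` returns an empty string for the scheme's default port, which would hide whether the port was written at all.

```typescript
import { isIP } from "net";

// Hypothetical check for one backend of an FQDN gateway with tls_passthrough = true.
// Documented form: https://ip:port. Returns an error message, or null if valid.
function validatePassthroughBackend(backend: string): string | null {
  // scheme://host[:port], where an IPv6 host is wrapped in brackets; no path allowed
  const m = /^([a-z][a-z0-9+.-]*):\/\/(\[[^\]]+\]|[^/:[\]]+)(?::(\d+))?$/i.exec(backend);
  if (!m) return `backend ${backend}: expected the form https://ip:port`;
  const scheme = m[1].toLowerCase();
  const bracketed = m[2].startsWith("[");
  const host = bracketed ? m[2].slice(1, -1) : m[2];
  const port = m[3];

  if (scheme !== "https") return `backend ${backend}: scheme must be https when tls_passthrough is true`;
  // isIP returns 4 or 6 for valid addresses and 0 otherwise; IPv6 must be bracketed.
  if (isIP(host) !== (bracketed ? 6 : 4)) return `backend ${backend}: host must be an IPv4 or bracketed IPv6 address`;
  if (!port) return `backend ${backend}: an explicit port is required when tls_passthrough is true`;
  const n = Number(port);
  if (n < 1 || n > 65535) return `backend ${backend}: port ${port} is out of range`;
  return null;
}
```

With this check, the first two configurations in this thread (`https://185.69.167.88` and `https://[302:9e63:7d43:b742:c38:d2b6:67c7:e07b]`) would be rejected at deployment time, instead of producing a gateway that does not forward traffic to port 443.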
After following the flow above and adding port 443, it worked and the domain was accessible.
- First, I deployed a VM from the TS client and installed Caddy on it.
- Then I created a domain that points at the IP of the gateway.
- Next, I created an FQDN gateway from the TS client with the following options:

```typescript
gw.name = "applyFQDN";
gw.node_id = 2;
gw.fqdn = "caddyfqdn.gridtesting.xyz";
gw.tls_passthrough = true;
gw.backends = ["https://185.69.167.85:443"];
```

- Then I created a Caddyfile and ran the Caddy server on the VM.
- The domain then worked without any issues.

- The only issue in the Caddy logs came from the HTTP challenge, since port 80 was not exposed, as mentioned above.