Unclear error message `Gateway is not working` if DNS is misconfigured

Open jvstme opened this issue 1 year ago • 3 comments

Steps to reproduce

  1. Start dstack server with ZeroSSL configured as the CA for dstack-gateway. See this comment.
  2. Create a gateway
    dstack gateway create --domain $DOMAIN --region eu-central-1 --backend aws
    
  3. Set a DNS A record for *.$DOMAIN, but point it to the IP address of some other machine that is down instead of the gateway's IP address. This simulates redeploying the gateway and forgetting to update the DNS record.
  4. Try running any service with dstack
    > cat drope.yml
    type: service
    
    commands:
      - pip install drope
      - drope
    port: 8000
    
    > dstack run . -f drope.yml 
    ... (redacted for brevity) ...
     Shown 3 of 761 offers, $49.159 max
    
    Continue? [y/n]: y
    

Expected behaviour

The CLI shows an error saying that dstack-gateway failed to issue a certificate for the service's domain and suggests that the user make sure the DNS A record for the domain points to the gateway's IP address.
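For reference, a manual check of the record could look roughly like the sketch below. It is not part of dstack: `a_record_points_to` is a hypothetical helper, and the hostname and IP address in the usage comment are placeholders. The resolver is a parameter only so the check can be exercised without network access.

```python
import socket
from typing import Callable, List, Tuple

# Same return shape as socket.gethostbyname_ex: (hostname, aliases, addresses).
Resolver = Callable[[str], Tuple[str, List[str], List[str]]]


def a_record_points_to(
    hostname: str,
    expected_ip: str,
    resolve: Resolver = socket.gethostbyname_ex,
) -> bool:
    """Return True if `hostname` resolves to `expected_ip` (IPv4 only)."""
    try:
        _, _, addresses = resolve(hostname)
    except socket.gaierror:
        # The name does not resolve at all.
        return False
    return expected_ip in addresses


# Example (placeholder values, needs network access):
# a_record_points_to("my-service.example.com", "203.0.113.10")
```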

Actual behaviour

After 30 seconds the CLI shows an unclear error message.

Gateway is not working: 

The server logs don't have anything relevant.

dstack version

0.17.0

Server logs

No response

Additional information

What happens is:

  • the server calls the gateway's /api/registry/{project}/services/register endpoint
  • the gateway tries issuing a certificate via certbot
  • certbot hangs indefinitely because of misconfigured DNS
  • the server cancels its request after a timeout
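The sequence above can be imitated with a small simulation. This is only an illustration, not dstack's code: `register_service_on_gateway` and `register_with_timeout` are made-up names, and the assumption that the server formats the timeout exception into the message is a guess. It does show one plausible way the text after the colon ends up empty, since a timeout exception raised by `asyncio.wait_for` carries no message.

```python
import asyncio


async def register_service_on_gateway() -> None:
    # Stand-in for the gateway request that never returns
    # because certbot hangs on the misconfigured DNS record.
    await asyncio.sleep(3600)


async def register_with_timeout(timeout: float) -> str:
    try:
        await asyncio.wait_for(register_service_on_gateway(), timeout=timeout)
    except asyncio.TimeoutError as e:
        # str(e) is empty here, so nothing useful follows the colon.
        return f"Gateway is not working: {e}"
    return "registered"


if __name__ == "__main__":
    print(repr(asyncio.run(register_with_timeout(0.1))))
```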

This behavior depends on the CA. For example, with Let's Encrypt, certbot exits quickly and the error is passed to the dstack server and then to the CLI:

GatewayError: Certbot failed:
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Some challenges have failed.

I suggest we fix this by adding a timeout to certbot runs and passing a clear error message to the CLI if the timeout is reached.
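A minimal sketch of what such a timeout could look like, assuming certbot is launched as a subprocess. The real certbot command line used by dstack-gateway is not shown in this issue, so the command is passed in as a parameter; `run_certbot_with_timeout` is a hypothetical name and the `GatewayError` class here is a local stand-in, not the gateway's actual class.

```python
import subprocess
import sys
from typing import Sequence


class GatewayError(Exception):
    """Local stand-in for the gateway's error type (assumption)."""


def run_certbot_with_timeout(cmd: Sequence[str], domain: str, timeout: float) -> str:
    """Run `cmd` and raise a descriptive GatewayError if it hangs or fails."""
    try:
        # subprocess.run kills the child process when the timeout expires.
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        raise GatewayError(
            f"Certbot timed out after {timeout:g}s while issuing a certificate "
            f"for {domain}. Make sure the DNS A record for the domain points "
            "to the gateway's IP address."
        ) from None
    if proc.returncode != 0:
        raise GatewayError(f"Certbot failed:\n{proc.stdout}{proc.stderr}")
    return proc.stdout


if __name__ == "__main__":
    # Simulate a hanging certbot with a sleeping Python process.
    hanging = [sys.executable, "-c", "import time; time.sleep(30)"]
    try:
        run_certbot_with_timeout(hanging, "example.com", timeout=0.5)
    except GatewayError as e:
        print(e)
```

With something like this in place, the gateway could respond before the server's own request timeout fires, so the CLI would receive a message it can display.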

jvstme, Mar 28 '24 18:03

This issue is stale because it has been open for 30 days with no activity.

peterschmidt85, Apr 28 '24 01:04

This issue was closed because it has been inactive for 14 days since being marked as stale.

peterschmidt85, May 12 '24 01:05

Still relevant

jvstme, May 14 '24 07:05

This issue is stale because it has been open for 30 days with no activity.

peterschmidt85, Jun 14 '24 01:06

This issue was closed because it has been inactive for 14 days since being marked as stale. Please reopen the issue if it is still relevant.

peterschmidt85, Jun 28 '24 01:06