Investigate ACME integration for SF provider

Open jjcollinge opened this issue 6 years ago • 11 comments

Investigate the Traefik ACME integration models and which works best for our provider.

Steps to use ACME with Traefik on SF:

  • Create an Azure DNS Zone

  • Point the domain registrar's nameservers to the Azure DNS zone's nameservers

  • Create a wildcard A/AAAA record pointing to the ALB's PIP

  • Create a Service Principal for RBAC:

      az ad sp create-for-rbac -n "traefik" --scopes /subscriptions/{SUB_ID}/resourceGroups/{RES_GRP}/providers/Microsoft.Network/dnszones/{DNS_ZONE}

  • Add AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_SUBSCRIPTION_ID, AZURE_TENANT_ID and AZURE_RESOURCE_GROUP as environment variables in Traefik's ServiceManifest.xml

  • Add a TLS entrypoint and, optionally, a redirect rule (80 -> 443)

      [entryPoints.http]
        address = ":80"
        [entryPoints.http.redirect]
          entryPoint = "https"
    
  • Enable and populate the [acme] configuration

  • Add labels to a web service's ServiceManifest.xml file:

          <Label Key="traefik.frontend.rule">Host:test.yourdomain.com</Label>
          <Label Key="traefik.passHostHeader">true</Label>
          <Label Key="traefik.expose">true</Label>
    
  • Hit http[s]://test.yourdomain.com
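
As a rough sketch of how the entrypoint and [acme] steps above could fit together in traefik.toml, assuming Traefik 1.x syntax and the DNS challenge through the Azure provider (which reads the AZURE_* environment variables listed above). The email address, the storage path and the choice of onHostRule are assumptions for illustration and do not come from this issue:

```toml
# Sketch only: Traefik 1.x static configuration with placeholder values.
defaultEntryPoints = ["http", "https"]

[entryPoints]
  [entryPoints.http]
  address = ":80"
    # Optional redirect from port 80 to the TLS entrypoint
    [entryPoints.http.redirect]
    entryPoint = "https"
  [entryPoints.https]
  address = ":443"
    [entryPoints.https.tls]

[acme]
email = "admin@yourdomain.com"   # placeholder contact address
storage = "acme.json"            # local file, not replicated across nodes
entryPoint = "https"
onHostRule = true                # assumption: request a certificate per Host rule
  [acme.dnsChallenge]
  provider = "azure"
  delayBeforeCheck = 0
```

Older 1.5-era releases used a dnsProvider key directly under [acme] instead of the [acme.dnsChallenge] table, so check the documentation for the version you run.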

Work to do:

  • Store the certificates ("acme.json") in a replicated fashion, to avoid having to refresh tokens after a node failure and to reduce the number of requests to Let's Encrypt, which is rate limited. This requires creating a key-value store provider http://v1-5.archive.docs.traefik.io/user-guide/kv-config/
  • Could we mount a volume, shared among all instances, that Traefik writes the config/acme.json files to? We'd need a way of electing a master with write access.
  • Automate DNS configuration

jjcollinge avatar Dec 18 '17 11:12 jjcollinge

@jjcollinge Is it correctly understood that this will work in the current release, assuming that the DNS is set up to point to the load balancer "manually" and the number of nodes is small enough that the Let's Encrypt rate limit won't be hit?

Btw, the doc link ( http://v1-5.archive.docs.traefik.io/user-guide/kv-config/ ) is dead.

Regarding the key-value store: this might be a stupid question, but can't you use a stateful SF service?

petertiedemann avatar Mar 24 '18 16:03 petertiedemann

Hi @petertiedemann - yes, you are correct. LE rate limiting is quite aggressive for production services, so it's hard not to hit the limit: https://letsencrypt.org/docs/rate-limits/ - A workaround is to pre-provision the certificate per cluster deployment, add the cert to the code package and update the Traefik.toml to point to it.

Yes, we could use an SF stateful service, but we don't really want to maintain an additional C# service. We're going to try to fix this using the native SF replicated volume driver once that becomes available.
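
A minimal sketch of what pointing the TOML at a pre-provisioned certificate might look like in Traefik 1.x. The file names are hypothetical and the paths are assumed to be relative to where the certificate is placed in the code package:

```toml
# Sketch only: static certificate on the TLS entrypoint, hypothetical paths.
[entryPoints]
  [entryPoints.https]
  address = ":443"
    [entryPoints.https.tls]
      [[entryPoints.https.tls.certificates]]
      certFile = "certs/yourdomain.crt"
      keyFile = "certs/yourdomain.key"
```

With this approach no [acme] section is needed, but the certificate has to be renewed and redeployed out of band.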

jjcollinge avatar Mar 26 '18 06:03 jjcollinge

If it can be done with no additional services that is of course to be preferred, but if one is required, then what is wrong with one in C#? :)

petertiedemann avatar Mar 26 '18 15:03 petertiedemann

We have been holding out for the SF volume driver, and since it is close to release it felt worthwhile to hold off. If this is a blocker for you, let us know and we can re-evaluate, or feel free to create a PR.

jjcollinge avatar Mar 27 '18 21:03 jjcollinge

If there is a nice solution with the volume driver then I don't see any reason to try a more complex solution. Unless of course the complexity then becomes related to electing a master etc.

petertiedemann avatar Mar 28 '18 11:03 petertiedemann

@jjcollinge After messing around a bit with ACME, we found that there are in fact more critical problems than rate limiting. We cannot use the HTTP challenge, because the load balancer is likely to route the challenge request to a different node from the one that requested the certificate. We also cannot use the DNS challenge when multiple nodes try to get a certificate for the same domain at the same time (because they will all try to write to the same TXT record with different payloads).

For now we have decided to have an external script obtain a new certificate from time to time, and then simply update the Traefik application package with this new certificate and push that update to the SF cluster. However, I am very interested in hearing whether there has been any progress on a "proper" solution for this.

petertiedemann avatar Jul 26 '18 16:07 petertiedemann

Has there been any progress on this? I have a situation that requires this in order to use Traefik as our reverse proxy. We are keen to use Let's Encrypt, which we already use in a complex way outside of Service Fabric with ARR and some automation.

Our solution is multitenanted and each tenant can use their own domain, so we need to be able to generate certificates on the fly, but with many instances of Traefik we'd likely hit the rate limits.

Alternatively, is there an easy way to change the TOML file without redeploying Traefik?

radderz avatar Nov 10 '18 01:11 radderz

Hi, unfortunately there isn't any ongoing work on this one, but we'd love a contribution if it is something you want to pick up and look at.

For updating the TOML while Traefik is running, there is a --file.watch flag which watches for config provided by TOML files - I don't know if this can configure TLS settings.

    --file                                        Enable File backend with default settings                                        (default "false")
    --file.constraints                            Filter services by constraint, matching with Traefik tags.                       (default "[]")
    --file.debugloggeneratedtemplate              Enable debug logging of generated configuration template.                        (default "false")
    --file.directory                              Load configuration from one or more .toml files in a directory                   
    --file.filename                               Override default configuration template. For advanced users :)                   
    --file.templateversion                        Template version.                                                                (default "0")
    --file.trace                                  Display additional provider logs (if available).                                 (default "false")
    --file.watch                                  Watch provider                                                                   (default "true")

You could investigate whether the Azure Files Volume Driver would allow you to mount an Azure Files share and update the TOML dynamically. However, it's still in preview and I think it would require Traefik to be run inside a container, which hasn't been tested much.
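
For reference, the --file.directory and --file.watch flags listed above can also be set in traefik.toml. A minimal sketch, assuming Traefik 1.x and a hypothetical directory name; how it interacts with the Service Fabric provider is untested here:

```toml
# Sketch only: enable the file provider next to the existing provider.
[file]
  directory = "dynamic/"   # hypothetical directory holding extra .toml files
  watch = true             # reload when files in the directory change
```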

lawrencegripper avatar Nov 12 '18 09:11 lawrencegripper

When using the Service Fabric provider, would adding the file backend allow TLS certificates to be added dynamically? I can see that adding it to the TOML shows a new provider, but I am unsure whether that provider would enable TLS certificate updates.
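
For context, the Traefik 1.x file provider documentation describes a [[tls]] section for certificates loaded from watched files. A sketch of what such a file might contain, with hypothetical paths and tenant names; whether it picks up changes reliably alongside the Service Fabric provider would still need testing:

```toml
# Sketch only: a file inside the watched directory, e.g. dynamic/certs.toml (hypothetical).
[[tls]]
  entryPoints = ["https"]
  [tls.certificate]
    certFile = "certs/tenant-a.crt"
    keyFile = "certs/tenant-a.key"
```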

radderz avatar Jan 11 '19 05:01 radderz

Afraid I can't give any guidance on that as I'm not sure - best way would be to run some tests.

lawrencegripper avatar Jan 11 '19 11:01 lawrencegripper

I ended up writing a parent process that runs Traefik as a subprocess. It handles the certificates and, when there is a change, updates the TOML file and restarts Traefik. Not ideal, but it gets the job done.

It was quite a pain to make it support Linux though, as the Traefik binary needs to have the execute permission added, since it's no longer the main binary of the service.

radderz avatar Jan 22 '19 22:01 radderz