Feature: Support auto tls using Let's Encrypt

Open tamalsaha opened this issue 3 years ago • 9 comments

Feature Request

Support auto tls using Let's Encrypt

Use Case:

As a developer, I want to deploy a TLS-encrypted nats-server publicly. My NATS publishers are on the edge and stream data to the NATS server. Today this is possible, but certificate management has to be done separately, which adds operational complexity.

Proposed Change:

Integrate certificate issuance using the ACME protocol from Let's Encrypt. The code is pretty simple: https://stackoverflow.com/a/40494806/244009

It could be an extra section in nats config:

tls: {
  acme: {
    hosts: ["example.com"]
    dir: /tmp/certs
    acceptTOS: true
  }
}
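
To make the intent of the three proposed keys concrete, here is a minimal Go sketch of how they could be validated and enforced. The `ACMEOptions` type and its methods are hypothetical and do not exist in nats-server; in Go's `autocert` package the same ideas roughly correspond to the `HostPolicy`, `Cache` and `Prompt` fields of `autocert.Manager`.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ACMEOptions mirrors the keys of the proposed `acme` block.
// Hypothetical: nats-server does not define this type.
type ACMEOptions struct {
	Hosts     []string // names the server may request certificates for
	Dir       string   // directory where issued certificates are cached
	AcceptTOS bool     // explicit agreement to the CA's terms of service
}

// Validate rejects configurations that could not obtain a certificate.
func (o ACMEOptions) Validate() error {
	if !o.AcceptTOS {
		return errors.New("acme: acceptTOS must be true")
	}
	if len(o.Hosts) == 0 {
		return errors.New("acme: at least one host is required")
	}
	if o.Dir == "" {
		return errors.New("acme: dir is required")
	}
	return nil
}

// HostAllowed returns nil when serverName is on the configured allow-list,
// so that certificates are never requested for arbitrary SNI values.
func (o ACMEOptions) HostAllowed(serverName string) error {
	name := strings.ToLower(strings.TrimSuffix(serverName, "."))
	for _, h := range o.Hosts {
		if strings.ToLower(h) == name {
			return nil
		}
	}
	return fmt.Errorf("acme: host %q is not configured", serverName)
}
```

A server would call `Validate` once at startup and `HostAllowed` on each TLS handshake before asking the CA for a certificate.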

Who Benefits From The Change(s)?

Any developer who wants to deploy nats with a public endpoint securely.

Alternative Approaches

Manage certificates externally. If you are on Kubernetes, you can use cert-manager.io. But I believe you still need to make the nats-server process pick up the new certificate.

tamalsaha avatar Feb 07 '21 18:02 tamalsaha

I landed here looking for something similar.

Can NATS be put behind a reverse proxy like Caddy or Nginx, with an HTTP-01 challenge or similar?

alexellis avatar Mar 30 '22 18:03 alexellis

@tamalsaha I already use LE for the demo network, so maybe we can coordinate. I just run the LE process to renew the certificate and signal the server to reload, and all is good.

derekcollison avatar Mar 30 '22 19:03 derekcollison

@alexellis You can put us behind something that does real layer 4 proxying, like HAProxy. I have done that myself. With anything above layer 4 you might be able to, but TLS with the normal NATS client protocol will probably not work, meaning you would only be able to accept websocket connections.

May I ask why you would want to do that? For instance, I would set up a layer of Leafnodes that connect back to a core NATS cluster as a type of DMZ.

derekcollison avatar Mar 30 '22 19:03 derekcollison

I want to run NATS with a valid TLS certificate for using JetStream, because it will be exposed on the Internet. Websockets would be fine.

@derekcollison do you mean that you use certbot?

alexellis avatar Mar 30 '22 19:03 alexellis

You can do that today without anything in front. The demo system (demo.nats.io) has LE TLS, but optionally allows non-TLS on the same port, supports NATS clients, websockets and MQTT, and has JetStream enabled. No reason to put anything in front per se IMO, sans DDoS vectors, but I use ufw for that.

derekcollison avatar Mar 30 '22 20:03 derekcollison

Yes, I use certbot to renew and then simply reload the server; it does not go down. I use systemd.

derekcollison avatar Mar 30 '22 20:03 derekcollison

Here is the complete demo server config. I run certbot and send a signal to the server to reload.

port: 4222
https: 8222
server_name: us-central-nats-demo

max_connections: 250000
max_subscriptions: 200000

reconnect_error_reports: 3600
max_traced_msg_len: 64


logfile_size_limit: 2GB
log_file: "/var/log/nats-server.log"

jetstream {
  store_dir: /var/jetstream
  max_mem_store:  10GiB
  max_file_store: 410GiB
}

tls {
  cert_file: "/etc/letsencrypt/live/demo.nats.io/fullchain.pem"
  key_file:  "/etc/letsencrypt/live/demo.nats.io/privkey.pem"
  timeout:    "5s"
}

# Allow both TLS and non-TLS to work on same port.
allow_non_tls: true

leafnodes {
  port: 7422
  tls {
    cert_file: "/etc/letsencrypt/live/demo.nats.io/fullchain.pem"
    key_file:  "/etc/letsencrypt/live/demo.nats.io/privkey.pem"
    timeout:    "5s"
  }
}

no_auth_user: demo-user

demo_perms = {
  publish = {
    # Do not allow deletion of MQTT streams
    deny = ["$JS.API.STREAM.DELETE.$MQTT_msgs", "$JS.API.STREAM.DELETE.$MQTT_rmsgs", "$JS.API.STREAM.DELETE.$MQTT_sess"]
  }
}

accounts {
  default: {
    jetstream: {
      max_mem:		8GiB
      max_store:	400GiB
      max_streams:	1024
      max_consumers:	8192
    }
    users = [ { user: demo-user, permissions: $demo_perms} ]
  }
  $SYS: {
    users = [ { nkey: UDEMO3ZANTMUGPSBS3H54WKJN3TNVGQBJUQFCT7H4MUQLCRRQ26CWIIP } ]
  }
}


websocket {
  port: 8443
  compression: true
  handshake_timeout: "5s"
  tls {
    cert_file: "/etc/letsencrypt/live/demo.nats.io/fullchain.pem"
    key_file:  "/etc/letsencrypt/live/demo.nats.io/privkey.pem"
    timeout:    "5s"
  }
  no_auth_user: demo-user
}

mqtt {
  port: 1883
  tls {
    cert_file: "/etc/letsencrypt/live/demo.nats.io/fullchain.pem"
    key_file:  "/etc/letsencrypt/live/demo.nats.io/privkey.pem"
    timeout:    "5s"
  }
  no_auth_user: demo-user
  ack_wait: "1m"
  max_ack_pending: 1024
}
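
The "run certbot, then signal the server to reload" step can be sketched in Go as below. This assumes a Unix host and that the server writes a PID file; the config above does not set `pid_file`, so the path passed in is an assumption. nats-server reloads its configuration on SIGHUP, which is also what `nats-server --signal reload` sends.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"strconv"
	"syscall"
)

// parsePID extracts a process ID from the contents of a PID file.
func parsePID(data []byte) (int, error) {
	pid, err := strconv.Atoi(string(bytes.TrimSpace(data)))
	if err != nil {
		return 0, fmt.Errorf("invalid pid file contents: %w", err)
	}
	if pid <= 0 {
		return 0, fmt.Errorf("invalid pid %d", pid)
	}
	return pid, nil
}

// reloadServer sends SIGHUP to the process named in pidFile, asking
// nats-server to re-read its configuration and certificate files.
func reloadServer(pidFile string) error {
	data, err := os.ReadFile(pidFile)
	if err != nil {
		return err
	}
	pid, err := parsePID(data)
	if err != nil {
		return err
	}
	proc, err := os.FindProcess(pid)
	if err != nil {
		return err
	}
	return proc.Signal(syscall.SIGHUP)
}
```

A small program calling `reloadServer` could be run from certbot's `--deploy-hook`, so the reload only happens after a successful renewal.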

derekcollison avatar Apr 05 '22 22:04 derekcollison

Would be nice to have it built-in. I understand there are other ways to do it but operationally, there's still friction we can reduce with this being autotls.

simar7 avatar May 13 '22 21:05 simar7

How do you see autotls helping here? I am not too familiar so apologies if dumb question.

derekcollison avatar May 13 '22 21:05 derekcollison

I'm also interested in TLS certificates generated automatically using Let's Encrypt.

When using short-lived certificates, it's almost mandatory to use a timer or a cronjob to renew certificates. I understand that this is not a limitation specific to nats-server; in fact, most software does not handle certificate generation using Let's Encrypt.

However, NATS does not support being served behind a reverse proxy with TLS termination (to be fair, other widely used software has the same limitation, PostgreSQL for example).

As such, it's not possible to deploy NATS behind proxies such as Traefik, which automate certificate renewal...

We deploy NATS servers using different technologies:

  • Systemd
  • Docker Swarm
  • Azure Container Instances

In our projects, we use Traefik to automate almost all certificate generation and renewal (regardless of deployment environment), but because of NATS, we still need certbot or lego to run somewhere. When we have access to the host machine, this is fine, but when running in Azure Container Instances (or other container platforms), it means that we need to deploy another container (which means additional cost and complexity).

I would really love to be able to configure an ACME account + a challenge (we mostly use dns-01) in nats-server config and let NATS request/save/renew TLS certificates according to config (if certificate fails to be issued, server does not start).
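
For context on what a dns-01 challenge involves, the sketch below computes the TXT record an ACME client has to publish, following RFC 8555: the record lives at `_acme-challenge.<domain>` and its value is the unpadded base64url encoding of the SHA-256 digest of the key authorization. The function name is made up for this example, and creating the record through a DNS provider's API is left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"strings"
)

// dns01Record returns the TXT record name and value for a dns-01 challenge.
// keyAuthorization is the challenge token joined with the account key
// thumbprint ("token.thumbprint"), which the ACME client derives itself.
func dns01Record(domain, keyAuthorization string) (name, value string) {
	// A wildcard name is validated against its base domain.
	base := strings.TrimSuffix(strings.TrimPrefix(domain, "*."), ".")
	digest := sha256.Sum256([]byte(keyAuthorization))
	return "_acme-challenge." + base, base64.RawURLEncoding.EncodeToString(digest[:])
}
```

Because validation happens purely through DNS, this challenge type needs no inbound port 80 or 443, which is why it suits servers such as NATS that do not speak HTTP on their main listener.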

Is ACME certificate generation something that you would be willing to consider adding to nats-server? And if not, do you have recommendations for providing such a feature outside of nats-server?

I believe this feature would make nats-server even more attractive and simple to deploy (especially for small teams/projects).

Anyway, thanks for bringing awesome technologies such as NATS 🎉

charbonnierg avatar Feb 13 '23 18:02 charbonnierg

I played a bit with this yesterday, and managed to start a NATS server with TLS enabled for:

  • Standard Listener
  • Monitoring Listener
  • Websocket Listener
  • MQTT Listener
  • Leafnode Listener

and mTLS for:

  • Cluster Routes
  • Gateway Routes

by using certmagic and overriding the various TLSConfig structs in nats server with configs obtained using certmagic.Magic.TLSConfig().

EDIT: It's even possible to configure cluster mTLS and gateway mTLS, as long as the server is configured correctly (e.g., advertise is set for the cluster and gateway blocks). Either set exhaustive routes (like the cluster in this example), or set authorization with a user matching wildcard certificates to allow dynamic routes (like the gateways).

I made an example repo, where I do something like:

import (
	"context"
	"crypto/tls"

	"github.com/caddyserver/certmagic"
	"github.com/nats-io/nats-server/v2/server"
)

func (o *NatsMagic) UpdateNatsServerOptions(magic *certmagic.Config, serverOptions *server.Options) error {
	tlsConfig := magic.TLSConfig()
	tlsConfig.NextProtos = append([]string{"h2", "http/1.1"}, tlsConfig.NextProtos...)
	// Global TLS config
	serverOptions.TLSConfig = tlsConfig.Clone()
	serverOptions.TLS = true
	serverOptions.TLSVerify = false
	// Leafnode TLS config
	serverOptions.LeafNode.TLSConfig = tlsConfig.Clone()
	// Websocket TLS config
	serverOptions.Websocket.TLSConfig = tlsConfig.Clone()
	serverOptions.Websocket.NoTLS = false
	// MQTT TLS config
	serverOptions.MQTT.TLSConfig = tlsConfig.Clone()
	// Client certificates for routes and gateways.
	// In order for a server to connect to a cluster, it must have a valid TLS cert
	// with a SAN that matches the route entry in the cluster configuration.
	certs, err := magic.ClientCredentials(context.TODO(), o.DefaultDomains)
	if err != nil {
		return err
	}
	// Cluster TLS config (mTLS)
	serverOptions.Cluster.TLSConfig = tlsConfig.Clone()
	serverOptions.Cluster.TLSConfig.GetCertificate = nil
	serverOptions.Cluster.TLSConfig.ClientAuth = tls.RequireAndVerifyClientCert
	serverOptions.Cluster.TLSCheckKnownURLs = true
	serverOptions.Cluster.TLSMap = true
	serverOptions.Cluster.TLSConfig.Certificates = certs
	// Gateway TLS config (mTLS)
	serverOptions.Gateway.TLSConfig = tlsConfig.Clone()
	serverOptions.Gateway.TLSConfig.GetCertificate = nil
	serverOptions.Gateway.TLSConfig.ClientAuth = tls.RequireAndVerifyClientCert
	serverOptions.Gateway.TLSCheckKnownURLs = true
	serverOptions.Gateway.TLSMap = true
	serverOptions.Gateway.TLSConfig.Certificates = certs

	return nil
}

I did some tests with 2 clusters (3 servers each) connected by gateways, and 1 leafnode; everything worked fine.

It still seems hacky, but less than before. I tried to fork nats-server at first, but I faced several difficulties:

  • I did not have a clear picture of how to change server options (as well as NATS configuration file parser)
  • I did not know what to do with certmagic logs, which are emitted using zap (that's why in the example repo, all logs are handled by zap, including server logs)
  • Certmagic is built around libdns, and by default does not include the DNS client libraries used to update DNS records. It's up to each project to require one or more DNS providers (such as https://github.com/libdns/route53). The strategy of Caddy (the main user of certmagic) is for users to build a custom caddy binary with the DNS providers of their choice. I don't think NATS should include all DNS providers, so maybe a custom build solution would be appropriate. At the moment I added support for 3 DNS providers only, and users cannot add or remove providers.

Here is one of the configuration files that I tried (successfully):

{
    "server_name": "nats-01",
    "port": 4222,
    "host": "127.0.0.1",
    "client_advertise": "nats-01.local.quara.me:4222",
    "cluster": {
        "name": "cluster-01",
        "port": 4244,
        "advertise": "nats-01.local.quara.me:4244",
        "routes": [
            "tls://nats-01.local.quara.me:4244",
            "tls://nats-02.local.quara.me:4245",
            "tls://nats-03.local.quara.me:4246"
        ]
    },
    "gateway": {
        "name": "cluster-01",
        "port": 7422,
        "advertise": "nats-01.local.quara.me:7422",
        "authorization": {
            "user": "*.local.quara.me"
        }
    },
    "jetstream": {
        "max_memory_store": 1073741824,
        "max_file_store": 1073741824
    },
    "websocket": {
        "port": 10443,
        "advertise": "nats-01.local.quara.me:10443"
    },
    "mqtt": {
        "port": 8883
    },
    "leafnodes": {
        "port": 7222,
        "advertise": "nats-01.local.quara.me:7222"
    },
    "operator": "eyJhbGciOiJlZDI1NTE5LW5rZXkiLCJ0eXAiOiJKV1QifQ.eyJuYW1lIjoidGVzdC1vcGVyYXRvciIsInN1YiI6Ik9ETzNJVkQzWkdEWEFFSERET1RET0tVVk40WkM1UVlFU1RONUkyUEFMVk9JTzNDTURFUVdQQVhTIiwiaXNzIjoiT0RPM0lWRDNaR0RYQUVIRERPVERPS1VWTjRaQzVRWUVTVE41STJQQUxWT0lPM0NNREVRV1BBWFMiLCJqdGkiOiJHWE5NU1RYWVlXWTRaVzZCNFYyTUNaMklFSk9KVUhJRTVVUUdaVTIzSTVCTUtGSUFWUUZBIiwiaWF0IjoxNjkzMjYwNDIyLCJuYXRzIjp7InR5cGUiOiJvcGVyYXRvciIsInZlcnNpb24iOjJ9fQ.aadO53UH-iWOBYlC0FdD-8OuO7fG-srsf5Re-_dwqx9BaM3Ps-Y2st_RzBWnMpXgvq-e4GXzRRx1M23pr9HtCA",
    "system_account": "ADNU2QRXBD4ZKPJBX2W4GPYIZPJNU25IVC67TPARE22755KLN4JSJRQH",
    "resolver": {
        "type": "full",
        "dir": "./nats-01/jwt",
        "allow_delete": true,
        "interval": "2m"
    },
    "resolver_preload": {
        "ADNU2QRXBD4ZKPJBX2W4GPYIZPJNU25IVC67TPARE22755KLN4JSJRQH": "eyJhbGciOiJlZDI1NTE5LW5rZXkiLCJ0eXAiOiJKV1QifQ.eyJuYW1lIjoiU1lTIiwic3ViIjoiQUROVTJRUlhCRDRaS1BKQlgyVzRHUFlJWlBKTlUyNUlWQzY3VFBBUkUyMjc1NUtMTjRKU0pSUUgiLCJpc3MiOiJPRE8zSVZEM1pHRFhBRUhERE9URE9LVVZONFpDNVFZRVNUTjVJMlBBTFZPSU8zQ01ERVFXUEFYUyIsImp0aSI6IkdYTk1TVFc1WU5OUEhSRlRXQkM1RU9SQUNNUklQQlk1R0I0VlNUNVFBVTdETDQyVTJVRlEiLCJpYXQiOjE2OTMyNjA0MjIsIm5hdHMiOnsidHlwZSI6ImFjY291bnQiLCJ2ZXJzaW9uIjoyLCJzaWduaW5nX2tleXMiOlt7ImtleSI6IkFDRk9XVTZXQU1UV0NHNzJXWTVYN0pWRFU3WExKU0cyWENQTUZISFQ2M1BIR1RWRzVKVDI1VlVDIiwicm9sZSI6Im1vbml0b3IiLCJ0ZW1wbGF0ZSI6eyJwdWIiOnsiYWxsb3ciOlsiJFNZUy5SRVEuQUNDT1VOVC4qLioiLCIkU1lTLlJFUS5TRVJWRVIuKi4qIl19LCJzdWIiOnsiYWxsb3ciOlsiJFNZUy5BQ0NPVU5ULiouPiIsIiRTWVMuU0VSVkVSLiouPiJdfSwic3VicyI6MTAwLCJwYXlsb2FkIjoxMDQ4NTc2LCJhbGxvd2VkX2Nvbm5lY3Rpb25fdHlwZXMiOlsiU1RBTkRBUkQiLCJXRUJTT0NLRVQiXX0sImtpbmQiOiJ1c2VyX3Njb3BlIn0seyJrZXkiOiJBQlVIM0gzQ0VTRkFXU1NVNE1PTlhOVllKNVpLQ0ZWWlNYM0RHNFNDUEY3RUhFTVBETVE1TVYyNiIsInJvbGUiOiJpc3N1ZXIiLCJ0ZW1wbGF0ZSI6eyJwdWIiOnsiYWxsb3ciOlsiJFNZUy5SRVEuQ0xBSU1TLiouIiwiJFNZUy5SRVEuQUNDT1VOVC4qLkNMQUlNUy4qIl19LCJzdWJzIjoxMCwicGF5bG9hZCI6MTA0ODU3NiwiYWxsb3dlZF9jb25uZWN0aW9uX3R5cGVzIjpbIlNUQU5EQVJEIiwiV0VCU09DS0VUIl19LCJraW5kIjoidXNlcl9zY29wZSJ9LHsia2V5IjoiQURER0FYSDNFRlpQM1NaSlVQQUJCR0tYT1lBT1NTWEpHNVc0VlNaNUxTUTQ2U0ZNRElDNUJJQzMiLCJyb2xlIjoiYWRtaW5pc3RyYXRvciIsInRlbXBsYXRlIjp7InB1YiI6eyJhbGxvdyI6WyI-Il19LCJzdWIiOnsiYWxsb3ciOlsiPiJdfSwic3VicyI6MTAsInBheWxvYWQiOjEwNDg1NzYsImFsbG93ZWRfY29ubmVjdGlvbl90eXBlcyI6WyJTVEFOREFSRCIsIldFQlNPQ0tFVCJdfSwia2luZCI6InVzZXJfc2NvcGUifSx7ImtleSI6IkFCUUM0QkNCR1lKRVlHVzVETFJGVTNZVVI3WVpDU09YNk9FVkcyNlNBTkZQQVZVWUhBVDRUSEdUIiwicm9sZSI6ImxlYWZub2RlIiwidGVtcGxhdGUiOnsicHViIjp7ImFsbG93IjpbIj4iXX0sInN1YiI6eyJhbGxvdyI6WyI-Il19LCJzdWJzIjotMSwiZGF0YSI6LTEsInBheWxvYWQiOjEwNDg1NzYsImFsbG93ZWRfY29ubmVjdGlvbl90eXBlcyI6WyJMRUFGTk9ERSIsIkxFQUZOT0RFX1dTIl19LCJraW5kIjoidXNlcl9zY29wZSJ9XSwiZXhwb3J0cyI6W3siZGVzY3JpcHRpb24iOiJBY2NvdW50IHNwZWNpZmljIG1vbml0b3Jpbmcg
c3RyZWFtIiwiaW5mb191cmwiOiJodHRwczovL2RvY3MubmF0cy5pby9uYXRzLXNlcnZlci9jb25maWd1cmF0aW9uL3N5c19hY2NvdW50cyIsIm5hbWUiOiJhY2NvdW50LW1vbml0b3Jpbmctc3RyZWFtcyIsInN1YmplY3QiOiIkU1lTLkFDQ09VTlQuKi4-IiwidHlwZSI6InN0cmVhbSIsImFjY291bnRfdG9rZW5fcG9zaXRpb24iOjN9LHsiZGVzY3JpcHRpb24iOiJSZXF1ZXN0IGFjY291bnQgc3BlY2lmaWMgbW9uaXRvcmluZyBzZXJ2aWNlcyBmb3I6IFNVQlNaLCBDT05OWiwgTEVBRlosIEpTWiBhbmQgSU5GTyIsImluZm9fdXJsIjoiaHR0cHM6Ly9kb2NzLm5hdHMuaW8vbmF0cy1zZXJ2ZXIvY29uZmlndXJhdGlvbi9zeXNfYWNjb3VudHMiLCJuYW1lIjoiYWNjb3VudC1tb25pdG9yaW5nLXNlcnZpY2VzIiwic3ViamVjdCI6IiRTWVMuUkVRLkFDQ09VTlQuKi4qIiwidHlwZSI6InNlcnZpY2UiLCJyZXNwb25zZV90eXBlIjoiU3RyZWFtIiwiYWNjb3VudF90b2tlbl9wb3NpdGlvbiI6NH0seyJkZXNjcmlwdGlvbiI6IlJlcXVlc3QgYWNjb3VudCBKV1QiLCJpbmZvX3VybCI6Imh0dHBzOi8vZG9jcy5uYXRzLmlvL25hdHMtc2VydmVyL2NvbmZpZ3VyYXRpb24vc3lzX2FjY291bnRzIiwibmFtZSI6ImFjY291bnQtbG9va3VwLXNlcnZpY2UiLCJzdWJqZWN0IjoiJFNZUy5SRVEuQUNDT1VOVC4qLkNMQUlNUy5MT09LVVAiLCJ0eXBlIjoic2VydmljZSIsInJlc3BvbnNlX3R5cGUiOiJTdHJlYW0iLCJhY2NvdW50X3Rva2VuX3Bvc2l0aW9uIjo0fSx7ImRlc2NyaXB0aW9uIjoiUmVxdWVzdCBhbGwgc2VydmVycyBoZWFsdGgiLCJpbmZvX3VybCI6Imh0dHBzOi8vZG9jcy5uYXRzLmlvL25hdHMtc2VydmVyL2NvbmZpZ3VyYXRpb24vc3lzX2FjY291bnRzIiwibmFtZSI6InNlcnZlci1oZWFsdGgtc2VydmljZSIsInN1YmplY3QiOiIkU1lTLlJFUS5TRVJWRVIuKi5IRUFMVEhaIiwidHlwZSI6InNlcnZpY2UiLCJyZXNwb25zZV90eXBlIjoiU3RyZWFtIn1dLCJsaW1pdHMiOnsiaW1wb3J0cyI6MTAsImV4cG9ydHMiOjUsIndpbGRjYXJkcyI6dHJ1ZSwiY29ubiI6MTAsImxlYWYiOjEwLCJzdWJzIjoxMDAwLCJwYXlsb2FkIjoyMDk3MTUyfX19.lKZj99iabc2ae4mDKh-l5B-nNX00Vqv2QjwQbR8epLUAhkeFE07IrbVFXxOlnviN2iul6o2o26nK3nX5ss2DBA"
    }
}

charbonnierg avatar Aug 29 '23 07:08 charbonnierg