nitox icon indicating copy to clipboard operation
nitox copied to clipboard

TLS client verification support?

Open wafflespeanut opened this issue 5 years ago • 12 comments

It looks like Nitox doesn't support client verification as of now. I made some quick changes to pass TLS config to the native-tls connector. When I tested this, I'm getting:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: TlsError(Ssl(Error { code: ErrorCode(1), cause: Some(Ssl(ErrorStack([Error { code: 336031996, library: "SSL routines", function: "SSL23_GET_SERVER_HELLO", reason: "unknown protocol", file: "s23_clnt.c", line: 794 }]))) }, X509VerifyResult { code: 0, error: "ok" }))', src/libcore/result.rs:997:5

Apparently, the certificate is verified, but the connection is being cut off. I'm guessing I'm missing something in the NATS protocol? Either that, or something's wrong with how nitox does TLS. The other nats crate however supports this feature.

wafflespeanut avatar Mar 05 '19 12:03 wafflespeanut

Hey!

The other crate supports it simply for the reason that the TLS implementation is based on OpenSSL and it is kinda broken, especially if your local libssl is old (crypto/tls/ssl is hard so I can't blame Frank).

From what I can see you're trying to connect through SSLv2 which is a widely deprecated and unsupported protocol. (Even OpenSSL dropped support for it)

As such, native-tls does not support it as indicated by the Protocol enum in the docs: https://docs.rs/native-tls/0.2.2/native_tls/enum.Protocol.html

So, it's not really nitox-related, you should just upgrade your security protocols :)

OtaK avatar Mar 05 '19 14:03 OtaK

Also, quick note but when acting as a client, you don't need to provide a root cert, only the host name is needed. The current code works fine in production for months for us, given that your certificates business is setup correctly as well.

For reference, client-side implementation of native-tls is the first example: https://github.com/sfackler/rust-native-tls#usage

OtaK avatar Mar 05 '19 14:03 OtaK

The other crate supports it simply for the reason that the TLS implementation is based on OpenSSL and it is kinda broken, especially if your local libssl is old (crypto/tls/ssl is hard so I can't blame Frank).

From what I can see you're trying to connect through SSLv2 which is a widely deprecated and unsupported protocol. (Even OpenSSL dropped support for it)

NATS enforces TLS v1.2, so if I had an outdated protocol, NATS would've rejected the connection.

Also, quick note but when acting as a client, you don't need to provide a root cert, only the host name is needed. The current code works fine in production for months for us, given that your certificates business is setup correctly as well.

For reference, client-side implementation of native-tls is the first example: https://github.com/sfackler/rust-native-tls#usage

I was talking about mutual authentication (i.e., server-side verification of certs presented by the clients before accepting a connection). Since we're using NATS internally, we're using self-signed certs for both server and the clients (that's why we needed support for adding root CA certs).

I agree that the other crate isn't good (especially because it's sync) but it actually works. And, we're trying to move to nitox, but we're getting blocked by TLS support.

wafflespeanut avatar Mar 05 '19 18:03 wafflespeanut

It turned out that this has got to do with NATS protocol itself and how nitox currently deals with TLS.

When I tried debugging this with openssl, I got:

$ openssl s_client -connect ${NATS_IP}:4444 -CAfile ~/Desktop/nats-certs/ca-cert.pem -cert ~/Desktop/nats-certs/client-cert.pem -key ~/Desktop/nats-certs/client-key.pem -state -debug
CONNECTED(00000003)
SSL_connect:before/connect initialization
write to 0x1155460 [0x115a8a0] (326 bytes => 326 (0x146))
[[ REMOVED RAW BYTES ]]
SSL_connect:SSLv2/v3 write client hello A
read from 0x1155460 [0x115fe00] (7 bytes => 7 (0x7))
0000 - 49 4e 46 4f 20 7b 22                              INFO {"
SSL_connect:error in SSLv2/v3 read server hello A
140712761509528:error:140770FC:SSL routines:SSL23_GET_SERVER_HELLO:unknown protocol:s23_clnt.c:794:
---
no peer certificate available
---
No client certificate CA names sent
---
SSL handshake has read 7 bytes and written 326 bytes
---
New, (NONE), Cipher is (NONE)
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
No ALPN negotiated
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : 0000
    Session-ID: 
    Session-ID-ctx: 
    Master-Key: 
    Key-Arg   : None
    PSK identity: None
    PSK identity hint: None
    SRP username: None
    Start Time: 1551798460
    Timeout   : 300 (sec)
    Verify return code: 0 (ok)
---

So, NATS actually sends server INFO in plaintext before the TLS handshake (which is ridiculous!). After some digging, I found this issue (still active!). Apparently, all clients have been implemented knowing this fact. They parse the INFO value in plaintext and only then, they start the handshake. The other crate does the same thing.

So, I guess the current implementation doesn't actually work? Because, you're upgrading to TLS immediately following the TCP connection, rather than after parsing INFO?

wafflespeanut avatar Mar 05 '19 18:03 wafflespeanut

Oh, and one minor thing that struck me - Nitox is parsing the URL to get the host name when TLS is enabled. But, the previous step tries to parse the URL as SocketAddr (or resolve the DNS).

Both the steps are mutually exclusive. You can't parse / resolve an address with protocol, and you can't parse the address as Url without a protocol.

wafflespeanut avatar Mar 05 '19 19:03 wafflespeanut

Wow indeed, just looked at the mechanism that gnatsd is using and after looking around I didn't find any documentation related to this behavior (besides the issue you mentioned). (Ridiculous indeed!)

Alright, let's just consider then that NATS TLS is just unsupported for nitox as of now. :(

I can't determine the effort needed (besides what you already did about TLS configuration) to implement it, since INFO messages are asynchronous and can occur anytime. For instance, the cluster protocol informs clients of new addresses within a cluster through pushed INFO messages and thus I'm just not sure if this particular message will be sent with or without TLS.

And indeed we upgrade the TCP to TLS immediately even before trying to talk to the server. I have an idea to make it work quickly (thanks to the work done on the verbose mode Ack mechanism) but I'm really not sure.

About the SocketAddr / Url business, I'll check it out, but the convention in NATS is to give something like nats://myserver.my.tld:port, or simply xxx.xxx.xxx.xxx:port. It doesn't seem weird either. It's just that with TLS, you have to give a full URL with hostname or the client cert resolution will just fail since despite being able to resolve the IP from itself, it'll just fail at trying to extract the hostname from the given cluster URI.

OtaK avatar Mar 05 '19 19:03 OtaK

Hi, just to clarify that currently only the initial INFO message is plain text, once the connection has been upgraded to TLS the following INFO messages use the TLS transport.

wallyqs avatar Mar 05 '19 20:03 wallyqs

@wallyqs Alright, thanks for the clarification, duly noted! Also, same context, but what happens when the subsequent INFO messages have a tls_required: false. Should we downgrade to pure TCP? Or that just should not ever happen?

OtaK avatar Mar 05 '19 20:03 OtaK

yes that wouldn't happen, only when tls_required is announced then the server will expect the upgrade.

wallyqs avatar Mar 05 '19 20:03 wallyqs

About the SocketAddr / Url business, I'll check it out, but the convention in NATS is to give something like nats://myserver.my.tld:port, or simply xxx.xxx.xxx.xxx:port. It doesn't seem weird either. It's just that with TLS, you have to give a full URL with hostname or the client cert resolution will just fail since despite being able to resolve the IP from itself, it'll just fail at trying to extract the hostname from the given cluster URI.

It's not weird, but it doesn't work in the current setup. All I'm saying is that it needs to be improved as part of the TLS suppport, because currently it fails if we enable TLS and pass a cluster address like 1.2.3.4:4444 (addr resolves, but URL cannot be parsed) or nats://1.2.3.4:4444 (cannot parse SocketAddr or resolve the DNS)

wafflespeanut avatar Mar 05 '19 21:03 wafflespeanut

I agree that how it works currently is definitely not foolproof and relies on the user giving proper params for each tls/tcp mode. tls = url form / tcp = url/ip form

That will change in the PR draft I created :)

OtaK avatar Mar 06 '19 13:03 OtaK

I have opened a PR for this!

wafflespeanut avatar Mar 09 '19 16:03 wafflespeanut