nano-node
Secure communications
Current issues
Nano nodes currently communicate over clear-text sockets and authenticate each other by signing individual messages. Because the socket is plain-text, someone who intercepts a communication channel can replace messages. While vote messages can’t be forged, messages can be filtered and injected in a way that hinders correct operation of the node (a man-in-the-middle attack). In addition, every message has to be designed under the assumption of clear-text communication.
Proposed solution
To address these issues we want to replace the clear-text connections with TLS encrypted/authenticated connections. The goal is for connections to mutually authenticate and where applicable correctly identify a representative on the other side of a connection.
We would like to:
- stop the possibility of node ID spoofing
- stop the possibility of telemetry ID spoofing
- stop the possibility of man in the middle attack
- identify representatives securely and evaluate their online status
- allow helpful services, such as vote storage nodes, without complications
Nano-specific signature validation (SHA2 vs Blake2b)
Nano uses Ed25519 with Blake2b as the digest function for public key derivation and digital signing. Ed25519 as supported in X509 certificates is defined with SHA-512 (a member of the SHA2 family) and is incompatible with any other digest. This forces us to do some manual work when generating, signing and validating certificates. Blindly asking OpenSSL to validate certificates that we custom-signed will result in an invalid signature error. It may be possible to produce the same public key for both SHA2 and Blake2b (that needs exploring), but even if it is, it does not fundamentally change our approach.
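To make the incompatibility concrete, the following is a minimal sketch of Ed25519 public key derivation with a pluggable digest, following the structure of the reference code in RFC 8032. The function names (`derive_public_key`, `sha512`, `blake2b`) are chosen for illustration and are not taken from the node's code base. The arithmetic is not constant-time, so it is meant for experimentation only.

```python
import hashlib

# edwards25519 parameters (RFC 8032, section 5.1).
FIELD_P = 2**255 - 19
CURVE_D = -121665 * pow(121666, FIELD_P - 2, FIELD_P) % FIELD_P
SQRT_M1 = pow(2, (FIELD_P - 1) // 4, FIELD_P)


def recover_x(y, sign):
    """Recover the x coordinate of a curve point from y and the sign bit."""
    x2 = (y * y - 1) * pow(CURVE_D * y * y + 1, FIELD_P - 2, FIELD_P) % FIELD_P
    x = pow(x2, (FIELD_P + 3) // 8, FIELD_P)
    if (x * x - x2) % FIELD_P != 0:
        x = x * SQRT_M1 % FIELD_P
    if (x & 1) != sign:
        x = FIELD_P - x
    return x


BASE_Y = 4 * pow(5, FIELD_P - 2, FIELD_P) % FIELD_P
BASE_X = recover_x(BASE_Y, 0)
BASE = (BASE_X, BASE_Y, 1, BASE_X * BASE_Y % FIELD_P)  # extended coordinates


def point_add(a, b):
    """Add two points given in extended coordinates (X, Y, Z, T)."""
    aa = (a[1] - a[0]) * (b[1] - b[0]) % FIELD_P
    bb = (a[1] + a[0]) * (b[1] + b[0]) % FIELD_P
    cc = 2 * a[3] * b[3] * CURVE_D % FIELD_P
    dd = 2 * a[2] * b[2] % FIELD_P
    e, f, g, h = bb - aa, dd - cc, dd + cc, bb + aa
    return (e * f % FIELD_P, g * h % FIELD_P, f * g % FIELD_P, e * h % FIELD_P)


def point_mul(scalar, point):
    """Double-and-add scalar multiplication (not constant-time)."""
    result = (0, 1, 1, 0)  # neutral element
    while scalar > 0:
        if scalar & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        scalar >>= 1
    return result


def point_compress(point):
    """Encode a point as 32 bytes: y with the sign of x in the top bit."""
    z_inv = pow(point[2], FIELD_P - 2, FIELD_P)
    x = point[0] * z_inv % FIELD_P
    y = point[1] * z_inv % FIELD_P
    return (y | ((x & 1) << 255)).to_bytes(32, "little")


def derive_public_key(seed, hash_fn):
    """Derive the public key from a 32-byte seed using a 64-byte digest."""
    digest = hash_fn(seed)
    scalar = int.from_bytes(digest[:32], "little")
    scalar &= (1 << 254) - 8  # clear the low three bits and bits 254, 255
    scalar |= 1 << 254        # set bit 254
    return point_compress(point_mul(scalar, BASE))


def sha512(data):
    return hashlib.sha512(data).digest()


def blake2b(data):
    return hashlib.blake2b(data, digest_size=64).digest()
```

Calling `derive_public_key(seed, sha512)` and `derive_public_key(seed, blake2b)` with the same seed yields two different public keys, which is why a stock verifier that assumes SHA-512 rejects signatures produced with the Blake2b variant.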
Design choices
We have decided to design this in such a way that our Nano protocol sits on top of TLS, which in turn sits on top of TCP. TLS was a natural choice thanks to its robustness and maturity, and because it is not wise to implement custom cryptographic schemes and test them out in production.
With a TLS-based design, we aim to ensure peer authenticity: peers can verify that the peers they are talking to are who they say they are. This is especially important for communication to and from principal representatives (PRs). How can we authenticate the PRs? By means of their rep key. That was again a natural choice, because the whole network can check public rep keys against the ledger and confirm that they truly belong to PRs and are not just made up.
TLS relies on X509 PKI in order to authenticate the participating peers. For our use case, we'll go with mutual authentication. That means both the connecting peer (client), as well as the listening peer (server) will have to authenticate against one another using X509 certificates.
Authentication only or encryption plus authentication? In theory, we should only need authentication but the addition of encryption might make it even harder for someone to attack the network.
X509 Certificates
A Principal Representative will generate a Root Certificate using its representative key to sign it. It will then generate an Intermediate Certificate using the Root Certificate to sign it (so essentially the same private key – its representative key). These two steps are the only times the private representative key is used.
The expiration times for the Root and Intermediate Certificates are planned to be very long, in the range of years or even tens of years. There should not be a need to regenerate them unless they are lost.
Finally, the Principal Representative will generate a Leaf Certificate (End-Entity Certificate) using the Intermediate Certificate to sign it. The expiration time for this one can be in the range of weeks or even a single day. The Leaf Certificate is what ultimately secures TLS connections in terms of encryption, but authentication is based on the upper layer certificates.
TLS Handshake
When initiating the handshake, both peers send each other the full chain of certificates that they authenticate themselves with. The validating party can then verify the chain, starting at the leaf and working up through the intermediate to the root. If the chain of certificates is deemed valid (see the custom verification section), the TLS session is established and the peers can start communicating in the usual Nano protocol on top of the established session.
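As a rough illustration of the walk from leaf to root, the sketch below checks only the structural properties of a received chain: that it has three certificates, that each one is issued by the next, and that none has expired. The `Cert` type and `chain_errors` function are hypothetical names for this example, and signature verification is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str      # name of the certificate holder
    issuer: str       # name of the certificate that signed this one
    not_after: float  # expiry time as a Unix timestamp


def chain_errors(chain, now):
    """Return structural problems in a chain ordered leaf, intermediate, root."""
    if len(chain) != 3:
        return ["expected exactly three certificates"]
    errors = []
    for depth, cert in enumerate(chain):
        if cert.not_after <= now:
            errors.append(f"certificate at depth {depth} has expired")
        # The root is self-signed, so it acts as its own issuer.
        issuer = chain[depth + 1] if depth < 2 else cert
        if cert.issuer != issuer.subject:
            errors.append(
                f"certificate at depth {depth} is not issued by the next certificate"
            )
    return errors
```

An empty list means the chain is structurally sound; the signatures themselves still have to be checked separately.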
Custom certificate chain verification
A part of the TLS handshake consists of the peers verifying each other’s chain of certificates. We want to:
- use nano-specific signatures in the Root Certificate and Intermediate Certificate;
- validate that the Root Certificate is signed with a private key whose public counterpart can be checked against the Nano Ledger and found to belong to a Principal Representative.
To do both, we need to customize parts of the standard certificate chain verification. For this purpose, we will use the OpenSSL chain validation hook that allows us to override the verification decision. We expect to see exactly three verification failures:
- Intermediate Certificate signature verification failure;
- Root Certificate signature verification failure;
- Self-signed Certificate in chain for the Root Certificate.
We expect the first two because of the nano-specific signature validation (see the section on Nano-specific signature validation). The third is also expected because, in a normal TLS handshake, trust anchors are not part of the chain; they reside in the validating peer’s trust store instead. In our case that wouldn’t work, because the set of PRs is dynamic, so it is much easier to send the full chain of certificates (including Roots) in the TLS handshake.
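The override decision described above could be sketched as follows. The constant names and numeric values mirror OpenSSL's `x509_vfy.h` and should be checked against the headers in use. The depth numbering (0 for the leaf, 1 for the intermediate, 2 for the root), the strict rejection of any other failure set, and the `nano_signatures_valid` callback are assumptions made for this example.

```python
# Error codes as defined in OpenSSL's x509_vfy.h.
X509_V_ERR_CERT_SIGNATURE_FAILURE = 7
X509_V_ERR_SELF_SIGNED_CERT_IN_CHAIN = 19

# (depth, error) pairs we expect the standard verification to report.
# Assumed depths: 0 = leaf, 1 = intermediate, 2 = root.
EXPECTED_FAILURES = frozenset({
    (1, X509_V_ERR_CERT_SIGNATURE_FAILURE),
    (2, X509_V_ERR_CERT_SIGNATURE_FAILURE),
    (2, X509_V_ERR_SELF_SIGNED_CERT_IN_CHAIN),
})


def accept_chain(reported_failures, nano_signatures_valid):
    """Decide whether to override the standard verification result.

    reported_failures: iterable of (depth, error_code) pairs collected
        from the verification hook.
    nano_signatures_valid: hypothetical callback that re-checks the root
        and intermediate signatures with the nano-specific scheme.
    """
    if set(reported_failures) != EXPECTED_FAILURES:
        # Anything other than the three expected failures is a real problem.
        return False
    return bool(nano_signatures_valid())
```

Treating a missing expected failure as a rejection is a design choice of this sketch: a chain that passes standard verification was not signed with the nano-specific scheme, so it cannot be tied to a representative key.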
It’s worth mentioning that we will accept connections even if we don’t recognize the public key in a Root Certificate (e.g. an unknown representative). Other layers of the code will later be in charge of evaluating the public key associated with a connection and assigning a level of trust to that particular peer, based on whether its public key is known to us and on other metrics.
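One possible shape for that later evaluation is sketched below. The three trust levels, the `classify_peer` function and the 0.1% weight cut-off for principal representatives are assumptions for illustration; the actual metrics are not specified here.

```python
from enum import Enum


class Trust(Enum):
    UNKNOWN = 0         # key not found among known representatives
    REPRESENTATIVE = 1  # known representative below the PR cut-off
    PRINCIPAL = 2       # known representative at or above the PR cut-off


def classify_peer(root_public_key, rep_weights, online_weight, pr_divisor=1000):
    """Map the key from a peer's Root Certificate to a trust level.

    rep_weights: mapping of representative public key to voting weight,
        as read from the ledger.
    pr_divisor: a representative counts as principal when its weight is
        at least online_weight / pr_divisor (assumed cut-off of 0.1%).
    """
    weight = rep_weights.get(root_public_key, 0)
    if weight == 0:
        return Trust.UNKNOWN
    if weight * pr_divisor >= online_weight:
        return Trust.PRINCIPAL
    return Trust.REPRESENTATIVE
```

The connection itself is accepted in all three cases; only the way the peer is treated afterwards differs.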
Backwards compatibility
After the introduction of Secure Communications, special care must be taken in order not to prevent older nodes from connecting to newer nodes (or vice-versa). Our strategy for this is pretty rudimentary: trial and error. We distinguish two cases:
- an older node in the role of the client that connects to a newer server
- a newer node in the role of the client that connects to an older server
In the first case, the older client sends a clear-text nano-protocol message. The server recognizes that it is not a TLS ClientHello packet, skips the handshake, and both sides fall back to clear-text communication. The server then processes the nano-protocol message that the client has already sent.
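A server could make this distinction by inspecting the first few bytes of a new connection without consuming them. The sketch below relies on the TLS record layout: a handshake record starts with content type 0x16 and major version 0x03, and the handshake message type after the five-byte record header is 0x01 for a ClientHello. The function name is hypothetical, and the example clear-text header beginning with the ASCII letter `R` is an assumption about the Nano message format.

```python
TLS_CONTENT_TYPE_HANDSHAKE = 0x16
TLS_MAJOR_VERSION = 0x03
TLS_HANDSHAKE_CLIENT_HELLO = 0x01
TLS_RECORD_HEADER_SIZE = 5  # type (1) + version (2) + length (2)


def looks_like_client_hello(first_bytes):
    """Return True if the peeked bytes look like the start of a TLS ClientHello."""
    if len(first_bytes) <= TLS_RECORD_HEADER_SIZE:
        return False  # not enough data to decide
    return (
        first_bytes[0] == TLS_CONTENT_TYPE_HANDSHAKE
        and first_bytes[1] == TLS_MAJOR_VERSION
        and first_bytes[TLS_RECORD_HEADER_SIZE] == TLS_HANDSHAKE_CLIENT_HELLO
    )


# Usage: a TLS client versus a clear-text message (assumed header bytes).
tls_start = bytes([0x16, 0x03, 0x01, 0x00, 0x2F, 0x01])
clear_text_start = b"RC\x13\x13\x12\x02\x00\x00"
```

Peeking rather than reading matters here: if the bytes turn out to be a clear-text message, they must still be available to the regular nano-protocol parser.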
In the second case, the client can’t know beforehand whether the server supports Secure Communications. The client attempts a TLS handshake and, if that fails, falls back to clear-text. Falling back here means closing the connection and reopening it, because the server has already sent data that the client can no longer process; it was consumed by the handshake attempt. The client also expects the handshake to fail for the expected reason: the TLS client is unable to recognize the incoming data as a TLS ServerHello packet.
Another important aspect is the fall back to clear-text itself, referred to as downgrading. We must find a way to ensure that downgrading cannot be used as an attack, but only as a means of not excluding older nodes from the network. At a later software upgrade, Secure Communications can be enforced on all connections, eliminating the possibility of a downgrade attack.
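The client-side fallback could be sketched like this. `open_tls` and `open_plain` are hypothetical callables that each open a fresh connection, and `HandshakeNotTls` is a made-up exception standing for "the peer did not answer with a ServerHello". Any other handshake failure, such as a rejected certificate, is deliberately not treated as a reason to fall back.

```python
class HandshakeNotTls(Exception):
    """The peer's reply could not be parsed as a TLS ServerHello."""


def connect_with_fallback(open_tls, open_plain):
    """Try TLS first; reconnect in clear-text only if the peer does not speak TLS.

    Returns a (mode, connection) pair. Both callables are expected to open
    a new connection, so the clear-text attempt never reuses a socket whose
    incoming data was already consumed by the failed handshake.
    """
    try:
        return ("tls", open_tls())
    except HandshakeNotTls:
        # Older server: start over on a fresh clear-text connection.
        return ("clear-text", open_plain())
```

Errors other than `HandshakeNotTls` propagate to the caller, so a peer that does speak TLS but fails authentication is not silently downgraded.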
Certificate revocation
We do not plan to have certificate revocation in the first implementation. We may have to revisit this.
Keeping up with representative changes
The list of representatives and their associated weights changes frequently. An account could be a representative account one minute and not the next.
Representative online status
Currently, a representative is deemed to be online if we recently observed votes from that representative. This can be manipulated by an attacker, so it is not a reliable indicator of online status.
The existence of secure connections to representatives can help us get a reliable answer to the question of whether a representative is online. We can say that a representative is online only if at least one of the following holds:
- we recently received traffic from a connection that is authorized by the representative
- we have received a vote from that representative in the last few minutes and that vote was recently generated (i.e. it has a timestamp that can be checked, so it is not a final vote)
The second condition is worth keeping for the case where two representatives have trouble connecting directly to each other.
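The two conditions can be sketched as a single predicate. The `RepActivity` record, its field names and the window lengths (60 seconds for traffic, 300 seconds for votes) are assumptions for illustration, not values taken from the node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RepActivity:
    # All times are Unix timestamps in seconds; None means "never observed".
    last_authorized_traffic: Optional[float] = None  # via an authorized connection
    last_vote_received: Optional[float] = None       # when we received the vote
    last_vote_timestamp: Optional[float] = None      # timestamp carried in the vote
    last_vote_is_final: bool = False                 # final votes carry no usable timestamp


def is_online(activity, now, traffic_window=60.0, vote_window=300.0):
    """Return True if either online condition holds for the representative."""
    # Condition 1: recent traffic on a connection authorized by the representative.
    traffic = activity.last_authorized_traffic
    if traffic is not None and now - traffic <= traffic_window:
        return True
    # Condition 2: a recently received, recently generated, non-final vote.
    if activity.last_vote_is_final:
        return False
    if activity.last_vote_received is None or activity.last_vote_timestamp is None:
        return False
    received_recently = now - activity.last_vote_received <= vote_window
    generated_recently = now - activity.last_vote_timestamp <= vote_window
    return received_recently and generated_recently
```

Checking the timestamp inside the vote, and not only the time of receipt, is what keeps a replayed old vote from making a representative appear online.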