
[RFC] Plugins authentication proposal

Open kad opened this issue 9 months ago • 12 comments

In light of the recent discussions about what plugins are and are not allowed to do, I think we need to implement a mechanism for plugins to authenticate to the runtime.

This would allow the runtime to:

  • define, on a per-plugin basis, which operations a plugin is allowed to perform (e.g. disable modification of hooks, or disable modification of anything else)
  • have a default security policy for unauthenticated plugins
  • uniquely identify a plugin by its public key (similar to WireGuard peer identification)
  • make sure that one plugin cannot fake another plugin's name during registration

For authentication, I propose using a challenge/response pattern during the plugin registration and configuration phases.

Challenge/response data can be encrypted with Curve25519 (key agreement) and ChaCha20-Poly1305 (authenticated encryption). Both are available in Go through the golang.org/x/crypto packages and are widely used, again similar to what WireGuard uses.

For the protocol, we can easily extend the current messages, as demonstrated in the following sequence diagram:

Image

For protocol changes we can do something similar to this:

diff --git a/pkg/api/api.proto b/pkg/api/api.proto
index 78fa64d..25bb500 100644
--- a/pkg/api/api.proto
+++ b/pkg/api/api.proto
@@ -26,7 +26,7 @@ option go_package = "github.com/containerd/nri/pkg/api;api";
 // The rest of the API is defined by the Plugin service.
 service Runtime {
     // RegisterPlugin registers the plugin with the runtime.
-    rpc RegisterPlugin(RegisterPluginRequest) returns (Empty);
+    rpc RegisterPlugin(RegisterPluginRequest) returns (RegisterPluginResponse);
     // UpdateContainers requests unsolicited updates to a set of containers.
     rpc UpdateContainers(UpdateContainersRequest) returns (UpdateContainersResponse);
 }
@@ -36,6 +36,13 @@ message RegisterPluginRequest {
     string plugin_name = 1;
     // Plugin invocation index. Plugins are called in ascending index order.
     string plugin_idx = 2;
+    // Plugin authentication public key
+    string plugin_public_key = 3;
+}
+
+message RegisterPluginResponse {
+    // Enum, describes the set of capabilities that the plugin is allowed to use.
+    repeated int32 allowed_capabilities = 1;
 }
 
 message UpdateContainersRequest {
@@ -141,12 +148,18 @@ message ConfigureRequest {
   int64 registration_timeout = 4;
   // Configured request processing timeout in milliseconds.
   int64 request_timeout = 5;
+  // Runtime's public key for encrypting authentication challenge response
+  string runtime_public_key = 6;
+  // Authentication challenge
+  string runtime_authentication_challenge = 7;
 }
 
 message ConfigureResponse {
   // Events to subscribe the plugin for. Each bit set corresponds to an
   // enumerated Event.
   int32 events = 2;
+  // Authentication response encrypted with runtime public key
+  string runtime_authentication_response = 3;
 }
 
 message SynchronizeRequest {

Encryption, decryption, and key generation functions can be implemented like this. WARNING: the code is not optimal and is just for demonstration. Key generation can be done via the stdlib (https://pkg.go.dev/crypto/ed25519#GenerateKey), and there are other chunks that can be improved.

package main

import (
	"bytes"
	"crypto/rand"
	"encoding/base64"
	"encoding/hex"
	"fmt"

	"golang.org/x/crypto/chacha20poly1305"
	"golang.org/x/crypto/curve25519"
)

// GenerateKeyPair generates a public/private key pair for Curve25519
func GenerateKeyPair() ([32]byte, [32]byte) {
	var privateKey [32]byte
	_, err := rand.Read(privateKey[:])
	if err != nil {
		panic(err)
	}

	// Clamp the private key
	privateKey[0] &= 0xF8  // Clear the three least significant bits of the first byte
	privateKey[31] &= 0x7F // Clear the most significant bit of the last byte
	privateKey[31] |= 0x40 // Set the second most significant bit of the last byte

	var publicKey [32]byte
	curve25519.ScalarBaseMult(&publicKey, &privateKey)

	return privateKey, publicKey
}

// Encrypt encrypts the plaintext using ChaCha20-Poly1305
func Encrypt(plaintext []byte, key []byte) ([]byte, error) {
	aead, err := chacha20poly1305.New(key[:])
	if err != nil {
		return nil, err
	}

	// Generate a nonce
	nonce := make([]byte, aead.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}

	// Encrypt the plaintext
	ciphertext := aead.Seal(nonce, nonce, plaintext, nil)

	return ciphertext, nil
}

// Decrypt decrypts the ciphertext using ChaCha20-Poly1305
func Decrypt(ciphertext []byte, key []byte) ([]byte, error) {
	aead, err := chacha20poly1305.New(key[:])
	if err != nil {
		return nil, err
	}

	// Extract the nonce from the ciphertext
	nonceSize := aead.NonceSize()
	if len(ciphertext) < nonceSize {
		return nil, fmt.Errorf("ciphertext too short")
	}
	nonce, ciphertext := ciphertext[:nonceSize], ciphertext[nonceSize:]

	// Decrypt the ciphertext
	plaintext, err := aead.Open(nil, nonce, ciphertext, nil)
	if err != nil {
		return nil, err
	}

	return plaintext, nil
}

func main() {
	// Generate key pairs for Plugin and Runtime
	PluginPrivate, PluginPublic := GenerateKeyPair()
	RuntimePrivate, RuntimePublic := GenerateKeyPair()

	// Print keys in Base64 format
	fmt.Printf("Plugin's Private Key (Base64): %s\n", base64.StdEncoding.EncodeToString(PluginPrivate[:]))
	fmt.Printf("Plugin's Public Key (Base64): %s\n", base64.StdEncoding.EncodeToString(PluginPublic[:]))
	fmt.Printf("Runtime's Private Key (Base64): %s\n", base64.StdEncoding.EncodeToString(RuntimePrivate[:]))
	fmt.Printf("Runtime's Public Key (Base64): %s\n", base64.StdEncoding.EncodeToString(RuntimePublic[:]))

	// Plugin and Runtime exchange public keys and derive a shared secret
	sharedSecretPlugin, err := curve25519.X25519(PluginPrivate[:], RuntimePublic[:])
	if err != nil {
		panic(err)
	}
	sharedSecretRuntime, err := curve25519.X25519(RuntimePrivate[:], PluginPublic[:])
	if err != nil {
		panic(err)
	}

	// Ensure shared secrets match
	if !bytes.Equal(sharedSecretPlugin, sharedSecretRuntime) {
		panic("shared secrets do not match")
	}

	// Prepare a message to encrypt
	message := []byte("Random challenge / response data")
	fmt.Printf("Original message: %s\n", message)

	// Encrypt the message
	ciphertext, err := Encrypt(message, sharedSecretPlugin[:])
	if err != nil {
		panic(err)
	}
	fmt.Printf("Ciphertext (hex): %s\n", hex.EncodeToString(ciphertext))

	// Decrypt the message with the runtime's copy of the shared secret
	decryptedMessage, err := Decrypt(ciphertext, sharedSecretRuntime[:])
	if err != nil {
		panic(err)
	}
	fmt.Printf("Decrypted message: %s\n", decryptedMessage)
}


kad avatar Mar 27 '25 22:03 kad

I see the plugins use ttrpc as the transport, which is an implementation of gRPC. Is it not possible to reuse the existing gRPC mechanisms rather than create a new one? https://grpc.io/docs/guides/auth/

At a minimum, it will be easier to support long term. Maintaining an authentication subsystem tends to be a lot of work, especially when dealing with security problems, since a custom subsystem becomes a magnet for big bounties and hackers to exploit.

aojea avatar Mar 28 '25 07:03 aojea

I see the plugins use ttrpc as the transport, which is an implementation of gRPC. Is it not possible to reuse the existing gRPC mechanisms rather than create a new one? https://grpc.io/docs/guides/auth/

At a minimum, it will be easier to support long term. Maintaining an authentication subsystem tends to be a lot of work, especially when dealing with security problems, since a custom subsystem becomes a magnet for big bounties and hackers to exploit.

ttrpc is not really an implementation of grpc. It is a low-footprint alternative to grpc, intended for RPC between processes within the same host, using a simple binary framing protocol (length+header+payload) and typically over unix domain sockets, instead of HTTP as the transport. So SSL/TLS might not be the best fit here.

Now that you mentioned ttrpc itself, it just came to my mind that it also might be possible to plug in a modified version of this proposal using a ttrpc.Handshaker.

klihub avatar Mar 28 '25 15:03 klihub

define, on a per-plugin basis, which operations a plugin is allowed to perform (e.g. disable modification of hooks, or disable modification of anything else)

I think this kind of policy decision should be a non-goal from an authN perspective. Instead, I'd like to have the framework keep track of the identity of the plugin that created the adjustment, and then a validating plugin (described in https://github.com/containerd/nri/issues/137#issuecomment-2673008626) can make a policy decision.

have default security for unauthenticated plugins

What does this mean?

samuelkarp avatar Mar 28 '25 21:03 samuelkarp

define, on a per-plugin basis, which operations a plugin is allowed to perform (e.g. disable modification of hooks, or disable modification of anything else)

I think this kind of policy decision should be a non-goal from an authN perspective. Instead, I'd like to have the framework keep track of the identity of the plugin that created the adjustment, and then a validating plugin (described in #137 (comment)) can make a policy decision.

have default security for unauthenticated plugins

What does this mean?

@samuelkarp, yes, I think we are talking about the same approach in different words. This proposal covers only the mechanism of the authentication stage. Instead of keeping (socket, plugin_name, plugin_index) for each plugin, the runtime would keep (socket, plugin_name, plugin_index, plugin_unique_id), where the plugin unique ID is the plugin's public key, or an empty string if the plugin did not perform the authentication handshake (or is an old plugin that doesn't know about auth mechanisms).

For the authentication stage we can have these potential (optional) knobs:

  1. allow/disallow registration of plugins that do not perform authentication (old ones, or ones that didn't send their public key in the registration message). Something like require_authentication.
  2. allow_list of "known plugins": the runtime can be configured with a list of public keys of plugins that are allowed to register. Plugins that performed authentication but are not in the allow list can be treated as unauthenticated or have their registration rejected, based on the value of the switch above.
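
As a rough sketch of how these two knobs could combine into a single registration decision: the type and function names below (AuthConfig, Decide, Decision) are hypothetical and not part of NRI. The rule that a plugin which sends a key but fails the handshake is always rejected is an added assumption, since the knobs above do not cover that case.

```go
package main

import "fmt"

// Decision is the outcome of the authentication stage for one registration.
type Decision int

const (
	Reject          Decision = iota // refuse the registration
	Unauthenticated                 // accept with an empty plugin_unique_id
	Authenticated                   // accept, plugin_unique_id is the public key
)

func (d Decision) String() string {
	return [...]string{"reject", "unauthenticated", "authenticated"}[d]
}

// AuthConfig holds the two hypothetical knobs.
type AuthConfig struct {
	RequireAuthentication bool
	AllowList             map[string]bool // encoded public keys; empty means "no allow list"
}

// Decide applies the knobs. publicKey is empty for plugins that sent no key;
// verified reports whether the challenge/response handshake succeeded.
func (c AuthConfig) Decide(publicKey string, verified bool) Decision {
	if publicKey != "" && !verified {
		// Assumption: claiming a key without proving possession is always rejected.
		return Reject
	}
	listed := len(c.AllowList) == 0 || c.AllowList[publicKey]
	if publicKey != "" && listed {
		return Authenticated
	}
	// Either no key was sent, or the key is not in the allow list.
	if c.RequireAuthentication {
		return Reject
	}
	return Unauthenticated
}

func main() {
	cfg := AuthConfig{AllowList: map[string]bool{"known-key": true}}
	fmt.Println(cfg.Decide("known-key", true))
	fmt.Println(cfg.Decide("other-key", true))
	cfg.RequireAuthentication = true
	fmt.Println(cfg.Decide("", false))
}
```

Keeping the decision in one pure function would make it easy to hand the resulting identity (or the empty one) to the later authorization stage.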

After the authentication phase, we move on to the authorization phase discussed in #137. But at that stage we already have validated information: for each registered plugin connection, we are sure that the other side of the socket is the plugin we expect, and not something that pretends to be identified just by the name "foo", as in the current registration mechanism.

allowed_capabilities in the registration response in the proposal above is an optional part; I'm not insisting on it. My train of thought was that it can be generated by the authorization stage (regardless of how it is configured or implemented), just to give the plugin an early warning at registration if something will not be allowed. The plugin can then continue or shut itself down cleanly if the runtime's validation is too restrictive in that plugin's opinion.
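
To illustrate the early-warning idea, here is a minimal plugin-side sketch. The proposal does not define the capability enum, so the Capability constants and the MissingCapabilities helper are purely hypothetical; only the `repeated int32 allowed_capabilities` shape comes from the diff above.

```go
package main

import "fmt"

// Capability is a hypothetical enum; the proposal does not define its values.
type Capability int32

const (
	CapAdjustHooks Capability = iota + 1
	CapAdjustMounts
	CapAdjustResources
)

// MissingCapabilities returns the capabilities the plugin requires but the
// runtime did not grant, in the order they appear in required.
func MissingCapabilities(required []Capability, allowed []int32) []Capability {
	granted := make(map[Capability]bool, len(allowed))
	for _, a := range allowed {
		granted[Capability(a)] = true
	}
	var missing []Capability
	for _, r := range required {
		if !granted[r] {
			missing = append(missing, r)
		}
	}
	return missing
}

func main() {
	// Pretend this slice came from RegisterPluginResponse.allowed_capabilities.
	allowed := []int32{int32(CapAdjustMounts), int32(CapAdjustResources)}
	required := []Capability{CapAdjustHooks, CapAdjustMounts}

	if missing := MissingCapabilities(required, allowed); len(missing) > 0 {
		fmt.Println("runtime policy too restrictive, missing capabilities:", missing)
		return // the plugin would shut itself down cleanly here
	}
	fmt.Println("all required capabilities granted")
}
```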

kad avatar Mar 29 '25 09:03 kad

Now that you mentioned ttrpc itself, it just came to my mind that it also might be possible to plug in a modified version of this proposal using a ttrpc.Handshaker.

From my brief look at the ttrpc code, it looks like the client-side ttrpc.Handshake mechanism is not implemented. I wonder if it can be implemented in a backward-compatible manner, so that older ttrpc clients would still be able to talk to a server that has a handshake?

kad avatar Mar 29 '25 09:03 kad

I see the plugins use ttrpc as the transport, which is an implementation of gRPC. Is it not possible to reuse the existing gRPC mechanisms rather than create a new one? https://grpc.io/docs/guides/auth/ At a minimum, it will be easier to support long term. Maintaining an authentication subsystem tends to be a lot of work, especially when dealing with security problems, since a custom subsystem becomes a magnet for big bounties and hackers to exploit.

ttrpc is not really an implementation of grpc. It is a low-footprint alternative to grpc, intended for RPC between processes within the same host, using a simple binary framing protocol (length+header+payload) and typically over unix domain sockets, instead of HTTP as the transport. So SSL/TLS might not be the best fit here.

Now that you mentioned ttrpc itself, it just came to my mind that it also might be possible to plug in a modified version of this proposal using a ttrpc.Handshaker.

And actually yet another approach would be to

  • define a separate self-contained protocol/service just for establishing reliable client/plugin identity using a protocol similar to the one described above,
  • register that as an additional service on the NRI socket in the runtime
  • if the client authenticates before registering, store the established identity for the plugin, instead of a default one
  • do any necessary per-plugin authorization based on the client identity if, when and where it is necessary

klihub avatar Mar 29 '25 12:03 klihub

Now that you mentioned ttrpc itself, it just came to my mind that it also might be possible to plug in a modified version of this proposal using a ttrpc.Handshaker.

From my brief look at the ttrpc code, it looks like the client-side ttrpc.Handshake mechanism is not implemented. I wonder if it can be implemented in a backward-compatible manner, so that older ttrpc clients would still be able to talk to a server that has a handshake?

True, but AFAICT the only real implication of that would be that on the client side you first need to handle the handshake at the transport (connection) level before deciding whether to proceed. This is not that different from what you need to do on the server side if the handshake involves communication over the socket, despite the mere existence of a WithServerHandshaker option.

klihub avatar Mar 29 '25 12:03 klihub

I see the plugins use ttrpc as the transport, which is an implementation of gRPC. Is it not possible to reuse the existing gRPC mechanisms rather than create a new one? https://grpc.io/docs/guides/auth/

gRPC auth mechanisms mostly target transport layer security rather than pure authentication. SSL/TLS/ALTS are good for connections over a network, where the integrity of data transferred between parties is critical and the data is better encrypted. We operate over unix sockets only, so transport layer encryption is not really necessary for us. The authentication aspect is more important.

Implementing SSL/TLS mechanisms for authentication would mean configuring a CA at the runtime and making sure that certificates for all plugins that want to authenticate are issued by this CA, which then creates additional problems with revocation, rotation, etc. With SSL/TLS the ID of the calling client is encoded in the certificate subject, which has in the past proven to be buggy in many implementations. Nowadays the libraries in Go/Rust are good enough, but still, in my opinion it is big overkill for our needs and a huge maintenance burden for cluster admins and plugin authors.

And compared to OAuth, we don't need external parties: there is no need for calls from the runtime to something else to validate whether a token from a plugin is valid.

Those were my reasons why I considered this method of authentication instead of something else.

At a minimum, it will be easier to support long term. Maintaining an authentication subsystem tends to be a lot of work, especially when dealing with security problems, since a custom subsystem becomes a magnet for big bounties and hackers to exploit.

Indeed, it might attract some hunters, however we need to consider:

  1. the user needs to have access to the NRI socket. Connecting to this socket already requires quite a high privilege level on the node. If an attacker has access to that socket, they practically have access to the CRI or containerd raw API socket, which in turn means the attacker can start a privileged container using low-level APIs and get full root access on the node. Why try to forge the auth of an NRI plugin if you already have full access to the host?
  2. we are not implementing crypto functions ourselves; we are using algorithms proven to be strong enough, supplied by standard libraries.

ed25519 and chacha20poly1305 (RFC 8439) have in past years proven to be strong and easy to use in many projects. For us they should also be good enough to work as a "unique token" that identifies a plugin, as well as being human-readable in configuration files, if needed.

Another good thing about ed25519 key handling is that a private key can easily be generated from random or pseudo-random data. Given the private key, it is always possible to calculate the public key; the reverse calculation of the private key from the public key is computationally infeasible. The private key can easily be embedded in plugin binaries, taken from config, or even fetched from other services (like CSPs' IAMs) if that is considered a more secure source of the private key/"token" or cert (like in the TLS/SSL case for gRPC).

kad avatar Mar 29 '25 12:03 kad

And actually yet another approach would be to

  • define a separate self-contained protocol/service just for establishing reliable client/plugin identity using a protocol similar to the one described above,
  • register that as an additional service on the NRI socket in the runtime
  • if the client authenticates before registering, store the established identity for the plugin, instead of a default one
  • do any necessary per-plugin authorization based on the client identity if, when and where it is necessary

My idea piggybacks on the fact that we have bi-directional RPC between plugin and runtime. If we add one more service, it would either be several calls/responses initiated by the plugin to the runtime (server on the runtime side, plugin as pure client), or it would require creating a similar dual server/client for auth on both the plugin and runtime sides. I think it is also doable; it is just a question of whether we want the auth mechanism as a separate service or not, and how much implementation complexity we want to maintain. From an end-result point of view, the two should be similar.

kad avatar Mar 29 '25 12:03 kad

@kad It looks like this proposal covers some encryption and challenge mechanism, but I don't really understand how the described protocol achieves the goals you stated. Specifically, I see:

  • Something that looks like trust-on-first-use (TOFU) since the runtime does not already know the plugin's keys
  • No description of what the runtime does with the plugin's public keys once it receives it (does it store it? persistently?)
  • No description of what happens if there is attempted name reuse (should the runtime reject a plugin registering with the name of an already-registered plugin but a different key?)
  • No description of what the runtime's key pair is used for; why does the plugin need to re-encrypt the challenge?

Or is the idea that the runtime just stores a name+key pair, and then passes that to validating plugins to determine what the behavior should be?

samuelkarp avatar May 23 '25 20:05 samuelkarp

@kad It looks like this proposal covers some encryption and challenge mechanism, but I don't really understand how the described protocol achieves the goals you stated. Specifically, I see:

@samuelkarp @kad @chrishenzie I wanted to better understand this so I took a stab at prototyping/playing around with (my liberal interpretation of) this proposal and an updated version of containerd with support for NRI pluggable validation, updated slightly for plugin authentication as well. Here are my related trees, for NRI and for containerd.

  • Something that looks like trust-on-first-use (TOFU) since the runtime does not already know the plugin's keys
  • No description of what the runtime does with the plugin's public keys once it receives it (does it store it? persistently?)

The authentication model is inspired by WireGuard here. The public key is used both to recognize and to verify the authenticated identity. IOW, the key is indirectly also the identity. The runtime is configured with a set of public keys that it recognizes. Authentication attempts with unknown keys are rejected.

I think what should be much simpler here is that there should be no mutual distrust to start with. It should be enough for the runtime to distrust the plugin. So there should be no need for (static) key pairs on the runtime side, only for the plugin.

  • No description of what happens if there is attempted name reuse (should the runtime reject a plugin registering with the name of an already-registered plugin but a different key?)

I think that question is more related to #167 than to authentication. IOW, what should we do with multiple registrations of the exact same plugin instance (which is just implicitly ${INDEX}-${NAME}). Adding authentication to the picture brings in an extra dimension, but at the same time it should also limit the set of viable choices to consider.

  • No description of what the runtime's key pair is used for; why does the plugin need to re-encrypt the challenge?

The proposed details were a bit unclear to me also based on the description. For prototyping I went with something so dumb even I could understand (which is probably stupid from a security point of view), since I was mostly interested in how this would fit together with validation.

Or is the idea that the runtime just stores a name+key pair, and then passes that to validating plugins to determine what the behavior should be?

I think that's the main idea with this proposal. To provide enough and reliable extra info for validation, to allow more fine-grained lock down of capabilities than just a global on/off for everybody.

klihub avatar May 26 '25 11:05 klihub

@kad It looks like this proposal covers some encryption and challenge mechanism, but I don't really understand how the described protocol achieves the goals you stated. Specifically, I see:

  • Something that looks like trust-on-first-use (TOFU) since the runtime does not already know the plugin's keys

The runtime might know some plugins' public keys (e.g. specified in an internal validator plugin's rules, if we want to do so). But generally, a plugin communicating "here is my public key" is just the part of the handshake where the plugin says: "I'm an instance of the plugin uniquely identified by $public_key". It serves as identity, like in ssh.

  • No description of what the runtime does with the plugin's public keys once it receives it (does it store it? persistently?)

The most important use of the received public key is to validate that the plugin actually holds the private key for the announced public key, which is verified by the rest of the handshake (avoiding impersonation of other plugins). Later that public key can be used as a unique identifier for the plugin in validation rules.

  • No description of what happens if there is attempted name reuse (should the runtime reject a plugin registering with the name of an already-registered plugin but a different key?)

@klihub mentioned that above: it falls more into the scope of what we do with multiple instances of the same plugin (possibly different versions, in the case of DaemonSet upgrade rollouts). The handshake serves to establish that if something connects and says e.g. "I'm Acme Corp.'s special plugin identified by $public_key", then this connection really is a plugin that has both the private and public key for "Acme Corp.'s special plugin", not something that somehow learned the public part of the plugin's keypair.

  • No description of what the runtime's key pair is used for; why does the plugin need to re-encrypt the challenge?

In the minimal case, the runtime's keypair is used to complete the handshake between runtime and plugin. The runtime sends a challenge encrypted with the plugin's public key, and expects to receive, within the same handshake session, the challenge re-encrypted by the plugin to the runtime's public key. That ensures that:

  1. the plugin has the private key corresponding to the public key it advertised to the runtime (by being able to decrypt the challenge)
  2. the plugin supports the correct encryption/keying mechanisms (by re-encrypting the challenge to the runtime's public key)
  3. the runtime can validate that the auth mechanisms are implemented properly, by decrypting the challenge with its own private key and comparing it with the original data.

In the maximal case (but we need to check what @klihub implemented in his PoC for auth) we can also validate the runtime on the plugin side. Example: a special closed-source plugin Foo is licensed to run only on CSP Bar. It expects that runtimes on nodes at CSP Bar are pre-configured with "static" keypairs, so the plugin expects the runtime to present itself with public key "abc". The final step of the handshake might then be an additional message that proves to plugin Foo that the runtime also holds the private key for CSP Bar's expected public key, by being able to decrypt some data that the plugin sent encrypted to the runtime's public key. That is not part of the original proposal above, but I can see that someone might want such a use case, so we had better think about it from the beginning.

Or is the idea that the runtime just stores a name+key pair, and then passes that to validating plugins to determine what the behavior should be?

Public keys are unique identifiers of plugins, preventing "impersonation" of other plugins. Yes, those should be passed down to validators as part of a plugin's ID. Think of it like WireGuard, or maybe closer to SSH sessions. Each SSH session has:

  • a target username that the remote entity is trying to authenticate as (in our case the plugin name, maybe + index)
  • an authorized key in a file (in our case the public key that the plugin announces)
  • commands that are allowed to execute for that public key (in our case enforced by the validation mechanisms)
  • a pty that e.g. "sudo" uses to cache the timestamp of the user session (in our case the socket connection, i.e. the plugin's connection instance)

hope that explanation helps :)

kad avatar May 26 '25 17:05 kad