Support for updating security.json through Operator / Helm
A user can now bootstrap security.json through the operator by providing the raw security.json file in a Secret, which is uploaded to ZK by an init container on first Solr Pod start.
Users may also want to manage changes to security.json in the same manner, e.g. if they use GitOps. This doesn't work today, as the bootstrapSecurityJson feature only applies when security.json is missing in ZK. The current workaround is to delete the file manually from ZK and then trigger a restart of a Solr Pod, which will bootstrap security once again.
I'm not suggesting changing the default behavior, as the pure first-time bootstrap feature is useful for those who just want to bootstrap and then continue editing security through Solr's Auth APIs or the Admin UI's nice security editor.
My proposal is therefore to add an overwrite: true property to the existing bootstrapSecurityJson, which would cause an upload on every node restart. We'd still need to figure out a way to trigger the upload whenever the Secret changes.
bootstrapSecurityJson:
  name: security-json-secret
  key: security_json
  overwrite: true
If you push an update to the Secret in which security.json is stored (through whatever tooling you choose), it would be nice if the version change were detected by the operator and triggered a re-upload (e.g. by restarting a Pod). However, the operator may run in a different namespace than each Solr cluster, and the operator pod may not have rights to read Secrets in other namespaces (or could it read metadata/version but not contents?).
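One common way to turn a Secret change into a re-upload is the checksum-annotation pattern that many Helm charts use: the operator writes a hash of the Secret's data into the pod template, so any change to the data changes the template and triggers a rolling restart. A sketch, assuming the operator can hash the Secret's data (the annotation name below is invented for illustration):

```yaml
spec:
  template:
    metadata:
      annotations:
        # Hypothetical annotation: whenever the Secret's contents change,
        # the hash changes, the pod template changes, and the StatefulSet
        # performs a rolling restart.
        solr.apache.org/securityJsonChecksum: "<sha256 of Secret data>"
```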
> However, it may be that the operator runs in a different namespace than each solr cluster, and that the operator pod does not have rights to read secrets in other namespaces

We already have the operator read Secrets for various things. The limitation above only applies to loading Secrets in as a volume mount. Going through the API server we can read anything, since the operator has the RBAC permissions to read them.
The way I see this going: much like the other ConfigMaps/Secrets we read and pass to the SolrCloud pods, we will likely need to hash the security.json file and use that hash in the small bash command we run when deciding whether to upload it or not.
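For illustration, the upload decision could look like the following sketch. The function names are hypothetical; in practice the comparison would happen in the init container's shell script, against a checksum recorded at the last upload (e.g. in a pod annotation or a sibling ZK node):

```python
import hashlib
from typing import Optional

def security_json_checksum(security_json: bytes) -> str:
    """Return a stable checksum of the security.json contents."""
    return hashlib.sha256(security_json).hexdigest()

def needs_upload(secret_contents: bytes, deployed_checksum: Optional[str]) -> bool:
    """Decide whether the init container should (re-)upload security.json.

    `deployed_checksum` is whatever was recorded at the last upload;
    None means security.json was never bootstrapped, so upload.
    """
    return deployed_checksum != security_json_checksum(secret_contents)
```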
Wanted to piggyback on this conversation with a process used elsewhere to help knowledge-share and maybe draw some inspiration. My organization uses the CrunchyData Operator for Postgres which includes custom user creation, permission mapping, and password rotation strategies for the database users. Here is the related CRD reference for context.
On the PostgresClusters object, the spec for user specification looks similar to:
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: my-postgrescluster
spec:
  users:
    - name: app-user
      options: LOGIN CREATEDB
    - name: readonly-user
      options: LOGIN
      password: AlphaNumeric
    - name: permissions-user
      options: CREATEROLE
This operator stores credential information in secrets using the following format:
<postgres-name>-pguser-<username>
which could be translated to something like the below for Solr:
<solr-name>-solrcloud-user-<username>
...and these secrets contain the following information:
{
  "host": "service-name.namespace.svc",
  "password": "cleartext-password",
  "port": "5432",
  "user": "username",
  "verifier": "hashed/encrypted password"
}
What is interesting about this process is the use of the verifier value. Like Solr, Postgres has a file containing hashed/encrypted passwords. The Operator uses the verifier value to check if the correct password is configured. If there is a mismatch, the Operator will apply the password from the Secret and overwrite the verifier value. This behavior provides a way to force a password re-apply, by deleting or clearing the verifier value. For password rotations, we update the value of password and clear the verifier value, which allows the operator to handle rotation for us.
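The verifier mechanism can be sketched in a few lines. This is a hedged illustration, not CrunchyData's actual code; for concreteness the hashing uses the double-sha256 "base64(hash) base64(salt)" scheme that Solr's BasicAuthPlugin stores (Postgres itself uses SCRAM verifiers):

```python
import base64
import hashlib
import os
from typing import Optional

def make_verifier(password: str, salt: Optional[bytes] = None) -> str:
    """Produce a stored verifier for a cleartext password.

    Scheme: base64(sha256(sha256(salt + password))) + " " + base64(salt),
    as used by Solr's Sha256AuthenticationProvider.
    """
    salt = salt if salt is not None else os.urandom(32)
    digest = hashlib.sha256(hashlib.sha256(salt + password.encode()).digest()).digest()
    return f"{base64.b64encode(digest).decode()} {base64.b64encode(salt).decode()}"

def verifier_matches(password: str, verifier: Optional[str]) -> bool:
    """True if the stored verifier corresponds to the cleartext password.

    An empty or missing verifier never matches, which is exactly the
    "clear the verifier to force a re-apply" escape hatch described above.
    """
    if not verifier:
        return False
    _hash_b64, salt_b64 = verifier.split(" ")
    salt = base64.b64decode(salt_b64)
    return make_verifier(password, salt) == verifier
```

On each reconcile, the operator would call verifier_matches with the Secret's cleartext password; a False result triggers a password re-apply and a rewrite of the verifier.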
What I'm really interested in is going as far as fully abstracting the security.json file into the Solr CRD and Operator functions. Here's a spec that could potentially be used to generate a security.json for basic auth. Since different authentication plugins have very different needs, I chose to separate basic-auth configuration from the ability to specify a configMap/secret/string for the security.json file.
spec:
  solrSecurity: # Reference to secret with security.json. Mutually exclusive with solrBasicAuth.
    # secret, configMap, and value are mutually exclusive
    secret:
      name: # Name of secret
      key: # secret key containing security.json
    configMap:
      name: # Name of configMap
      key: # configMap key containing security.json
    value: # String representing security.json (if this support is desired)
  solrBasicAuth: # Abstracted security.json file. Mutually exclusive with solrSecurity.
    blockUnknown: false
    users:
      solr:
        roles: [ admin ]
      custom-user:
        roles: [ custom-role ]
    roles: # Potentially optional. Could be generated dynamically from users[].*.roles[] and permissions[].*.roles[]
      foobar-role:
        users: [ custom-user ] # Required
        permissions: [] # Optional
    permissions:
      security-edit:
        roles:
          - admin
      custom-permission:
        roles: [ custom-role ]
        collection: "*"
        path: /admin/ping
        methods: GET
        params:
          key: value
Personally, I see a lot of value in being able to manage all users, credentials, and permissions from the Solr/SolrCloud objects. Today, we have a post-install process that configures additional users and permissions via the Solr API, which is less than ideal for a situation where we need to fully rebuild or replicate an installation. Our process also creates opportunities for users to be created and credentials to be rotated but not stored in our password manager application, due to user error. It also prevents us from handling automatic credential rotation and propagation to other platforms - something that would be possible if user credentials were stored as secrets.
I'm curious to know what your thoughts are about this level of Operator integration, and if there's interest in moving in such a direction. I'm also happy to have a continued conversation on this topic. Thank you!
Thanks for your input. I'm a bit reluctant to bloat the operator with a deep understanding of the inner workings of all of Solr's auth/authz options. Keeping the Java and Go understanding of this in sync may prove difficult, and third-party auth plugins must be configurable too. So if we go in this direction, I'd rather have the operator act as a thin wrapper that converts the YAML directly to security.json and pushes it to Solr/ZK.
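A thin wrapper of that kind could be roughly as small as the following sketch. Everything here is hypothetical: the input shape loosely follows the solrBasicAuth spec proposed above, and a credentials map (pre-hashed in Solr's stored "hash salt" form, presumably sourced from Secrets) is assumed so the wrapper never handles cleartext passwords:

```python
import json

def basic_auth_to_security_json(spec: dict) -> str:
    """Translate a hypothetical solrBasicAuth spec into security.json.

    The operator only maps YAML keys onto the BasicAuthPlugin /
    RuleBasedAuthorizationPlugin structure; it does not interpret
    plugin semantics.
    """
    security = {
        "authentication": {
            "class": "solr.BasicAuthPlugin",
            "blockUnknown": spec.get("blockUnknown", False),
            # Assumed to arrive pre-hashed; never cleartext.
            "credentials": spec.get("credentials", {}),
        },
        "authorization": {
            "class": "solr.RuleBasedAuthorizationPlugin",
            "user-role": {
                user: cfg.get("roles", [])
                for user, cfg in spec.get("users", {}).items()
            },
            "permissions": [
                # Solr's permission entries use "role", not "roles".
                {"name": name,
                 "role": cfg.get("roles", []),
                 **{k: v for k, v in cfg.items() if k != "roles"}}
                for name, cfg in spec.get("permissions", {}).items()
            ],
        },
    }
    return json.dumps(security, indent=2)
```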
A disadvantage of pushing to ZK is that you get no feedback on whether the change applied; there is just a stack trace in Solr's log. And a disadvantage of pushing to the /api/cluster/security/authentication/ endpoint is that you may lock yourself out if you make a mistake, e.g. remove the admin role from the operator's own admin user...