opensearch-k8s-operator OpensearchUser,Role,RoleBinding resources are not re-applied after SecurityConfig reconciliation

trafficstars

As SecurityConfig reconciliation overwrites the internal users in .opendistro_security (if the field is set in the secret). This is intended behaviour and to be expected, but as of now also means that all users (or roles and role bindings) created with the newly supported resources will also be removed and .opendistro_security will be set to just the values in the SecurityConfig secret. The current reconciliation logic is not able to detect this change and re-apply the already configured users, roles and role bindings. Meaning while the k8s resources will still be present and report 'unchanged' on apply, opensearch it self will not reflect the configuration correctly.

Now for roles and role bindings we can avoid this "problem" by simply not including them in the SecurityConfig secret and only configuring them using the resources, but for OpensearchUser resources this is not really possible due to limitations of the operator. As of right now the operator only supports Basic Auth for any processes (other than SecurityConfig reconciliation) involving a clusters OpenSearch API. Including applying OpensearchUser resources via the API. This means we always need at least one admin user configured via the security config so it can be used by the operator to manage the cluster. The only possible work around I can see for now would be to never ever modify the internal_users.yaml entry or removed it from the SecurityConfig secret after initial deploy.

It might be worth adding the above problem as disclaimer in the documentation for now and if possible implement logic in the controller to re-apply the resources on SecurityConfig reconciliation/detect that the configuration has drifted. Doing this would at least eventually restore the users.

Also ideally it would be possible to configure the cluster deployment to not rely on basic auth, but instead use something like client certificates instead. As is also mentioned in https://github.com/Opster/opensearch-k8s-operator/issues/221

Aug 18 '22 06:08 max-frank

Hi @max-frank. You raise an interesting situation. The securityconfig is needed initially to configure the user the operator then uses to provision any other users. But changing the securityconfig overwrites the other users. Your suggestion of reapplying the Users after a securityconfig change might work but I think its not a good solution as then there is a time window between securityconfig applied and Users recreated where these users have no access.

I see two possibilities:

Switching the operator to use a client certificate when provisioning User/Role/RoleBindings as you suggested. Then no users need to be configured in the securityconfig. This is probbaly the cleanest solution.
Not to reapply the securityconfig if any Users are provisioned. Not ideal as that means the admin user can never be changed. More like a short term solution.

@dbason Do you have any thoughts on this?

Aug 24 '22 09:08 swoehrl-mw

I agree using client certs is the cleanest solution here since as you mention re-applying would cause disruptions for any local account configured. Also maybe an added suggestion to that would be support for creating additional client certificates for extra local users when the autogen option is used.

Aug 24 '22 23:08 max-frank

since internal users, role mappings and roles can all be handled via the operator using CRD, wouldn't make sense to simply remove those from the security config secret and only use the defined CRDs instead?

as a matter of fact, it might be even better to move action groups, node dn, whitelist and tenants to custom CRDs, and only leave "config.yml" to the security config secret.

to address the problem of the admin user inside internal_users i would suggest this: https://github.com/Opster/opensearch-k8s-operator/issues/380

Dec 07 '22 10:12 anubisg1

@anubisg1 That might very well be the best solution. Have the user either use only securityconfig or (basically) only CRDs.

Dec 07 '22 12:12 swoehrl-mw

@swoehrl-mw If we want to have a separate CRD for everything (except the config.yml), and remove the same from security-config secret, I would be willing to contribute the same. The security-config overwriting everything created via REST API or Dashboard UI is quite an inconvenience for our use-case as well.

Apr 12 '23 07:04 saketmht

Your suggestion of reapplying the Users after a securityconfig change might work but I think its not a good solution as then there is a time window between securityconfig applied and Users recreated where these users have no access.

The time window will be quite small and happen infrequently?

There will also be a time window when manually editing roles/bindings/users with dashboard/API.

I think that a reconcile that checks current state every say 10-30 minutes and fires after securityconfig-update will be a good enough solution.

Apr 13 '23 07:04 arve0

@saketmht Such a PR would be highly appreciated. For backwards compatibility the old securityconfig logic needs to remain, but it can be changed to only apply the config.yml if the others are not defined and apply everything if the ymls are there (that way people can still use securityconfig if they want or they can use CRDs). One complexity I see: The reconcilers for the users and co currently use the opensearch admin user. Without the securityconfig-job there would be a sort of chicken-egg-problem. The best solution is probably to switch the reconcilers to use a certificate instead. Another point to consider: The health checks also use the admin-user, so we need a solution for that as well.

Apr 13 '23 08:04 swoehrl-mw

@swoehrl-mw

Such a PR would be highly appreciated.

Then I will create separate issues (one at a time) for each CRD that I will add. We can discuss the CRD fields/designs in the same issue. Is that okay?

The best solution is probably to switch the reconcilers to use a certificate instead.

I am not sure how to achieve this exactly. But I think we can do this later as well. Since we are using a common code to create the os client everywhere to interact with the OS APIs, this can be changed after we are done with the CRD stuff.

The health checks also use the admin-user, so we need a solution for that as well.

Do you mean the cluster health via the API? Like I mentioned earlier we are using the same code to create the client, so if we fix that this should be fixed as well.

Apr 13 '23 17:04 saketmht

@saketmht

Then I will create separate issues (one at a time) for each CRD that I will add. We can discuss the CRD fields/designs in the same issue. Is that okay?

Sounds good. Thank you.

Do you mean the cluster health via the API?

I'm refering to the kubernetes probes (https://github.com/Opster/opensearch-k8s-operator/blob/main/opensearch-operator/pkg/builders/cluster.go#L199), these are separate from the api calls the operator does.

Apr 14 '23 11:04 swoehrl-mw

@swoehrl-mw

I'm refering to the kubernetes probes

For this I think, the admin certs (which are used by the securityconfig-job to execute the securityadmin script) can be mounted to the cluster pods as well and pass those certs to the curl command instead of the admin:admin credentials.

For the operator itself, since the operator creates those certs, I don't think we can mount them to the operator pod itself, but we could read the admin-cert secret and use that to create the os client. This way the reconcilers won't need the admin:admin credentials either. I am not sure if its the best way to achieve this though.

Apr 18 '23 07:04 saketmht

@saketmht

For this I think, the admin certs (which are used by the securityconfig-job to execute the securityadmin script) can be mounted to the cluster pods as well and pass those certs to the curl command instead of the admin:admin credentials.

For the operator itself, since the operator creates those certs, I don't think we can mount them to the operator pod itself, but we could read the admin-cert secret and use that to create the os client. This way the reconcilers won't need the admin:admin credentials either. I am not sure if its the best way to achieve this though.

Sounds good to me. It's not ideal to mount the admin certs in all the pods but currently I can't think of a better way. Using certs is likely the best way.

Apr 28 '23 11:04 swoehrl-mw

I think we can also get around this by doing some additional work in the controller to trigger reconciliations when things change (this will require some more advanced usage of controller runtime though)

May 25 '23 21:05 dbason

@anubisg1 That might very well be the best solution. Have the user either use only securityconfig or (basically) only CRDs.

I was trying exactly this. ** (basically) only CRDs for users, roles, etc** But without the admin in the securityconfig i can't get the cluster to bootstrap.

Mar 01 '24 17:03 securitym0nkey

opensearch-k8s-operator opensearch-k8s-operator copied to clipboard

OpensearchUser,Role,RoleBinding resources are not re-applied after SecurityConfig reconciliation

opensearch-k8s-operator
opensearch-k8s-operator copied to clipboard