OpenSearch Add runAs to Subject interface and introduce IdentityAwarePlugin extension point

Description

Companion Security PR: https://github.com/opensearch-project/security/pull/4665

This PR adds new method to the Subject interface called runAs and introduces a new extension point called IdentityAwarePlugin. This new method can be utilized to run a callable block of code in the context of the subject.

This PR also introduces the notion of an IdentityAwarePlugin which contains a single method to initialize the plugin. The IdentityPlugin will get to define which subject is passed to IdentityAwarePlugins on initialization. Practically, the Security Plugin is the IdentityPlugin and the subject passed to IdentityAwarePlugins is a special pluginSystemSubject which can be utilized to perform transport actions in the plugin system context and outside of the authenticated user context.

Subject.runAs() is intended to be an abstraction and replacement on the current pattern of System index interaction in which plugins will utilize the ThreadContext class and stash the context before performing transport actions on system indices.

With the changes in this PR, the NoopIdentityPlugin will assign a NoopPluginSubject to IdentityAwarePlugins. When using NoopPluginSubject.runAs(() -> { ... }) it runs the Callable within a block where the ThreadContext is stashed which is the same behavior plugins use today so its a direct replacement to try (ThreadContext.StoredContext ctx = threadContext.stashContext()) { ... }.

By itself, this PR is not very interesting, but with the changes in this PR it enables the Security plugin (the IdentityPlugin) to define an implementation for PluginSubject. If you look at the companion PR, the Security implementation of PluginSubject injects a pluginUser into the ThreadContext so that we can start using the existing security authz mechanisms to gate way plugins can do within the runAs block.

Related Issues

Related to: https://github.com/opensearch-project/security/issues/4439

Check List

[X] Functionality includes testing.
[X] API changes companion pull request created, if applicable.
[X] Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Jul 02 '24 20:07 cwperks

:x: Gradle check result for 444fde76bca22def9ded0b3be82383a7cf2e08bf: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Jul 02 '24 21:07 github-actions[bot]

:x: Gradle check result for aaba0633b70fb67c4d5573fd8f8d3133a78251db: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Jul 09 '24 21:07 github-actions[bot]

:x: Gradle check result for eec8b96a1b94be99ea9619eb8bc9125b5faed3b6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Jul 10 '24 02:07 github-actions[bot]

:x: Gradle check result for eec8b96a1b94be99ea9619eb8bc9125b5faed3b6: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Jul 10 '24 13:07 github-actions[bot]

@cwperks I have tons of questions regarding this change, it seems the scope of the change is to propagate the caller identity, is that right? If yes, we should:

make it a first class citizen of the ThreadContext, not limited to REST actions (support extensions, general thread pool scheduling, ...)
design it as a chain of callers instead, fe Plugin A -> Plugin B -> ... since in many cases plugin could call another plugin or extension

The way it is implemented now makes the ThreadContext mutable since anyone could just call ThreadContext::setExecutionContext any time, anywhere. Also, we have to think about propagating caller chains between plugin and extensions, the ThreadLocal won't work there

Jul 10 '24 14:07 reta

:grey_exclamation: Gradle check result for eec8b96a1b94be99ea9619eb8bc9125b5faed3b6: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Jul 10 '24 14:07 github-actions[bot]

Codecov Report

Attention: Patch coverage is 48.97959% with 25 lines in your changes missing coverage. Please review.

Project coverage is 71.92%. Comparing base (b30df02) to head (681ea8f). Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
server/src/main/java/org/opensearch/node/Node.java	33.33%	7 Missing and 1 partial :warning:
.../opensearch/identity/shiro/ShiroPluginSubject.java	0.00%	6 Missing :warning:
.../main/java/org/opensearch/rest/RestController.java	0.00%	4 Missing :warning:
...opensearch/identity/shiro/ShiroIdentityPlugin.java	0.00%	3 Missing :warning:
...rch/extensions/rest/RestSendToExtensionAction.java	0.00%	1 Missing :warning:
.../java/org/opensearch/identity/IdentityService.java	88.88%	0 Missing and 1 partial :warning:
...src/main/java/org/opensearch/identity/Subject.java	0.00%	1 Missing :warning:
...va/org/opensearch/plugins/IdentityAwarePlugin.java	0.00%	1 Missing :warning:

Additional details and impacted files

@@             Coverage Diff              @@
##               main   #14630      +/-   ##
============================================
- Coverage     71.92%   71.92%   -0.01%     
- Complexity    63328    63357      +29     
============================================
  Files          5227     5231       +4     
  Lines        296497   296526      +29     
  Branches      42830    42832       +2     
============================================
+ Hits         213248   213267      +19     
+ Misses        65764    65676      -88     
- Partials      17485    17583      +98

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

Jul 10 '24 14:07 codecov[bot]

@reta Thanks for the review! I raised this PR to solicit comments just like what you raised.

You are right, that the ultimate goal is to keep track of which plugin is currently on the execution path and to create a generic mechanism to do that. This PR shows the use of the proxy pattern to show how it can keep track for RestHandler, but it doesn't cover other extension points. On my fork I created a proxy for ActionPlugin which shows how other extension points can be wrapped similarly to what is done in this PR.

FYI with this change execution context is immutable and follows the same pattern as the rest of the ThreadContext. If the execution context is already set, trying to override it will throw an IllegalArgumentException similar to ThreadContext.putSingleHeader.

The only instance I am aware of for plugin chaining is Job Scheduler like this example where GeoSpatial uses the JS LockService when creating a datasource via Rest API. What are some other examples of plugin chaining?

Jul 10 '24 14:07 cwperks

Thanks @cwperks

This PR shows the use of the proxy pattern to show how it can keep track for RestHandler, but it doesn't cover other extension points.

I think this is not a point, right? The problem is much wider than that.

FYI with this change execution context is immutable and follows the same pattern as the rest of the ThreadContext. If the execution context is already set, trying to override it will throw an IllegalArgumentException similar to ThreadContext.putSingleHeader.

No, it is not immutable, it could be changed any time, anywhere:

ThreadContext::clearExecutionContext();
ThreadContext::setExecutionContext("whatever I want here");

Jul 10 '24 14:07 reta

I think this is not a point, right? The problem is much wider than that.

It is, but RestHandlers are a good area to start to demonstrate this change since that is where most plugin system index interaction takes place. If we can address all extension points at once, that would be my preference but the scope of the change is much larger.

No, it is not immutable, it could be changed any time, anywhere:

My bad, in a previous iteration I had those methods as package-private. Will update this to not expose these to plugins.

Jul 10 '24 18:07 cwperks

@cwperks could you may be start with the issue first to describe what we would like to capture, why and under which circumstances? We could collect possible implementation options there and get feedback. Thank you.

Jul 10 '24 19:07 reta

@reta Definitely, I will open an issue in the core repo. This PR is related to this RFC in the security plugin: https://github.com/opensearch-project/security/issues/4439

Basically, I want to have a Stack keep track of which plugins have been delegated to from the core in a particular thread.

Jul 10 '24 21:07 cwperks

:x: Gradle check result for 8a986185cf9c46e1349fd0ad2d7058e27898be56: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Jul 10 '24 21:07 github-actions[bot]

@reta here is the issue I promised: https://github.com/opensearch-project/OpenSearch/issues/14733

Jul 11 '24 20:07 cwperks

:x: Gradle check result for 73835edac35ffad741746d3420b6347291c7c10d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Jul 12 '24 15:07 github-actions[bot]

:x: Gradle check result for dc5c5a8ef85e06b76cfc61dab5535272321f287f: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Jul 12 '24 15:07 github-actions[bot]

:x: Gradle check result for 70c0c800883701a78ab88bddc29b6c81abd72725: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Jul 12 '24 15:07 github-actions[bot]

:x: Gradle check result for 4d048b74267da1ab9702c254c04c3d4c80ef806e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Jul 12 '24 15:07 github-actions[bot]

:x: Gradle check result for e26ac6ce7f4f35df483ec2c788325d677041248c: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Jul 12 '24 17:07 github-actions[bot]

:white_check_mark: Gradle check result for c2e0a645ebe2eeebc2e8b753f9aa884a8fa28040: SUCCESS

Jul 12 '24 18:07 github-actions[bot]

:x: Gradle check result for 55fe24c33b3cbd3aa26eee0de903eac39bb5df08: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Jul 15 '24 14:07 github-actions[bot]

@reta FYI in case you are curious, I opened a draft PR in the security repo that demonstrates how the changes in this PR are intended to be used: https://github.com/opensearch-project/security/pull/4570

Let me know your thoughts.

Jul 16 '24 13:07 cwperks

@reta FYI in case you are curious, I opened a draft PR in the security repo that demonstrates how the changes in this PR are intended to be used: opensearch-project/security#4570

Thanks @cwperks , I will try to get back to the issue shortly and share some ideas, have difficulties with time right now, sorry about that

Jul 16 '24 13:07 reta

:grey_exclamation: Gradle check result for 1d7df430a657a37b937f6281f73be12deb54e130: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Jul 16 '24 13:07 github-actions[bot]

I would love a solution that is a viable alternative for plugins to have a clean access model even if they don't fix the fundamental access exposed to the thread context and root node client as those are fundamental breaking changes to the plugin APIs.

I think this would need to be addressed to solve the issue completely. How would a plugin be prevented from doing anything if it has access to the threadPool and root node client?

Jul 17 '24 16:07 cwperks

How would a plugin be prevented from doing anything if it has access to the threadPool and root node client?

Why they should have access to threadPool and node client by default? We do have clear means to solve that

Jul 17 '24 17:07 reta

:x: Gradle check result for ed7124739f1517c68bb57292c639a8fd2003f4ce: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Jul 23 '24 20:07 github-actions[bot]

:x: Gradle check result for ed7124739f1517c68bb57292c639a8fd2003f4ce: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Jul 23 '24 20:07 github-actions[bot]

:x: Gradle check result for 4b27f3a5a081d37d4f299f04bd90e5cbc62fa150: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Jul 23 '24 21:07 github-actions[bot]

:x: Gradle check result for b031db9306089872bbf984b66e378790a3852061: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Jul 23 '24 21:07 github-actions[bot]