opensearch-sdk-java
[META] Add ability for extensions to require dependencies
Is your feature request related to a problem?
Extensions may depend on other extensions. For example, a machine learning model may depend on data cleaning and preparation by another extension, or other analogs to an ETL pipeline. By design, extensions will not be required or expected to communicate directly with each other, which means the Extensions Orchestrator must provide a means to:
- Allow extensions to discover whether their dependent extensions are installed and available
- Orchestrate workflows involving multiple extensions, where completion of a step by one extension triggers an action by another extension.
What solution would you like?
- [x] Extensions need a mechanism to define dependencies: https://github.com/opensearch-project/opensearch-sdk-java/issues/151
- [x] Extensions need to be able to query the Orchestrator to validate their dependencies are initialized: #149
- [ ] The Extension Orchestrator must provide the ability to relay request/response actions across extensions: https://github.com/opensearch-project/opensearch-sdk-java/issues/152
What alternatives have you considered?
- Disallowing dependent extensions and requiring that complex dependencies like those described be bundled into a larger, monolithic extension. This reduces modularity and component reuse, and increases complexity and the potential for errors.
- Allowing extensions to communicate directly with each other to accomplish these goals. While we have no means to specifically prohibit this, we also have no way to require it or to ensure it is done securely, so we should not rely on its availability.
Do you have any additional context?
This relates to #65, which in turn relates to comments on #60.
It is conceivable that a workflow capability (and perhaps some of the discovery work) could be part of some core features of the Job Scheduler.
We can follow a mechanism similar to the one in the current OpenSearch architecture for installing a dependent plugin. Not exactly, but something like:
- If an extension is dependent on another extension, it can send the name of the dependent extension to OpenSearch while sending the initial request.
- OpenSearch can check if the dependent extension is registered with OpenSearch.
- If present, we can modify the current initialExtensionReponse and send a success response.
- If not present, then the extension owner has to make sure the dependent extension gets registered before the actual extension sends an initialization request to OpenSearch.
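The check described in the steps above could be sketched roughly as follows. This is only an illustration: `DependencyCheck` and `missingDependencies` are hypothetical names, not existing OpenSearch or SDK classes, and the sketch assumes extensions are identified by a plain name string. An empty result would correspond to the success response; a non-empty result lists what still needs to be registered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of OpenSearch or the SDK.
final class DependencyCheck {
    private DependencyCheck() {}

    /**
     * Returns the declared dependencies that are not currently registered,
     * preserving the order in which the extension declared them.
     */
    static List<String> missingDependencies(Set<String> registered, List<String> declared) {
        List<String> missing = new ArrayList<>();
        for (String name : declared) {
            if (!registered.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

Returning the list of missing names, rather than a boolean, would let the response tell the extension owner which dependencies to register first.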
Our initialization goes the other way right now, though. We assume all the extensions are up and running before starting OpenSearch. OS sends the init requests and right now extensions respond with just their name.
In the future we envision adding/removing extensions on the fly, so I'd suggest the following sequence:
- OpenSearch maintains a list of all extensions and maps them to their current state (running/not) and dependencies.
- OS looks for extensions on startup, and we can add a REST request to "refresh" the list, or OS can check periodically.
- When an extension is initialized (or re-initialized after failure, reboot/version upgrade/etc.) it will send its dependencies (or it sends a second request, like we do with the REST request registration).
- We might want to send this together with the REST registration request, as we wouldn't want to register if we don't have all our dependencies.
- When an extension becomes unreachable, its REST requests should be temporarily overridden with an "unreachable" response, and this should also apply to any other extension that has it as a dependency.
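A minimal sketch of the bookkeeping this sequence implies, assuming an in-memory registry keyed by extension name. `ExtensionRegistry` and its methods are hypothetical names made up for illustration, not existing OpenSearch or SDK APIs. The idea is that an extension counts as available only if it is running and every dependency, including transitive ones, is also available, so marking one extension unreachable implicitly affects its dependents.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical registry for illustration; not part of OpenSearch or the SDK.
final class ExtensionRegistry {
    // Extension name -> names of the extensions it depends on.
    private final Map<String, Set<String>> dependencies = new HashMap<>();
    // Names of extensions currently believed to be running.
    private final Set<String> running = new HashSet<>();

    /** Records an extension and its declared dependencies, and marks it running. */
    void register(String name, List<String> deps) {
        dependencies.put(name, new HashSet<>(deps));
        running.add(name);
    }

    /** Marks an extension as unreachable; its dependents become unavailable too. */
    void markUnreachable(String name) {
        running.remove(name);
    }

    /** Marks a previously registered extension as running again. */
    void markRunning(String name) {
        if (dependencies.containsKey(name)) {
            running.add(name);
        }
    }

    /** True if the extension is running and all transitive dependencies are available. */
    boolean isAvailable(String name) {
        return isAvailable(name, new HashSet<>());
    }

    private boolean isAvailable(String name, Set<String> seen) {
        if (!running.contains(name)) {
            return false; // unknown or unreachable
        }
        if (!seen.add(name)) {
            return true; // already checked in this traversal (also guards against cycles)
        }
        for (String dep : dependencies.getOrDefault(name, Set.of())) {
            if (!isAvailable(dep, seen)) {
                return false;
            }
        }
        return true;
    }
}
```

With something like this, the REST layer could consult `isAvailable` before routing a request and return the "unreachable" response when it is false. Computing availability on demand avoids having to push state changes to every dependent when one extension goes down or comes back.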
https://github.com/opensearch-project/OpenSearch/pull/5438/files#r1044935448