Validate Data Prepper configurations without running Data Prepper
Is your feature request related to a problem? Please describe.
As a user of Data Prepper, I would like to be able to validate my pipeline configurations without having to start the entirety of Data Prepper with multiple servers, and without having to use real permissions for OpenSearch, S3, etc.
Describe the solution you'd like
A new module within Data Prepper called data-prepper-validation-api. This module would provide a library to validate pipeline configurations without starting the entirety of Data Prepper. It will use data-prepper-core code to convert the pipeline configuration into the model for each Data Prepper plugin and run the JSR 380 validations for those plugins. Additionally, this module would be responsible for constructing instances of plugins with a dependency on only the configuration that the plugin uses.
To achieve this, the Data Prepper directory structure will need to change to the structure proposed in https://github.com/opensearch-project/data-prepper/issues/1503 for data-prepper-core.
After these directory structure changes are complete (only the changes that split out data-prepper-core are needed), we will add the data-prepper-validation-api module.
The data-prepper-validation-api module will take a dependency on all the data-prepper-plugins that are configured in the pipeline configuration, as well as some of the libraries extracted from data-prepper-core (data-prepper-pipeline, data-prepper-plugin-framework). It will provide a library that runs these validations given a pipeline configuration YAML string and returns error messages for invalid configurations. The following dependency hierarchy will be the end result:
data-prepper-core
+---- data-prepper-validations
+---- data-prepper-plugin-framework
+---- data-prepper-pipeline
data-prepper-validations
+---- data-prepper-pipeline
+---- data-prepper-plugin-framework
data-prepper-validation-api
+---- data-prepper-validations
+---- opensearch
+---- s3-source
+---- grok-processor
+---- ... (this module can depend on all plugins so that it can perform actual validations)
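As a rough illustration of the library behavior described above (configuration in, error messages out), the following sketch aggregates per-plugin errors for one pipeline. Everything in it is an assumption made for illustration: the class name PipelineValidationSketch, the register and validate methods, and the toy plugin names are hypothetical, and the input is an already-parsed map rather than a YAML string, because YAML parsing and the real plugin framework are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; not part of Data Prepper.
final class PipelineValidationSketch {

    // Hypothetical registry: plugin name -> check returning error messages for that plugin's settings.
    private final Map<String, Function<Map<String, Object>, List<String>>> checks = new LinkedHashMap<>();

    void register(final String pluginName, final Function<Map<String, Object>, List<String>> check) {
        checks.put(pluginName, check);
    }

    // Input is the already-parsed pipeline: plugin name -> settings.
    // Returns one message per problem, prefixed with the plugin name.
    List<String> validate(final Map<String, Map<String, Object>> pluginsByName) {
        final List<String> errors = new ArrayList<>();
        for (final Map.Entry<String, Map<String, Object>> entry : pluginsByName.entrySet()) {
            final Function<Map<String, Object>, List<String>> check = checks.get(entry.getKey());
            if (check == null) {
                errors.add(entry.getKey() + ": unknown plugin");
                continue;
            }
            for (final String message : check.apply(entry.getValue())) {
                errors.add(entry.getKey() + ": " + message);
            }
        }
        return errors;
    }

    public static void main(final String[] args) {
        final PipelineValidationSketch validator = new PipelineValidationSketch();
        // Toy check: the hypothetical "toy-source" plugin requires a "port" setting.
        validator.register("toy-source", settings ->
                settings.containsKey("port") ? List.of() : List.of("port is required"));

        final Map<String, Map<String, Object>> pipeline = new LinkedHashMap<>();
        pipeline.put("toy-source", Map.<String, Object>of());
        validator.validate(pipeline).forEach(System.out::println);
    }
}
```

No servers, clients, or credentials are involved at any point, which is the property the proposal is after; in the actual module the per-plugin checks would come from the plugin framework rather than a hand-written registry.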
To validate more than just the JSR 380 constraints, plugins will need to be instantiated with only the configuration model associated with that plugin. Additionally, some plugins do not use JSR 380 and the @DataPrepperPluginConstructor annotation for their configurations. For example, to validate that the grok patterns configured in a grok processor are valid, the grok processor will need to be instantiated, because it takes a PluginSetting object in its constructor and validates that PluginSetting itself rather than having data-prepper-core validate it.
While we could start with just validating plugin models with JSR 380, I am proposing that we add an optional annotation to data-prepper-api that can be used by all plugins: @DataPrepperValidateApi. This annotation could be added to either an existing or a new constructor that requires only the configuration, whether it is a PluginSetting or a custom config that is converted by the data-prepper-plugin-framework. The data-prepper-validation-api would then look for this annotation on plugins (if it is not found, no extra validations are run) and use it to instantiate the plugin with its configuration. The plugin would then be able to run any validations that it reasonably can in this constructor without creating all of the additional dependencies (servers, clients, etc.), and would be able to provide error messages for them. For example, the grok processor could have the following constructor:
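@DataPrepperValidateApi
GrokProcessor(final PluginSetting pluginSetting, final List<String> errors) {
    // Build the config from the plugin settings only; no servers or clients are created here.
    final GrokProcessorConfig grokConfig = buildConfig(pluginSetting);
    // Collect any invalid-pattern messages so the validation API can report them.
    errors.addAll(validateGrokPatterns(grokConfig));
}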
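To make the lookup step concrete, here is a minimal sketch of how the validation module might find and invoke a constructor like the one above using reflection. It assumes only the JDK: the annotation and PluginSetting below are simplified local stand-ins for the proposed and existing Data Prepper types, ToyGrokProcessor treats its pattern as a plain regular expression rather than a real grok pattern, and the validate helper is a hypothetical name.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.InvocationTargetException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical sketch; the nested types are stand-ins, not the real Data Prepper classes.
final class ValidateApiSketch {

    // Stand-in for the proposed annotation; runtime retention makes it visible to reflection.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.CONSTRUCTOR)
    @interface DataPrepperValidateApi { }

    // Simplified stand-in for PluginSetting (the real class has a richer API).
    static final class PluginSetting {
        private final Map<String, Object> settings;
        PluginSetting(final Map<String, Object> settings) { this.settings = settings; }
        Object getAttribute(final String name) { return settings.get(name); }
    }

    // Toy plugin: validates its configuration in the annotated constructor and nothing else.
    static final class ToyGrokProcessor {
        @DataPrepperValidateApi
        ToyGrokProcessor(final PluginSetting pluginSetting, final List<String> errors) {
            final Object pattern = pluginSetting.getAttribute("pattern");
            if (!(pattern instanceof String)) {
                errors.add("pattern is required and must be a string");
                return;
            }
            try {
                Pattern.compile((String) pattern);
            } catch (final PatternSyntaxException e) {
                errors.add("invalid pattern: " + e.getDescription());
            }
        }
    }

    // Invokes every constructor carrying the annotation; plugins without one add no errors.
    static List<String> validate(final Class<?> pluginClass, final PluginSetting setting) {
        final List<String> errors = new ArrayList<>();
        for (final Constructor<?> ctor : pluginClass.getDeclaredConstructors()) {
            if (!ctor.isAnnotationPresent(DataPrepperValidateApi.class)) {
                continue;
            }
            try {
                ctor.setAccessible(true);
                ctor.newInstance(setting, errors);
            } catch (final InvocationTargetException e) {
                errors.add("validation constructor threw: " + e.getCause());
            } catch (final ReflectiveOperationException | IllegalArgumentException e) {
                errors.add("could not invoke validation constructor: " + e);
            }
        }
        return errors;
    }

    public static void main(final String[] args) {
        final PluginSetting bad = new PluginSetting(Map.<String, Object>of("pattern", "["));
        validate(ToyGrokProcessor.class, bad).forEach(System.out::println);
    }
}
```

Passing the error list into the constructor, as the proposal suggests, lets a plugin report several problems at once instead of stopping at the first exception; the sketch still catches exceptions so that one misbehaving plugin does not abort validation of the rest.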
Describe alternatives you've considered (Optional)
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Data Prepper directory structure changes proposal (https://github.com/opensearch-project/data-prepper/issues/1503)