accumulo icon indicating copy to clipboard operation
accumulo copied to clipboard

Added interface to support SPI configuration validation

Open dlmarion opened this issue 1 year ago • 18 comments

This commit adds the SpiConfigurationValidation interface and an initial implementation that attempts to perform configuration validation on classes referenced in the configuration. No-arg constructors are required for validation.

dlmarion avatar Mar 05 '24 17:03 dlmarion

I figured I would put up a straw-man implementation for the comment I made in #4315. The idea here is to provide some configuration validation mechanism for classes that use custom properties. This should also likely be done at startup and not just upgrade as configurations will change over time.

dlmarion avatar Mar 05 '24 17:03 dlmarion

@EdColeman - I assume that your approval means that you think this approach is good to move forward? Obviously this PR is incomplete, I think many more SPI classes need to be modified.

dlmarion avatar Mar 05 '24 18:03 dlmarion

Yes - these changes seem reason and worthwhile to proceed forward.

EdColeman avatar Mar 05 '24 18:03 EdColeman

I like this concept. I was wondering if we could add default methods to existing interfaces for this new validation that calls init() for some plugins. Experimented with doing that in this commit on this branch. From this I learned that some plugins are per table, so we may want to facilitate validating them. Also learned that some plugins may want to know the property that contained the class name. Tried to deal with these issues.

For the per table config, thinking we could iterate over all tables validating any plugins configured for the table. Not sure where this should actually run? Maybe only the manager should do this validation?

keith-turner avatar Mar 05 '24 19:03 keith-turner

I like this concept. I was wondering if we could add default methods to existing interfaces for this new validation that calls init() for some plugins.

I'm not a huge fan of default methods in general, because if you forget to override them then it may be incorrect.

For the per table config, thinking we could iterate over all tables validating any plugins configured for the table. Not sure where this should actually run? Maybe only the manager should do this validation?

I have some local changes that I have not pushed yet - but could if you want to see them. Locally I moved the validation to ServerContext so that the validation is performed on server start.

dlmarion avatar Mar 05 '24 19:03 dlmarion

I like this concept. I was wondering if we could add default methods to existing interfaces for this new validation that calls init() for some plugins. Experimented with doing that in this commit on this branch. From this I learned that some plugins are per table, so we may want to facilitate validating them. Also learned that some plugins may want to know the property that contained the class name. Tried to deal with these issues.

For the per table config, thinking we could iterate over all tables validating any plugins configured for the table. Not sure where this should actually run? Maybe only the manager should do this validation?

I removed this code with my changes in #4273 since the config is no longer across multiple properties. But would it make sense to add some sort of on-demand SPI validation utility? https://github.com/apache/accumulo/blob/2.1/server/base/src/main/java/org/apache/accumulo/server/conf/CheckCompactionConfig.java

ddanielr avatar Mar 05 '24 19:03 ddanielr

I have some local changes that I have not pushed yet - but could if you want to see them. Locally I moved the validation to ServerContext so that the validation is performed on server start.

I think it makes sense to have every server validate the system properties. However uncertain about this for the per table properties, seems like having every server validate those could place a surge of load on ZK.

keith-turner avatar Mar 05 '24 19:03 keith-turner

I removed this code with my changes in #4273 since the config is no longer across multiple properties. But would it make sense to add some sort of on-demand SPI validation utility? https://github.com/apache/accumulo/blob/2.1/server/base/src/main/java/org/apache/accumulo/server/conf/CheckCompactionConfig.java

The CompactionPlanner is an SPI, so DefaultCompactionPlanner could implement the validation interface and it would be validated on startup.

dlmarion avatar Mar 05 '24 19:03 dlmarion

I'm not a huge fan of default methods in general, because if you forget to override them then it may be incorrect.

I agree with that in general. I think this case is a bit different through because its an optional interface that has to be extended/implemented.

keith-turner avatar Mar 05 '24 19:03 keith-turner

I think it makes sense to have every server validate the system properties. However uncertain about this for the per table properties, seems like having every server validate those could place a surge of load on ZK.

Since it's part of ServerContext (changed in ab116dc), we could add options to the ServerContext constructor to enable/disable certain validation. Then we can enable some only in the Manager to address your concern about ZK load.

dlmarion avatar Mar 05 '24 19:03 dlmarion

I'm not a huge fan of default methods in general, because if you forget to override them then it may be incorrect.

Instead of adding the default method to interfaces we could instead make all of accumulo own plugin impls in 2.1 implement the interface. In 3.1 we could make all SPI interfaces extend the validation interface w/o providing a default impl, this would be a breaking change in 3.1 that would make the method mandatory for all plugins. So in 2.1 implementing the interface would be optional and in 3.1 it would be required.

keith-turner avatar Mar 05 '24 19:03 keith-turner

One thing to note is that this currently doesn't validate SPI configuration when the configuration changes at runtime. I think there is code that handles this for non-SPI classes, but not sure where this is. @EdColeman might know.

dlmarion avatar Mar 05 '24 19:03 dlmarion

I removed this code with my changes in #4273 since the config is no longer across multiple properties. But would it make sense to add some sort of on-demand SPI validation utility? https://github.com/apache/accumulo/blob/2.1/server/base/src/main/java/org/apache/accumulo/server/conf/CheckCompactionConfig.java

The CompactionPlanner is an SPI, so DefaultCompactionPlanner could implement the validation interface and it would be validated on startup.

Right, I understand that from these changes. I think that's a great benefit. I was trying to point out that the current compaction SPI validation utility uses a specific file path which allows for validation of proposed property changes without modifying ZK state.

https://github.com/apache/accumulo/blob/d91d0162115ae66112a104278bcd14e8085936d3/server/base/src/main/java/org/apache/accumulo/server/conf/CheckCompactionConfig.java#L96

So instead dealing with multiple failed startups, a user could have faster feedback by populating a config file with the proposed property set and then running ./accumulo spi-validate <path to prop file>.

This allows the user to test a SPI related change without impacting the system.

With the current changes in #4338 , CheckCompactionConfig could be removed in 2.1 in favor of a generic prop validation or spi-validation command which just runs the same validation code as ServerContext.

I also looked at CheckServerConfig but that doesn't take in a filepath as an option. https://github.com/apache/accumulo/blob/d91d0162115ae66112a104278bcd14e8085936d3/server/base/src/main/java/org/apache/accumulo/server/conf/CheckServerConfig.java#L30-L34

ddanielr avatar Mar 05 '24 20:03 ddanielr

If CheckServerConfig to a file that it applied last, as a set property overrides, then I don't think you would need CheckCompactionConfig. It would be a generic check configuration utility.

Update: Actually, SiteConfiguration uses the system property accumulo.properties to load a custom configuration file. If it's not specified, then the file accumulo.properties is found on the classpath. So, someone could copy the existing accumulo.properties to a new location, edit it, then set the system property and run CheckServerConfig.

dlmarion avatar Mar 05 '24 20:03 dlmarion

Kicked off a full IT build

dlmarion avatar Mar 12 '24 12:03 dlmarion

IT build successful

dlmarion avatar Mar 25 '24 18:03 dlmarion

In https://github.com/keith-turner/accumulo/commit/b6ff59cd494977841a5fb491800bd12c124428be I experimented with moving the validation from an SPI interface to internal interfaces. The idea being that custom internal code would be created for each type of plugin in Accumulo that knows how to validate that plugin. Experimented with this approach after noticing the following.

  • Saw multiple places where the new implementations of the SPI interface were calling non-spi code. This indicates that a user implementing these interfaces may need to also call non SPI code.
  • The Accumulo plugins have a lot of individual quirks that makes creating a one size fits all validation API difficult. For example iterators and cyrpto services have scopes, balancers have a unique way of doing per table plugins.

This approach does not require any changes to existing plugins, it just initializes each plugin with config and sees if errors are thrown. This is done using existing SPI calls by internal code.

keith-turner avatar Apr 04 '24 00:04 keith-turner

@keith-turner - I'm fine with closing this PR in favor of one you put up from your branch.

dlmarion avatar Apr 05 '24 14:04 dlmarion

@keith-turner - I'm fine with closing this PR in favor of one you put up from your branch.

Closing this based on this comment.

ctubbsii avatar Jul 26 '24 09:07 ctubbsii