
provide method for managing the creation of shards

Open bmwmaestoso opened this issue 6 years ago • 1 comments

A process is needed to manage the creation of future shards. It needs to work with the following configuration:

  • name of table for sharding
  • number of shards
  • number of future days to create shards
  • create missing shards during initialization
  • minimum time window to create shards for next day

The shard manager needs to be able to create missing shards during startup. The shards will need to be monitored and created based upon the configuration parameters.
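As a rough sketch of what the startup check could look like: the class and field names below are hypothetical, and the `yyyyMMdd_N` split naming is an assumption about how the shard table is laid out. The "minimum time window" setting is left out because its exact semantics are not spelled out here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical settings holder; field names are illustrative only.
final class ShardManagerConfig {
    final String tableName;            // name of table for sharding
    final int numShards;               // number of shards per day
    final int futureDays;              // number of future days to create shards for
    final boolean createMissingOnInit; // create missing shards during initialization

    ShardManagerConfig(String tableName, int numShards, int futureDays, boolean createMissingOnInit) {
        this.tableName = tableName;
        this.numShards = numShards;
        this.futureDays = futureDays;
        this.createMissingOnInit = createMissingOnInit;
    }
}

final class MissingShardFinder {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    // Assumed split naming: yyyyMMdd_N for N in [0, numShards).
    static List<String> splitsForDay(LocalDate day, int numShards) {
        List<String> splits = new ArrayList<>();
        for (int n = 0; n < numShards; n++) {
            splits.add(DAY.format(day) + "_" + n);
        }
        return splits;
    }

    // Splits expected from today through today + futureDays that are not in 'existing'.
    static List<String> missingSplits(ShardManagerConfig config, LocalDate today, Set<String> existing) {
        List<String> missing = new ArrayList<>();
        for (int offset = 0; offset <= config.futureDays; offset++) {
            for (String split : splitsForDay(today.plusDays(offset), config.numShards)) {
                if (!existing.contains(split)) {
                    missing.add(split);
                }
            }
        }
        return missing;
    }
}
```

The `existing` set would come from whatever lists the table's current splits; that lookup is deliberately left outside the sketch.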

bmwmaestoso avatar Jan 08 '19 16:01 bmwmaestoso

This class would most likely live under the datawave-opts-tools-parent module as a new sub-module. This code would replace the functionality currently handled by the create-shards-since.sh and the create-tomorrows-shards.sh scripts under the datawave-ingest-scripts module. Most likely we will change those scripts to use the newly created program.

The main functionality that this is adding is the ability to create missing shard splits for the past in addition to creating shard splits for a configured number of days in the future.

We need this because when a system is brought back up, shard splits for past days may never have been created. So in addition to configuring the number of days in the future to create shard splits, it should also be configured with the number of days to look at in the past.
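To make the past-plus-future window concrete, here is a minimal sketch of how the days to check might be enumerated. `ShardDayWindow`, `pastDays`, and `futureDays` are hypothetical names, not existing DataWave properties.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

final class ShardDayWindow {
    // Days from (today - pastDays) through (today + futureDays), inclusive, oldest first.
    static List<LocalDate> daysToCheck(LocalDate today, int pastDays, int futureDays) {
        if (pastDays < 0 || futureDays < 0) {
            throw new IllegalArgumentException("pastDays and futureDays must be non-negative");
        }
        List<LocalDate> days = new ArrayList<>();
        LocalDate end = today.plusDays(futureDays);
        for (LocalDate d = today.minusDays(pastDays); !d.isAfter(end); d = d.plusDays(1)) {
            days.add(d);
        }
        return days;
    }
}
```

With `pastDays = 2` and `futureDays = 1`, a run on 2021-05-21 would check 2021-05-19 through 2021-05-22. Returning the oldest day first means missed days are filled in before future ones.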

NOTE: blasting splits at the system can have significant performance impacts, so I would like this program to also be configured such that it does not create splits too quickly. Perhaps a max number of splits per minute property.
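One simple way to honor a max-splits-per-minute property is to pause a fixed interval between splits. The sketch below assumes that approach; `ThrottledSplitCreator` and its parameters are hypothetical, and the sleep and split-creation actions are passed in so the pacing logic stays independent of the table API.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongConsumer;

final class ThrottledSplitCreator {
    private final long pauseMillis;
    private final LongConsumer sleeper;

    // 'sleeper' receives the number of milliseconds to wait between splits.
    ThrottledSplitCreator(int maxSplitsPerMinute, LongConsumer sleeper) {
        if (maxSplitsPerMinute <= 0) {
            throw new IllegalArgumentException("maxSplitsPerMinute must be positive");
        }
        // Round up so the effective rate never exceeds the configured limit.
        this.pauseMillis = (60_000L + maxSplitsPerMinute - 1) / maxSplitsPerMinute;
        this.sleeper = sleeper;
    }

    long pauseMillis() {
        return pauseMillis;
    }

    // Creates each split in order, pausing before every split except the first.
    void createAll(List<String> splits, Consumer<String> addSplit) {
        for (int i = 0; i < splits.size(); i++) {
            if (i > 0) {
                sleeper.accept(pauseMillis);
            }
            addSplit.accept(splits.get(i));
        }
    }
}
```

In a real run the sleeper could wrap `Thread.sleep` and `addSplit` could wrap the call that adds a split to the table. A token-bucket limiter would allow short bursts, but fixed pacing is easier to reason about when the goal is to avoid hitting the system with many splits at once.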

ivakegg avatar May 21 '21 14:05 ivakegg