Feature/dynamic shards

Open • lbschanno opened this issue 1 year ago • 1 comment

Add the ability to define custom ShardIdGenerators that can be added to ShardIdFactory and used to override shard id generation for any records the generators consider applicable.
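To illustrate the override behavior described above, here is a minimal, self-contained sketch. It is not the code in this PR: `IngestRecord`, `SketchShardIdGenerator`, `FixedPartitionGenerator`, and `SketchShardIdFactory` are hypothetical names, and the default hash-modulo partitioning is an assumption made for illustration. The idea is only that the factory asks each registered generator whether a record applies, and falls back to default generation when none does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an ingest record: data type, event date (yyyyMMdd), and a uid.
final class IngestRecord {
    final String dataType;
    final String date;
    final String uid;

    IngestRecord(String dataType, String date, String uid) {
        this.dataType = dataType;
        this.date = date;
        this.uid = uid;
    }
}

// Hypothetical generator contract; the interface in the PR may differ.
interface SketchShardIdGenerator {
    boolean isApplicable(IngestRecord record);

    String generateShardId(IngestRecord record);
}

// Illustrative generator: pins records of the given data types whose date
// falls within [begin, end] (inclusive) to one fixed partition.
final class FixedPartitionGenerator implements SketchShardIdGenerator {
    private final Set<String> dataTypes;
    private final String begin;
    private final String end;
    private final int partition;

    FixedPartitionGenerator(Set<String> dataTypes, String begin, String end, int partition) {
        this.dataTypes = dataTypes;
        this.begin = begin;
        this.end = end;
        this.partition = partition;
    }

    @Override
    public boolean isApplicable(IngestRecord record) {
        // yyyyMMdd strings of equal length order correctly under lexicographic comparison.
        return dataTypes.contains(record.dataType)
                && record.date.compareTo(begin) >= 0
                && record.date.compareTo(end) <= 0;
    }

    @Override
    public String generateShardId(IngestRecord record) {
        return record.date + "_" + partition;
    }
}

// Illustrative factory: the first applicable generator wins; otherwise a
// default partition is derived from the uid hash (an assumed default).
final class SketchShardIdFactory {
    private final int numShards;
    private final List<SketchShardIdGenerator> generators = new ArrayList<>();

    SketchShardIdFactory(int numShards) {
        this.numShards = numShards;
    }

    void addGenerator(SketchShardIdGenerator generator) {
        generators.add(generator);
    }

    String getShardId(IngestRecord record) {
        for (SketchShardIdGenerator generator : generators) {
            if (generator.isApplicable(record)) {
                return generator.generateShardId(record);
            }
        }
        int partition = Math.floorMod(record.uid.hashCode(), numShards);
        return record.date + "_" + partition;
    }
}
```

In this sketch, registration order decides which generator handles a record when several apply; whether the PR resolves overlaps the same way is not stated here.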

Closes https://github.com/NationalSecurityAgency/datawave/issues/2632

lbschanno, Jan 15 '25 16:01

Tested this via quickstart. Adding the following to warehouse/ingest-configuration/src/main/resources/config/shard-ingest-config.xml:

<property>
    <name>shardIdFactory.generator.1</name>
    <value>datawave.ingest.mapreduce.handler.shard.ShiftOnDay</value>
</property>
<property>
    <name>shardIdFactory.generator.1.datatypes</name>
    <value>myjson</value>
</property>
<property>
    <name>shardIdFactory.generator.1.begin</name>
    <value>20060101</value>
</property>
<property>
    <name>shardIdFactory.generator.1.end</name>
    <value>20130331</value>
</property>

Results in the following shards in the shard table after installing and initializing datawave:

20070924_1 tvmaze\x00-r5sogy.30uc2y.a9zr4g:ORIG_FILE\x00tvmaze-api.json|5|0 [PRIVATE|(BAR&FOO)]
20070924_1 tvmaze\x00-r5sogy.30uc2y.a9zr4g:PREMIERED\x002007-09-24 [PRIVATE|(BAR&FOO)]
20070924_1 tvmaze\x00-r5sogy.30uc2y.a9zr4g:RATING_AVERAGE.RATING_0.AVERAGE_0\x008.2 [PRIVATE|(BAR&FOO)]
20070924_1 tvmaze\x00-r5sogy.30uc2y.a9zr4g:RUNTIME\x0030 [PRIVATE|(BAR&FOO)]
20070924_1 tvmaze\x00-r5sogy.30uc2y.a9zr4g:SCHEDULE_DAYS.SCHEDULE_0.DAYS_0\x00Thursday [PRIVATE|(BAR&FOO)]
20070924_1 tvmaze\x00-r5sogy.30uc2y.a9zr4g:SCHEDULE_TIME.SCHEDULE_0.TIME_0\x0020:00 [PRIVATE|(BAR&FOO)]

Compared to the original:

20070924_0 tvmaze\x00-r5sogy.30uc2y.a9zr4g:ORIG_FILE\x00tvmaze-api.json|5|0 [PRIVATE|(BAR&FOO)]
20070924_0 tvmaze\x00-r5sogy.30uc2y.a9zr4g:PREMIERED\x002007-09-24 [PRIVATE|(BAR&FOO)]
20070924_0 tvmaze\x00-r5sogy.30uc2y.a9zr4g:RATING_AVERAGE.RATING_0.AVERAGE_0\x008.2 [PRIVATE|(BAR&FOO)]
20070924_0 tvmaze\x00-r5sogy.30uc2y.a9zr4g:RUNTIME\x0030 [PRIVATE|(BAR&FOO)]
20070924_0 tvmaze\x00-r5sogy.30uc2y.a9zr4g:SCHEDULE_DAYS.SCHEDULE_0.DAYS_0\x00Thursday [PRIVATE|(BAR&FOO)]
20070924_0 tvmaze\x00-r5sogy.30uc2y.a9zr4g:SCHEDULE_TIME.SCHEDULE_0.TIME_0\x0020:00 [PRIVATE|(BAR&FOO)]

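Note that the value of num.shards was set to 1, so default shard id generation can only produce the _0 partition; the _1 suffix above therefore comes from the configured generator.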

lbschanno, Jan 28 '25 07:01