Need a mechanism to exclude fields from the errorIndex

Open ivakegg opened this issue 9 months ago • 4 comments

The error index process started in the EventMapper needs a mechanism to exclude fields from being indexed.

ivakegg avatar Apr 08 '25 15:04 ivakegg

@ivakegg I am interested in taking this ticket on. Can you go into a bit more detail about the error index process and where it is started in the EventMapper?

lbschanno avatar Apr 15 '25 14:04 lbschanno

So in EventMapper.map, there is a try-catch around the processEvent call. The catch clause pulls all of the error data handlers and runs the events through them. Those data handlers populate the error tables.
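As a rough illustration of that control flow (not the actual EventMapper code), the sketch below routes an event to a set of error handlers only when normal processing throws. The `Handler` interface, the `map` signature, and the use of plain strings as events are all hypothetical stand-ins.

```java
import java.util.List;

class ErrorRoutingSketch {

    // Hypothetical stand-in for a data type handler.
    interface Handler {
        void process(String event);
    }

    // Mirrors the described shape: try the normal path first, and on failure
    // hand the same event to every error handler instead.
    static void map(String event, Handler mainHandler, List<Handler> errorHandlers) {
        try {
            mainHandler.process(event);
        } catch (RuntimeException e) {
            for (Handler errorHandler : errorHandlers) {
                errorHandler.process(event);
            }
        }
    }
}
```

In this sketch the error handlers see the event only on the failure path, which is why any field exclusion for the error index has to be configured on the error side rather than on the original data type.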

The error tables are 4 tables that directly parallel the main datawave tables (shard, shardIndex, shardReverseIndex, DatawaveMetadata). They have different table names of course (something like error_e, error_i, error_ri, and error_m). It should be obvious when you list the tables.

So, the issue here is that the ErrorIngestHelper currently overrides the BaseIngestHelper isIndexed, isReverseIndexed, ... methods, and it should instead fall back to those methods in the BaseIngestHelper.

In addition, we need to make sure that the error data type handler can be appropriately configured with error.data.category.index.disallowlist.
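A minimal sketch of the intended behavior, assuming the disallowlist is a comma-separated list of field names and that any field not on it is indexed. The classes here are hypothetical stand-ins, not the real BaseIngestHelper or ErrorIngestHelper, and the field names in the usage note are made up.

```java
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

class ErrorIndexDisallowlistSketch {

    // Property name taken from the comment above.
    static final String DISALLOWLIST_KEY = "error.data.category.index.disallowlist";

    // Parses a comma-separated property value into a set of field names.
    static Set<String> parseFields(Properties conf, String key) {
        Set<String> fields = new LinkedHashSet<>();
        for (String field : conf.getProperty(key, "").split(",")) {
            String trimmed = field.trim();
            if (!trimmed.isEmpty()) {
                fields.add(trimmed);
            }
        }
        return fields;
    }

    // Hypothetical stand-in for the base helper: a field is indexed
    // unless it appears on the configured disallowlist (an assumption).
    static class BaseHelperSketch {
        private final Set<String> disallowed;

        BaseHelperSketch(Set<String> disallowed) {
            this.disallowed = disallowed;
        }

        boolean isIndexed(String fieldName) {
            return !disallowed.contains(fieldName);
        }
    }

    // Hypothetical stand-in for the error helper: it deliberately does NOT
    // override isIndexed, so the base class's configured behavior applies.
    static class ErrorHelperSketch extends BaseHelperSketch {
        ErrorHelperSketch(Properties conf) {
            super(parseFields(conf, DISALLOWLIST_KEY));
        }
    }
}
```

With the property set to something like `RAW_PAYLOAD, STACK_TRACE` (hypothetical field names), `isIndexed` would return false for those two fields and true for everything else, without the error helper needing any override of its own.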

ivakegg avatar Apr 16 '25 12:04 ivakegg

Bonus Points / Follow-on ticket: Make error index / disallow list configurable per base datatype. There may be fields we want to include/exclude depending on the original type headed for the error tables.

This would include a new set of configs and updates to handle them. e.g. mycsv.error.data.category.index may need to be different than mywikipedia.error.data.category.index, etc., falling back to error.data.category.index by default if there are no datatype overrides.
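The fallback described here could look roughly like the following sketch, which checks for a datatype-prefixed key first and falls back to the unprefixed key. The `resolve` helper and the class name are hypothetical; only the key naming pattern comes from the comment above.

```java
import java.util.Properties;

class ErrorIndexConfigLookupSketch {

    // Returns the datatype-specific value (e.g. "mycsv." + baseKey) when it
    // is set, otherwise the value of the shared default key. May return null
    // if neither key is configured.
    static String resolve(Properties conf, String dataType, String baseKey) {
        String override = conf.getProperty(dataType + "." + baseKey);
        return override != null ? override : conf.getProperty(baseKey);
    }
}
```

For example, with `error.data.category.index` set globally and only `mycsv.error.data.category.index` overridden, a lookup for `mycsv` would return the override while a lookup for `mywikipedia` would return the global value.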

hlgp avatar Jun 27 '25 14:06 hlgp

I think the bonus work makes more sense as a follow-on ticket. I've created #3004 for it.

lbschanno avatar Jun 27 '25 21:06 lbschanno