Need a mechanism to exclude fields from the errorIndex
The error index process started in the EventMapper needs a mechanism to exclude fields from being indexed.
@ivakegg I am interested in taking this ticket on. Can you go a bit more into detail about the error index process and where it is started in the EventMapper?
So in the EventMapper.map, there is a try-catch around the processEvent call. In this catch clause, it pulls all of the error datahandlers and runs the events through those. Those datahandlers will populate the error tables.
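To make that flow concrete, here is a minimal, self-contained sketch of the try-catch routing described above. It is not the actual EventMapper code: `ErrorRoutingSketch`, `Handler`, and `demo` are hypothetical names, and events are reduced to plain strings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ErrorRoutingSketch {
    /** Hypothetical stand-in for a data type handler. */
    interface Handler {
        void process(String event);
    }

    private final Handler normalHandler;
    private final List<Handler> errorHandlers;

    ErrorRoutingSketch(Handler normalHandler, List<Handler> errorHandlers) {
        this.normalHandler = normalHandler;
        this.errorHandlers = errorHandlers;
    }

    /** Mirrors the described shape: try normal processing, on failure run the error handlers. */
    void map(String event) {
        try {
            normalHandler.process(event);
        } catch (RuntimeException e) {
            // The failed event is handed to every error handler,
            // which in the real system would populate the error tables.
            for (Handler handler : errorHandlers) {
                handler.process(event);
            }
        }
    }

    /** Runs one good and one failing event; returns what reached the "error table". */
    static List<String> demo() {
        List<String> errorTable = new ArrayList<>();
        Handler failsOnEmpty = event -> {
            if (event.isEmpty()) {
                throw new IllegalArgumentException("empty event");
            }
        };
        Handler errorHandler = errorTable::add;
        ErrorRoutingSketch mapper =
                new ErrorRoutingSketch(failsOnEmpty, Collections.singletonList(errorHandler));
        mapper.map("good");
        mapper.map("");
        return errorTable;
    }
}
```

Only the event that failed normal processing ends up in the list that stands in for the error tables.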
The error tables are composed of 4 tables that directly parallel the main datawave tables (shard, shardIndex, shardReverseIndex, DatawaveMetadata). They have different table names of course (something like error_e, error_i, error_ri, and error_m). It should be obvious when you list the tables.
So the issue here is that the ErrorIngestHelper currently overrides the BaseIngestHelper isIndexed, isReverseIndexed, ... methods; it should instead fall back to those methods in the BaseIngestHelper.
In addition, we need to make sure that the error data type handler can be appropriately configured with error.data.category.index.disallowlist.
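As a rough illustration of the difference between overriding and falling back, the sketch below uses hypothetical stand-in classes, not the real BaseIngestHelper or ErrorIngestHelper. It assumes the base helper decides indexing from a comma-separated disallowlist and that the current override indexes every field unconditionally; both are assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Set;

class IndexFallbackSketch {
    /** Hypothetical stand-in for the base helper: a field is indexed unless disallowed. */
    static class BaseHelper {
        private final Set<String> indexDisallowlist = new HashSet<>();

        /** Parses a comma-separated value such as one read from a disallowlist property. */
        void setIndexDisallowlist(String commaSeparated) {
            for (String field : commaSeparated.split(",")) {
                String trimmed = field.trim();
                if (!trimmed.isEmpty()) {
                    indexDisallowlist.add(trimmed);
                }
            }
        }

        boolean isIndexed(String field) {
            return !indexDisallowlist.contains(field);
        }
    }

    /** Assumed current behaviour: the override ignores the configured disallowlist. */
    static class OverridingErrorHelper extends BaseHelper {
        @Override
        boolean isIndexed(String field) {
            return true;
        }
    }

    /** Proposed behaviour: no override, so the base class logic and its config apply. */
    static class FallbackErrorHelper extends BaseHelper {
    }
}
```

With the same disallowlist set on both helpers, only the one without the override actually excludes the listed fields.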
Bonus Points / Follow-on ticket: Make error index / disallow list configurable per base datatype. There may be fields we want to include/exclude depending on the original type headed for the error tables.
This would include a new set of configs and updates to handle them. e.g. mycsv.error.data.category.index may need to be different than mywikipedia.error.data.category.index, etc., falling back to error.data.category.index by default if there are no datatype overrides.
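The per-datatype fallback could look something like the sketch below. It is only one possible shape: `ErrorIndexConfigSketch` and `resolve` are hypothetical, and a plain `Map` stands in for whatever configuration object the ingest code actually uses.

```java
import java.util.Map;

class ErrorIndexConfigSketch {
    /** Default key named in this ticket. */
    static final String DEFAULT_KEY = "error.data.category.index";

    /**
     * Returns the datatype-specific value (e.g. "mycsv.error.data.category.index")
     * when present, otherwise the value under the default key (possibly null).
     */
    static String resolve(Map<String, String> conf, String dataType) {
        String specific = conf.get(dataType + "." + DEFAULT_KEY);
        return specific != null ? specific : conf.get(DEFAULT_KEY);
    }
}
```

A datatype with no override of its own simply picks up whatever is configured under error.data.category.index.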
I think the bonus work makes more sense as a follow-on ticket. I've created #3004 for it.