datawave
A wildcard query term with >500K hits can yield inconsistent results over subsequent runs
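This issue seems to surface when event.query.max.results is greater than 500K. I believe we've narrowed it down to DedupingIterator's BloomFilter, which currently defaults to 500K expected insertions. Once a Bloom filter holds more entries than it was sized for, its false-positive rate climbs, so presumably some distinct results are being misidentified as duplicates and dropped.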
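To give a rough sense of how quickly a filter sized for 500K entries degrades, here is a minimal sketch that evaluates the textbook Bloom filter false-positive formula. It is not DATAWAVE code: the class name `BloomSaturationSketch`, the method `estimatedFpp`, and the 0.03 design fpp are all assumptions made for illustration, and the actual fpp and hashing strategy used by DedupingIterator may differ.

```java
// Hypothetical helper for illustration only; not part of DATAWAVE.
final class BloomSaturationSketch {

    /**
     * Theoretical false-positive probability of a Bloom filter that was sized
     * for expectedInsertions at designFpp, after `inserted` distinct elements
     * have been added.
     */
    static double estimatedFpp(long inserted, long expectedInsertions, double designFpp) {
        double ln2 = Math.log(2);
        // Standard sizing: m = -n * ln(p) / (ln 2)^2 bits.
        double bits = -expectedInsertions * Math.log(designFpp) / (ln2 * ln2);
        // Standard hash count: k = (m / n) * ln 2, rounded, at least 1.
        long hashes = Math.max(1, Math.round(bits / expectedInsertions * ln2));
        // After `inserted` elements: fpp = (1 - e^(-k * inserted / m))^k.
        return Math.pow(1 - Math.exp(-(double) hashes * inserted / bits), hashes);
    }

    public static void main(String[] args) {
        long expectedInsertions = 500_000L; // default cited in this issue
        double designFpp = 0.03;            // assumption: the real default may differ
        for (long inserted : new long[] {250_000L, 500_000L, 750_000L, 1_000_000L}) {
            System.out.printf("inserted=%d estimatedFpp=%.4f%n",
                    inserted, estimatedFpp(inserted, expectedInsertions, designFpp));
        }
    }
}
```

Under these assumptions the estimate sits near the 3% design target at 500K insertions and rises above 20% by 1M, which would be consistent with results going missing once a query returns well over 500K hits.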
As part of #785, I added a constructor to DedupingIterator that allows expectedInsertions and fpp args to be passed in, so it may suffice to make those configurable on the query logic.