A wildcard query term with >500K hits can yield inconsistent results over subsequent runs

Open keith-ratcliffe opened this issue 5 years ago • 0 comments

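This issue seems to surface when event.query.max.results is greater than 500K. I believe we've narrowed it down to DedupingIterator's BloomFilter, which currently defaults to 500K expected insertions. Once more results than that pass through the filter, its false-positive rate rises above the configured target, so distinct results can be mistaken for duplicates and dropped.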
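To make the sizing concern concrete, here is a minimal sketch of how a Bloom filter's false-positive rate degrades once it holds more entries than it was sized for. It uses the standard textbook sizing formulas, not DedupingIterator's actual code. The class name BloomSizing and the target fpp of 0.001 are assumptions for illustration; only the 500K figure comes from this issue.

```java
public final class BloomSizing {

    // Bits needed for n expected insertions at target false-positive probability p.
    public static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Hash-function count that minimizes the false-positive rate for m bits and n insertions.
    public static int optimalHashes(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    // Approximate false-positive probability after `inserted` distinct items
    // have been added to a filter with m bits and k hash functions.
    public static double fppAfter(long inserted, long m, int k) {
        return Math.pow(1.0 - Math.exp(-(double) k * inserted / m), k);
    }

    public static void main(String[] args) {
        long expected = 500_000L;  // default expected insertions cited in the issue
        double targetFpp = 0.001;  // ASSUMPTION: illustrative value, not taken from the issue
        long m = optimalBits(expected, targetFpp);
        int k = optimalHashes(expected, m);

        // Estimate the rate below, at, and beyond the sized capacity.
        for (long inserted : new long[] {250_000L, 500_000L, 1_000_000L, 2_000_000L}) {
            System.out.printf("%,d insertions -> estimated fpp %.5f%n",
                    inserted, fppAfter(inserted, m, k));
        }
    }
}
```

Under these assumptions the estimate stays near the target at 500K insertions and climbs by well over an order of magnitude at twice that count. That is consistent with results being wrongly deduplicated when a query returns more than 500K hits.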

As part of #785, I added a constructor to DedupingIterator that accepts expectedInsertions and fpp arguments, so it may suffice to make those configurable on the query logic.

keith-ratcliffe, Sep 09 '20 21:09