luceneutil Nightly tasks should never have more than 5 queries in each category

@jpountz noticed that some of the taxo facets nightly tasks jumped surprisingly when we added the new count(*) tasks.

Digging, I realized that the nightly benchmark's randomness shifted when we added the new tasks: the benchy shuffles the incoming tasks and then picks the top N in each category (N = 5 for the nightly benchy), and some task categories in tasks/wikinightly.tasks have more than 5 unique queries. Adding tasks therefore changed which queries were picked for those categories.

This is a sneaky, longstanding bug: it led us to believe there were performance changes when in fact the specific queries being executed had changed, making the results incomparable. It has likely affected us a number of times in the past and led us to draw false conclusions.
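
To make the failure mode concrete, here is a minimal sketch of "shuffle, then pick top N per category". It is not the actual luceneutil selection code: the function name pick_tasks, the seed value, and the CountAll tasks are hypothetical, and only the VectorSearch queries come from wikinightly.tasks.

```python
import random
from collections import defaultdict


def pick_tasks(tasks, per_category, seed):
    """Shuffle all (category, query) pairs with one seeded RNG, then keep
    the first `per_category` queries encountered in each category.

    Hypothetical stand-in for the benchmark's selection step.
    """
    rng = random.Random(seed)
    shuffled = list(tasks)
    rng.shuffle(shuffled)
    picked = defaultdict(list)
    for category, query in shuffled:
        if len(picked[category]) < per_category:
            picked[category].append(query)
    return dict(picked)


# The nine VectorSearch queries currently in wikinightly.tasks.
vector_tasks = [("VectorSearch", q) for q in [
    "publisher backstory", "many geografia", "many foundation",
    "this school", "such 2007", "year work", "interviews", "golf", "http",
]]

# Hypothetical newly added tasks in an unrelated category.
new_tasks = [("CountAll", f"query{i}") for i in range(3)]

# Same seed, but the shuffled list has a different length, so the
# permutation changes and the five surviving vector queries may differ.
before = pick_tasks(vector_tasks, 5, seed=17)
after = pick_tasks(vector_tasks + new_tasks, 5, seed=17)
print(sorted(before["VectorSearch"]))
print(sorted(after["VectorSearch"]))
```

When a category has at most N queries, every query is kept no matter how the shuffle lands, which is why capping each category at 5 makes the selected set stable.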

I plan to:

    Fix nightlyBench.py to check that no more than 5 unique queries are present under each task category
    Make a one-time change to wikinightly.tasks to try to "get back" to the specific tasks we had executed previously, to try to undo the false performance change
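
A rough sketch of what the check could look like. The function name, arguments, and error message are taken from the traceback in this issue; the parsing is an assumption (one task per line in the form "Category: query # comment", with "#" never appearing inside a query) and may not match what nightlyBench.py actually does.

```python
from collections import defaultdict


def validate_nightly_task_count(tasks_file, max_count):
    """Raise RuntimeError if any category has more than max_count unique queries.

    Assumed line format: "Category: query text # optional comment".
    """
    tasks_by_cat = defaultdict(set)
    with open(tasks_file, encoding="utf-8") as f:
        for line in f:
            # Drop trailing comments and skip blank or comment-only lines.
            line = line.split("#", 1)[0].strip()
            if not line or ":" not in line:
                continue
            category, query = line.split(":", 1)
            tasks_by_cat[category.strip()].add(query.strip())

    for category, tasks in sorted(tasks_by_cat.items()):
        if len(tasks) > max_count:
            tasks_str = "\n  ".join(sorted(tasks))
            raise RuntimeError(
                f"nightly tasks file {tasks_file} must have at most {max_count} "
                f"tasks in each category, but saw {len(tasks)}:\n  {tasks_str}"
            )
```

Running this once at startup, before any benchmarking work begins, means an over-full category fails fast instead of silently changing which queries get timed.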

Thank you @jpountz for noticing the original WTF!

Aug 03 '23 10:08 mikemccand

Well, it's not only the taxo facets tasks that are subject to random-seed-shift risks:

Traceback (most recent call last):
  File "/l/util.nightly/src/python/nightlyBench.py", line 1857, in <module>
    validate_nightly_task_count(f'{constants.BENCH_BASE_DIR}/tasks/wikinightly.tasks', COUNTS_PER_CAT)
  File "/l/util.nightly/src/python/nightlyBench.py", line 258, in validate_nightly_task_count
    raise RuntimeError(f'nightly tasks file {tasks_file} must have at most {max_count} tasks in each category, but saw {len(tasks)}:\n  {tasks_str}')
RuntimeError: nightly tasks file /l/util.nightly/tasks/wikinightly.tasks must have at most 5 tasks in each category, but saw 9:
  vector//golf
  vector//publisher backstory
  vector//many foundation
  vector//many geografia
  vector//http
  vector//interviews
  vector//year work
  vector//such 2007
  vector//this school

Worse, when I peeked in the logs to work out which 5 vector searches we had actually been running, so I could disambiguate to those going forward, it was not easy to tell, since KnnFloatVectorQuery's toString is just a vector :) And only its first dimension, no less:

TASK: cat=VectorSearch q=KnnFloatVectorQuery:vector[0.02625591,...][100] s=null group=null hits=100 facets=[]

For now I'll just disambiguate to the first 5 vector queries from the existing ones:

VectorSearch: vector//publisher backstory # freq=194856 freq=148
VectorSearch: vector//many geografia # freq=99550 freq=104
VectorSearch: vector//many foundation # freq=99550 freq=10894
VectorSearch: vector//this school # freq=238551 freq=29912
VectorSearch: vector//such 2007 # freq=111526 freq=90200 1.2
VectorSearch: vector//year work # freq=175324 freq=102732 1.7
VectorSearch: vector//interviews # freq=31768
VectorSearch: vector//golf # freq=31760
VectorSearch: vector//http # freq=389790

Maybe there is a way to stuff some human-readable string into KnnFloatVectorQuery that pops out in its toString() method, to help us humans who otherwise have only vectors to look at? @msokolov?

Aug 03 '23 10:08 mikemccand

OK, besides the VectorSearch category, only the taxo facets categories were unstable/non-deterministic:

    OrHighMedDayTaxoFacets
    AndHighMedDayTaxoFacets
    AndHighHighDayTaxoFacets
    MedTermDayTaxoFacets

I was able to disambiguate these tasks to their pre-2023/07/28 tasks.

I'll kick off another one-off nightly benchy. Let's see if these taxo facet tasks get back to "normal" ish.

Aug 03 '23 10:08 mikemccand

> Worse, when I peeked in the logs to try to pick which 5 vector searches I should pick/disambiguate to going forward, it's not easy to do so since the KnnFloatVectorQuery's toString is just a vector :) And only its first dimension no less

I'll open a Lucene issue for this ... we humans still need to be able to read these things :)

Aug 03 '23 10:08 mikemccand
