Michael McCandless

354 comments by Michael McCandless

I've made some progress here ... I added an "actual QPS" measure, recorded by `SearchPerfTest` and reported for each iteration of each competitor. It discards warmup time (currently hardwired to...
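To make the "actual QPS" idea concrete, here is a minimal Python sketch. It is not the `SearchPerfTest` code (which is Java); it only illustrates counting queries after a warmup period. The function name, the `warmup_sec` parameter, and the timestamp format are all assumptions for illustration, and the real warmup value is not stated in the comment above.

```python
def actual_qps(completion_times_sec, warmup_sec, end_sec):
    """Queries per second over the measured window, ignoring warmup.

    completion_times_sec: when each query finished, in seconds since the run started.
    warmup_sec: hypothetical warmup length; queries finishing before it are discarded.
    end_sec: when the run ended, in seconds since the run started.
    """
    window = end_sec - warmup_sec
    if window <= 0:
        raise ValueError("run must be longer than the warmup")
    # Keep only queries that completed inside the measured window.
    measured = [t for t in completion_times_sec if warmup_sec <= t <= end_sec]
    return len(measured) / window
```

Dividing by the post-warmup window rather than the whole run keeps slow early queries (cold caches, JIT compilation) from dragging the number down.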

I ran this same `perf.py` on beast3, which has many cores (64 actual, 128 with HT). Curiously, QPS also doesn't improve much when I double `searchConcurrency` from 16 to...

> (More generally, I wish the facet module behaved a bit more like a regular Lucene `Collector`, instead of first loading all hits into a bitset and doing the work...

I attempted to follow the `README` instructions to generate nightly benchy vectors, using this command:

```
python3 -u src/python/infer_token_vectors_cohere.py ../data/cohere-wikipedia-768.vec 27625038 ../data/cohere-wikipedia-queries-768.vec 10000
```

(Note that the nightly benchy only...

Oooh this [`load_dataset` method](https://huggingface.co/docs/datasets/v2.18.0/en/package_reference/loading_methods#datasets.load_dataset) takes a parameter `keep_in_memory`! I'll poke around.

OK well, that `keep_in_memory=False` parameter seemed to do nothing -- the process still hit the OOM killer at 256 GB RAM. With [this change to do chunking into 1M blocks of vectors when writing...
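The general idea of writing vectors in fixed-size blocks can be sketched as follows. This is not the linked change, just a minimal standard-library illustration; the function name `write_vectors_chunked` and its parameters are hypothetical, and it assumes the vectors arrive as an iterable of equal-length float sequences.

```python
from array import array
from itertools import islice


def write_vectors_chunked(vectors, out_path, chunk_size=1_000_000):
    """Write an iterable of equal-length vectors to a flat float32 file.

    Only chunk_size vectors are buffered at a time, so memory use stays
    bounded no matter how many vectors the iterable yields.
    Returns the number of vectors written.
    """
    it = iter(vectors)
    count = 0
    with open(out_path, "wb") as out:
        while True:
            chunk = list(islice(it, chunk_size))
            if not chunk:
                break
            buf = array("f")  # 4-byte floats, native byte order
            for vec in chunk:
                buf.extend(vec)
            buf.tofile(out)
            count += len(chunk)
    return count
```

Because the input is consumed lazily, peak memory is roughly one chunk rather than the whole dataset, provided the source iterable itself does not materialize everything up front.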

Hmm, except, that file is too large? ``` beast3:util.nightly[master]$ python3 Python 3.11.7 (main, Jan 29 2024, 16:03:57) [GCC 13.2.1 20230801] on linux Type "help", "copyright", "credits" or "license" for more...

OK I think these are `float64` typed vectors, in which case the file size makes sense. But I think nightly benchy wants `float32`?
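A quick way to check this kind of hunch is to compare the file size against the expected size for each element width. The sketch below assumes a flat file with no header, holding `num_vectors * dim` values; the helper name `guess_dtype` is hypothetical.

```python
def guess_dtype(file_size_bytes, num_vectors, dim):
    """Return 'float32', 'float64', or None based on a flat vector file's size."""
    values = num_vectors * dim
    if file_size_bytes == values * 4:
        return "float32"
    if file_size_bytes == values * 8:
        return "float64"
    return None  # size matches neither width; the file may have a header or a different count
```

For example, `guess_dtype(os.path.getsize(path), num_vectors, dim)` with the vector count and dimension used when generating the file. A `float64` file is exactly twice the size of its `float32` equivalent.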

And I think `knnPerfTest.py/KnnGraphTester.java` also wants `float32`? I'm confused how they are working now on the generated file ...

Oooh [this `Dataset.cast` method](https://discuss.huggingface.co/t/whats-the-best-way-to-change-convert-column-type-in-dataset/10711/3) looks promising! I'll explore...