
More performance improvements

Open cjcormier opened this issue 1 year ago • 0 comments

The following performance improvements are in this PR and in a companion PR to your indexical repo:

  • When iterating over a SIMD bitset, we can check that the entire SIMD register is non-zero.
    • More detail in https://github.com/willcrichton/indexical/pull/2
  • Separating and transposing the raw scores to allow better lookup/cache behavior
  • Manual "iteration" of QuestionCombinations, which removes the need to clone QuestionCombinations
    • I don't think this can be done with normal iterators, because it would require lending iterators. The manual iteration is a bit messy, but it does gain some performance.
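
To illustrate the first item, here is a minimal sketch of the zero-skip idea. It is not the code from either PR: `Chunk` is a hypothetical scalar stand-in for one SIMD register (four 64-bit lanes), and `set_bits` is an illustrative name. The point is only that one cheap whole-chunk test lets the iterator skip the per-bit work when a chunk is empty.

```rust
/// Hypothetical stand-in for one SIMD register: four 64-bit lanes.
/// A real implementation would use an actual SIMD type instead.
type Chunk = [u64; 4];

/// Collects the indices of all set bits, skipping any chunk that is
/// entirely zero before doing per-lane work.
fn set_bits(chunks: &[Chunk]) -> Vec<usize> {
    let mut out = Vec::new();
    for (ci, chunk) in chunks.iter().enumerate() {
        // One combined test for the whole chunk. With real SIMD this would
        // be a single "is the register all zero" check.
        if chunk.iter().fold(0u64, |acc, &lane| acc | lane) == 0 {
            continue;
        }
        for (li, &lane) in chunk.iter().enumerate() {
            let mut bits = lane;
            while bits != 0 {
                let bit = bits.trailing_zeros() as usize;
                out.push(ci * 256 + li * 64 + bit);
                bits &= bits - 1; // clear the lowest set bit
            }
        }
    }
    out
}
```

On sparse bitsets most chunks fail the first test, so the inner loops rarely run; on dense bitsets the extra test costs very little.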

On a large dataset with k = 5 I get the following improvements:

| Change           | Runtime (s) | Total Reduction |
|------------------|-------------|-----------------|
| Base             | 264.923     | 0%              |
| Simd Iter Skip 0 | 202.173     | 24%             |
| Scores           | 193.053     | 27%             |
| Manual Iter      | 178.496     | 32%             |

I have been testing another change that replaces the clone and BitOrAssign with a bitwise OR followed by a separate assignment. This removes the need for the clone and reduces the runtime by a further 10 s on my machine. It requires coordinated changes in both this repo and the indexical one, so I left it out of the current PRs. Let me know if that course of action sounds worth pursuing.
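
A minimal sketch of that idea, under stated assumptions: `Bits`, `union_with`, and `assign_or` are hypothetical names used for illustration and are not the indexical API. The first function mirrors the clone-then-`|=` pattern; the second writes `a | b` directly into a reusable buffer, so no allocation happens per combination.

```rust
/// Hypothetical minimal bitset, used only for illustration.
#[derive(Clone, Debug, PartialEq)]
struct Bits(Vec<u64>);

impl Bits {
    /// In-place union, comparable in spirit to `BitOrAssign`.
    fn union_with(&mut self, other: &Bits) {
        for (a, b) in self.0.iter_mut().zip(&other.0) {
            *a |= *b;
        }
    }

    /// Writes `a | b` into `self`, overwriting its previous contents.
    /// Assumes all three bitsets have the same length.
    fn assign_or(&mut self, a: &Bits, b: &Bits) {
        for ((dst, x), y) in self.0.iter_mut().zip(&a.0).zip(&b.0) {
            *dst = *x | *y;
        }
    }
}

/// Clone-based version: allocates a new bitset on every call.
fn with_clone(a: &Bits, b: &Bits) -> Bits {
    let mut out = a.clone();
    out.union_with(b);
    out
}

/// Clone-free version: reuses a caller-provided scratch buffer.
fn without_clone(out: &mut Bits, a: &Bits, b: &Bits) {
    out.assign_or(a, b);
}
```

Both versions produce the same bits; the difference is only where the result lives and whether an allocation is needed.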

cjcormier · Nov 06 '23 01:11