corrset-benchmark More performance improvements

Open cjcormier opened this issue 1 year ago • 0 comments

This PR, together with a companion PR to your indexical repo, contains the following performance improvements:

- When iterating over a SIMD bitset, we can first check whether the entire SIMD register is non-zero and skip it when it is all zeros.
  - More detail in https://github.com/willcrichton/indexical/pull/2
- Separating and transposing the raw scores to allow better lookup/cache behavior.
- Manual "iteration" of QuestionCombinations, which removes the need to clone QuestionCombinations.
  - I don't think this can be done with normal iterators, since it requires lending iterators. The manual iteration is a bit messy, but it does gain some performance.
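To illustrate the first item, here is a minimal sketch of the zero-skip idea. It is not the code from either PR: `Chunk` and `set_bits` are hypothetical names, and a `[u64; 4]` array stands in for a 256-bit SIMD register so that the example runs on stable Rust without any SIMD crate.

```rust
/// One 256-bit chunk stored as four 64-bit lanes. This is a stand-in for a
/// SIMD register (an assumption of this sketch, not indexical's actual type).
type Chunk = [u64; 4];

/// Collects the indices of all set bits, skipping chunks that are all zeros.
fn set_bits(chunks: &[Chunk]) -> Vec<usize> {
    let mut out = Vec::new();
    for (ci, chunk) in chunks.iter().enumerate() {
        // Fast path: OR the lanes together and test the whole chunk once.
        // With real SIMD this would be a single "is any lane non-zero" check.
        if chunk.iter().fold(0u64, |acc, &lane| acc | lane) == 0 {
            continue;
        }
        // Slow path: walk the set bits of each lane.
        for (li, &lane) in chunk.iter().enumerate() {
            let mut bits = lane;
            while bits != 0 {
                let tz = bits.trailing_zeros() as usize;
                out.push(ci * 256 + li * 64 + tz);
                bits &= bits - 1; // clear the lowest set bit
            }
        }
    }
    out
}
```

On a sparse bitset most chunks take the fast path, so the per-lane loop runs only where there is something to yield.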

On a large dataset with k = 5 I get the following improvements:

Change	Runtime (s)	Total Reduction
Base	264.923	0%
Simd Iter Skip 0	202.173	24%
Scores	193.053	27%
Manual Iter	178.496	32%

I have also tested another change that replaces the clone and BitOrAssign with a bitwise OR followed by a separate assignment. This removes the need for the clone and reduces the runtime by a further 10 s on my machine. It does require coordinated changes in this repo and the indexical one, so I left it out of the current PRs. Let me know if that sounds worth pursuing.
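As a rough illustration of that last idea, the sketch below contrasts the two shapes on plain `u64` word slices. It is one possible reading of the change, not the actual patch: `union_via_clone` and `union_into` are hypothetical names, and the real bitset types in indexical are not used here.

```rust
/// Clone-then-OR: allocates a fresh buffer on every call.
fn union_via_clone(a: &[u64], b: &[u64]) -> Vec<u64> {
    let mut out = a.to_vec(); // the clone
    for (o, &y) in out.iter_mut().zip(b) {
        *o |= y; // the BitOrAssign step
    }
    out
}

/// OR-then-assign: writes `a | b` into a caller-owned buffer, so the same
/// scratch space can be reused across iterations without allocating.
fn union_into(dst: &mut [u64], a: &[u64], b: &[u64]) {
    assert!(dst.len() == a.len() && a.len() == b.len());
    for ((d, &x), &y) in dst.iter_mut().zip(a).zip(b) {
        *d = x | y;
    }
}
```

Both functions compute the same union; the second only avoids the per-call allocation and copy, which is presumably where the saving would come from.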

Nov 06 '23 01:11 cjcormier
