corrset-benchmark
More performance improvements
This PR, together with a companion PR to your indexical repo, contains the following performance improvements:
- When iterating over a SIMD bitset, we can check whether the entire SIMD register is zero and skip it if so.
  - More detail in https://github.com/willcrichton/indexical/pull/2
- Separating and transposing the raw scores to allow better lookup/cache behavior
- Manual "iteration" of QuestionCombinations, which removes the need to clone QuestionCombinations
  - I don't think this can be done with normal iterators, since it would require lending iterators. The manual iteration is a bit messy, but it does gain some performance.
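To illustrate the first item, here is a minimal sketch of the zero-skip idea on stable Rust. It is not indexical's actual implementation: `ChunkedBitset` and its methods are hypothetical names, and a `[u64; 4]` array stands in for one 256-bit SIMD register.

```rust
/// Hypothetical stand-in for a SIMD-backed bitset: each `[u64; 4]` plays the
/// role of one 256-bit register. This is an illustration, not indexical's API.
struct ChunkedBitset {
    chunks: Vec<[u64; 4]>,
}

impl ChunkedBitset {
    /// Create an empty bitset able to hold at least `nbits` bits.
    fn with_capacity_bits(nbits: usize) -> Self {
        let nchunks = (nbits + 255) / 256;
        ChunkedBitset { chunks: vec![[0u64; 4]; nchunks] }
    }

    /// Set a single bit.
    fn insert(&mut self, bit: usize) {
        let chunk = bit / 256;
        let lane = (bit % 256) / 64;
        let offset = bit % 64;
        self.chunks[chunk][lane] |= 1u64 << offset;
    }

    /// Collect the indices of all set bits, skipping chunks that are all zero.
    fn ones(&self) -> Vec<usize> {
        let mut out = Vec::new();
        for (ci, chunk) in self.chunks.iter().enumerate() {
            // One combined test per chunk. With real SIMD this would be a
            // single register-level "is any lane non-zero" check.
            if chunk.iter().fold(0u64, |acc, &w| acc | w) == 0 {
                continue;
            }
            for (li, &word) in chunk.iter().enumerate() {
                let mut w = word;
                while w != 0 {
                    let tz = w.trailing_zeros() as usize;
                    out.push(ci * 256 + li * 64 + tz);
                    w &= w - 1; // clear the lowest set bit
                }
            }
        }
        out
    }
}
```

For sparse sets most chunks take the early `continue`, so the per-lane loop only runs where there is something to report.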
On a large dataset with k = 5 I get the following improvements:
| Change | Runtime (s) | Total Reduction |
|---|---|---|
| Base | 264.923 | 0% |
| Simd Iter Skip 0 | 202.173 | 24% |
| Scores | 193.053 | 27% |
| Manual Iter | 178.496 | 32% |
I have been testing another change that replaces the clone and BitOrAssign with a bitwise OR followed by a separate assignment. This removes the need for the clone and reduces the runtime by a further 10s on my machine. It does require coordination between this repo and the indexical one, so I left it out of the current PRs. Let me know if that course of action sounds worth pursuing.
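A rough sketch of what that change could look like, assuming a simple word-based bitset. `Bits`, `assign_or`, and `union_with_clone` are hypothetical names for illustration and do not reflect indexical's real types or methods.

```rust
/// Hypothetical fixed-width bitset used only for illustration.
#[derive(Clone, Debug, PartialEq)]
struct Bits {
    words: Vec<u64>,
}

impl std::ops::BitOrAssign<&Bits> for Bits {
    fn bitor_assign(&mut self, rhs: &Bits) {
        for (a, b) in self.words.iter_mut().zip(&rhs.words) {
            *a |= *b;
        }
    }
}

impl Bits {
    /// Write `a | b` into `self`, reusing `self`'s existing allocation.
    fn assign_or(&mut self, a: &Bits, b: &Bits) {
        assert_eq!(a.words.len(), b.words.len());
        self.words.clear();
        self.words
            .extend(a.words.iter().zip(&b.words).map(|(x, y)| x | y));
    }
}

/// The pattern being replaced: clone one operand, then OR the other into it.
fn union_with_clone(a: &Bits, b: &Bits) -> Bits {
    let mut out = a.clone(); // allocates a fresh buffer on every call
    out |= b;
    out
}
```

With a scratch `Bits` kept alive across iterations, `scratch.assign_or(&a, &b)` produces the same result as `union_with_clone(&a, &b)` without allocating once the scratch buffer has enough capacity.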