milvus [do-not-merge][in progress][Enhancement] Custom bitset and bitsetview prototypes

This is an early alpha stage; unit tests are not integrated yet.

I've replaced FixedVector<bool> and boost::dynamic_bitset with a custom bitset and bitsetview in order to reduce memory bandwidth and improve filtering performance.
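To illustrate why a packed representation can cut memory traffic, here is a minimal sketch of a word-packed bitset. It is not the bitset or bitsetview type from this PR: the class name `PackedBitset` and all of its methods are hypothetical, and it assumes the structure being replaced stores roughly one byte per flag. The point is only that 64 flags share one 64-bit word, so a logical AND over two filters touches about 1/8 of the bytes and handles 64 flags per loop iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only; NOT the bitset type introduced by this PR.
// Packs 64 flags into each uint64_t word instead of one byte per flag.
class PackedBitset {
 public:
    explicit PackedBitset(std::size_t n_bits)
        : n_bits_(n_bits), words_((n_bits + 63) / 64, 0) {}

    std::size_t size() const { return n_bits_; }

    // Bytes of payload storage (rounded up to whole 64-bit words).
    std::size_t bytes() const { return words_.size() * sizeof(std::uint64_t); }

    void set(std::size_t i, bool value = true) {
        assert(i < n_bits_);
        const std::uint64_t mask = std::uint64_t{1} << (i % 64);
        if (value) {
            words_[i / 64] |= mask;
        } else {
            words_[i / 64] &= ~mask;
        }
    }

    bool test(std::size_t i) const {
        assert(i < n_bits_);
        return ((words_[i / 64] >> (i % 64)) & 1u) != 0;
    }

    // In-place AND: each iteration combines 64 flags at once.
    void inplace_and(const PackedBitset& other) {
        assert(other.n_bits_ == n_bits_);
        for (std::size_t w = 0; w < words_.size(); ++w) {
            words_[w] &= other.words_[w];
        }
    }

    // Number of set bits. Portable loop; production code would more likely
    // use a hardware popcount instruction.
    std::size_t count() const {
        std::size_t total = 0;
        for (std::uint64_t word : words_) {
            while (word != 0) {
                word &= word - 1;  // clear the lowest set bit
                ++total;
            }
        }
        return total;
    }

 private:
    std::size_t n_bits_;
    std::vector<std::uint64_t> words_;
};
```

With this layout, 1000 flags occupy 16 words (128 bytes), compared with about 1000 bytes for a byte-per-flag container.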

This PR is for internal use only.

Current progress (numbers are for GCC 9.5.0 on Ubuntu 22.04 LTS; clang-17 produces better performance numbers):

Baseline:

[ RUN      ] CApiTest.AssembeChunkPerfTest
start test
cost: 17903us
[       OK ] CApiTest.AssembeChunkPerfTest (183 ms)

[ RUN      ] Expr.TestMultiLogicalExprsOptimization
cost: 1391us
cost: 5us
cost: 4us
cost: 4us
cost: 6us
cost: 4us
cost: 4us
cost: 4us
cost: 4us
cost: 4us
143
cost: 10us
cost: 8us
cost: 10us
cost: 8us
cost: 8us
cost: 8us
cost: 8us
cost: 8us
cost: 8us
cost: 9us
8
/home/ubuntu/zilliz/milvus4/milvus/internal/core/unittest/test_expr.cpp:1561: Failure
Expected: (cost_op) < (cost_no_op), actual: 143 vs 8
[  FAILED  ] Expr.TestMultiLogicalExprsOptimization (7 ms)
[ RUN      ] Expr.TestExprs
start test
3cost: 889us
start test
10cost: 2us
start test
20cost: 2us
start test
30cost: 2us
start test
50cost: 3us
start test
100cost: 7us
start test
200cost: 16us
[       OK ] Expr.TestExprs (9 ms)

[ RUN      ] Expr.TestUnaryBenchTest
start test type:2
 cost: 124.8us
start test type:3
 cost: 163.1us
start test type:4
 cost: 275.9us
start test type:5
 cost: 590.9us
start test type:10
 cost: 62.7us
start test type:11
 cost: 65.9us
[       OK ] Expr.TestUnaryBenchTest (1153 ms)
[ RUN      ] Expr.TestBinaryRangeBenchTest
start test type:2
 cost: 151.4us
start test type:3
 cost: 198.4us
start test type:4
 cost: 361.9us
start test type:5
 cost: 753.9us
start test type:10
 cost: 64.6us
start test type:11
 cost: 62.2us
[       OK ] Expr.TestBinaryRangeBenchTest (1151 ms)
[ RUN      ] Expr.TestLogicalUnaryBenchTest
start test type:2
 cost: 121.14us
start test type:3
 cost: 156.84us
start test type:4
 cost: 249.76us
start test type:5
 cost: 534.44us
start test type:10
 cost: 82.2us
start test type:11
 cost: 83.52us
[       OK ] Expr.TestLogicalUnaryBenchTest (1202 ms)
[ RUN      ] Expr.TestBinaryLogicalBenchTest
start test type:2
 cost: 80.64us
start test type:3
 cost: 78.22us
start test type:4
 cost: 255.76us
start test type:5
 cost: 532.04us
start test type:10
 cost: 89.26us
start test type:11
 cost: 90us
[       OK ] Expr.TestBinaryLogicalBenchTest (1198 ms)
[ RUN      ] Expr.TestBinaryArithOpEvalRangeBenchExpr
start test type:2
 cost: 401.7us
start test type:3
 cost: 420.96us
start test type:4
 cost: 418.04us
start test type:5
 cost: 470.54us
start test type:10
 cost: 250.32us
start test type:11
 cost: 850.08us
[       OK ] Expr.TestBinaryArithOpEvalRangeBenchExpr (1273 ms)
[ RUN      ] Expr.TestCompareExprBenchTest
start test type:2
 cost: 162us
start test type:3
 cost: 142us
start test type:4
 cost: 374us
start test type:5
 cost: 674us
start test type:10
 cost: 366us
start test type:11
 cost: 645us
[       OK ] Expr.TestCompareExprBenchTest (1214 ms)
[ RUN      ] Expr.TestRefactorExprs
start test
3cost: 1253us
start test
10cost: 1060us
start test
20cost: 681us
start test
30cost: 522us
start test
50cost: 511us
start test
100cost: 506us
start test
200cost: 497us
[       OK ] Expr.TestRefactorExprs (1142 ms)

Candidate:

[ RUN      ] CApiTest.AssembeChunkPerfTest
start test
cost: 6099us
[       OK ] CApiTest.AssembeChunkPerfTest (153 ms)

[ RUN      ] Expr.TestMultiLogicalExprsOptimization
cost: 42us
cost: 15us
cost: 15us
cost: 14us
cost: 15us
cost: 15us
cost: 15us
cost: 15us
cost: 15us
cost: 15us
17
cost: 41us
cost: 39us
cost: 33us
cost: 33us
cost: 33us
cost: 33us
cost: 34us
cost: 41us
cost: 34us
cost: 34us
35
[       OK ] Expr.TestMultiLogicalExprsOptimization (6 ms)
[ RUN      ] Expr.TestExprs
start test
3cost: 20us
start test
10cost: 2us
start test
20cost: 2us
start test
30cost: 2us
start test
50cost: 4us
start test
100cost: 8us
start test
200cost: 15us
[       OK ] Expr.TestExprs (8 ms)

[ RUN      ] Expr.TestUnaryBenchTest
start test type:2
 cost: 55.7us
start test type:3
 cost: 79.8us
start test type:4
 cost: 177.6us
start test type:5
 cost: 337.2us
start test type:10
 cost: 16.9us
start test type:11
 cost: 15.7us
[       OK ] Expr.TestUnaryBenchTest (1140 ms)
[ RUN      ] Expr.TestBinaryRangeBenchTest
start test type:2
 cost: 57.1us
start test type:3
 cost: 87us
start test type:4
 cost: 177.5us
start test type:5
 cost: 342.7us
start test type:10
 cost: 17.9us
start test type:11
 cost: 16.7us
[       OK ] Expr.TestBinaryRangeBenchTest (1152 ms)
[ RUN      ] Expr.TestLogicalUnaryBenchTest
start test type:2
 cost: 34.58us
start test type:3
 cost: 68.86us
start test type:4
 cost: 151.38us
start test type:5
 cost: 286.8us
start test type:10
 cost: 16.54us
start test type:11
 cost: 16.7us
[       OK ] Expr.TestLogicalUnaryBenchTest (1165 ms)
[ RUN      ] Expr.TestBinaryLogicalBenchTest
start test type:2
 cost: 20us
start test type:3
 cost: 17.1us
start test type:4
 cost: 154.12us
start test type:5
 cost: 286.1us
start test type:10
 cost: 19.6us
start test type:11
 cost: 19.24us
[       OK ] Expr.TestBinaryLogicalBenchTest (1188 ms)
[ RUN      ] Expr.TestBinaryArithOpEvalRangeBenchExpr
start test type:2
 cost: 125.7us
start test type:3
 cost: 111.34us
start test type:4
 cost: 148.02us
start test type:5
 cost: 306.7us
start test type:10
 cost: 149.3us
start test type:11
 cost: 282.94us
[       OK ] Expr.TestBinaryArithOpEvalRangeBenchExpr (1221 ms)
[ RUN      ] Expr.TestCompareExprBenchTest
start test type:2
 cost: 89us
start test type:3
 cost: 79us
start test type:4
 cost: 323us
start test type:5
 cost: 629us
start test type:10
 cost: 313us
start test type:11
 cost: 591us
[       OK ] Expr.TestCompareExprBenchTest (1228 ms)
[ RUN      ] Expr.TestRefactorExprs
start test
3cost: 874us
start test
10cost: 611us
start test
20cost: 290us
start test
30cost: 294us
start test
50cost: 272us
start test
100cost: 278us
start test
200cost: 279us
[       OK ] Expr.TestRefactorExprs (1149 ms)
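A note on reading the two logs. The bare numbers printed in Expr.TestMultiLogicalExprsOptimization (143 and 8 in the baseline, 17 and 35 in the candidate) are consistent with the integer mean of the ten preceding `cost:` lines. If that is how the test computes them, the baseline failure is driven by its single 1391us first iteration rather than by steady-state cost. The sketch below checks that arithmetic and computes a speedup ratio; the helper names `integer_mean` and `speedup` are hypothetical and are not part of the test code.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Integer (truncating) mean of per-iteration costs in microseconds.
// Assumption: this mirrors how the test derives its printed summary value.
inline std::int64_t integer_mean(const std::vector<std::int64_t>& costs_us) {
    assert(!costs_us.empty());
    const std::int64_t sum =
        std::accumulate(costs_us.begin(), costs_us.end(), std::int64_t{0});
    return sum / static_cast<std::int64_t>(costs_us.size());
}

// Ratio of baseline time to candidate time; values above 1 mean the
// candidate is faster.
inline double speedup(double baseline_us, double candidate_us) {
    assert(candidate_us > 0.0);
    return baseline_us / candidate_us;
}
```

For example, `integer_mean({1391, 5, 4, 4, 6, 4, 4, 4, 4, 4})` reproduces the baseline value 143, and `speedup(17903, 6099)` for CApiTest.AssembeChunkPerfTest comes out at roughly 2.9x.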

Feb 02 '24 01:02 alexanderguzhva

@alexanderguzhva

Invalid PR Title Format Detected

Your PR submission does not adhere to our required standards. To ensure clarity and consistency, please meet the following criteria:

Title Format: The PR title must begin with one of these prefixes:

feat: for introducing a new feature.
fix: for bug fixes.
enhance: for improvements to existing functionality.
test: for add tests to existing functionality.
doc: for modifying documentation.
auto: for the pull request from bot.

Description Requirement: The PR must include a non-empty description, detailing the changes and their impact.

Required Title Structure:

[Type]: [Description of the PR]

Where Type is one of feat, fix, enhance, test or doc.

Example:

enhance: improve search performance significantly

Please review and update your PR to comply with these guidelines.

Feb 02 '24 01:02 mergify[bot]

/hold

Feb 02 '24 01:02 alexanderguzhva

/assign @liliu-z

Feb 02 '24 01:02 alexanderguzhva

@alexanderguzhva ut workflow job failed, comment rerun ut can trigger the job again.

Feb 02 '24 01:02 mergify[bot]

@alexanderguzhva ut workflow job failed, comment rerun ut can trigger the job again.

Feb 06 '24 01:02 mergify[bot]

@alexanderguzhva E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Feb 06 '24 01:02 mergify[bot]

This looks like the right place to optimize. /assign @congqixia for a further check

Feb 06 '24 06:02 liliu-z

@alexanderguzhva ut workflow job failed, comment rerun ut can trigger the job again.

Feb 14 '24 16:02 mergify[bot]

@alexanderguzhva E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Feb 14 '24 16:02 mergify[bot]

@alexanderguzhva Thanks for your contribution. Please submit with DCO, see the contributing guide https://github.com/milvus-io/milvus/blob/master/CONTRIBUTING.md#developer-certificate-of-origin-dco.

Feb 14 '24 21:02 mergify[bot]

@alexanderguzhva ut workflow job failed, comment rerun ut can trigger the job again.

Feb 14 '24 21:02 mergify[bot]

Codecov Report

Attention: Patch coverage is 85.04931% with 379 lines in your changes are missing coverage. Please review.

Project coverage is 81.04%. Comparing base (3f7c774) to head (5dcecc8). Report is 26 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #30454      +/-   ##
==========================================
- Coverage   81.08%   81.04%   -0.05%     
==========================================
  Files         980      967      -13     
  Lines      143736   139108    -4628     
==========================================
- Hits       116551   112740    -3811     
+ Misses      23294    22621     -673     
+ Partials     3891     3747     -144

Files	Coverage Δ
internal/core/src/bitset/common.h	`100.00% <100.00%> (ø)`
internal/core/src/bitset/detail/bit_wise.h	`100.00% <100.00%> (ø)`
internal/core/src/bitset/detail/ctz.h	`100.00% <100.00%> (ø)`
...l/core/src/bitset/detail/platform/vectorized_ref.h	`100.00% <100.00%> (ø)`
...e/src/bitset/detail/platform/x86/instruction_set.h	`76.47% <ø> (ø)`
internal/core/src/bitset/detail/popcount.h	`100.00% <100.00%> (ø)`
internal/core/src/bitset/detail/proxy.h	`100.00% <100.00%> (ø)`
internal/core/src/common/BitsetView.h	`87.50% <100.00%> (+0.83%)`	:arrow_up:
internal/core/src/common/Types.h	`9.40% <ø> (+0.15%)`	:arrow_up:
internal/core/src/common/Vector.h	`100.00% <100.00%> (ø)`
... and 35 more

... and 296 files with indirect coverage changes

Feb 14 '24 22:02 codecov[bot]

@alexanderguzhva ut workflow job failed, comment rerun ut can trigger the job again.

Feb 14 '24 23:02 mergify[bot]

@alexanderguzhva ut workflow job failed, comment rerun ut can trigger the job again.

Feb 15 '24 23:02 mergify[bot]

@alexanderguzhva E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Feb 16 '24 00:02 mergify[bot]

@alexanderguzhva ut workflow job failed, comment rerun ut can trigger the job again.

Feb 17 '24 14:02 mergify[bot]

@alexanderguzhva ut workflow job failed, comment rerun ut can trigger the job again.

Feb 19 '24 01:02 mergify[bot]

@alexanderguzhva E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Feb 19 '24 01:02 mergify[bot]

Did you try the filtering cases in VDBBench? Curious about the potential improvement.

Feb 19 '24 09:02 liliu-z

/assign @zhagnlu

Feb 19 '24 09:02 liliu-z

@alexanderguzhva E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Feb 20 '24 21:02 mergify[bot]

@alexanderguzhva ut workflow job failed, comment rerun ut can trigger the job again.

Feb 20 '24 21:02 mergify[bot]

@alexanderguzhva E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Feb 21 '24 01:02 mergify[bot]

@alexanderguzhva ut workflow job failed, comment rerun ut can trigger the job again.

Feb 21 '24 02:02 mergify[bot]

@alexanderguzhva ut workflow job failed, comment rerun ut can trigger the job again.

Feb 21 '24 03:02 mergify[bot]

@alexanderguzhva E2e jenkins job failed, comment /run-cpu-e2e can trigger the job again.

Feb 21 '24 18:02 mergify[bot]

@alexanderguzhva ut workflow job failed, comment rerun ut can trigger the job again.

Feb 21 '24 19:02 mergify[bot]

@alexanderguzhva ut workflow job failed, comment rerun ut can trigger the job again.

Feb 22 '24 02:02 mergify[bot]

Judging from the UT results above, this is good work. Please rebase onto the latest master branch.

Feb 22 '24 08:02 zhagnlu
