add AVX512 support for filtering in place

Open connortsui20 opened this issue 1 month ago • 11 comments

Reorganizes some of the filter implementations and adds AVX512 filter implementation and benchmarks

connortsui20 avatar Nov 19 '25 14:11 connortsui20
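
For readers unfamiliar with the term, "filtering in place" means compacting the selected elements to the front of the same buffer rather than writing to a new one. The following is a minimal scalar sketch of that idea, not the PR's actual code: the function name `filter_in_place_scalar` and the `&[bool]` mask type are assumptions made for illustration. An AVX512 variant would presumably use the hardware compress instructions to do the same thing one vector at a time.

```rust
/// Keep only the elements whose mask entry is `true`, compacting them to the
/// front of `values` and truncating the tail.
/// Hypothetical helper for illustration; not the Vortex API.
fn filter_in_place_scalar<T: Copy>(values: &mut Vec<T>, mask: &[bool]) {
    assert_eq!(values.len(), mask.len(), "mask must cover every element");
    let mut write = 0;
    for read in 0..values.len() {
        if mask[read] {
            // `write <= read` always holds, so unread data is never clobbered.
            values[write] = values[read];
            write += 1;
        }
    }
    // Drop everything past the last kept element.
    values.truncate(write);
}
```

Calling it on `vec![10, 20, 30, 40, 50]` with the mask `[true, false, true, false, true]` leaves `[10, 30, 50]` in the same vector.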

Codecov Report

:x: Patch coverage is 75.83643% with 130 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 85.45%. Comparing base (c75c8a3) to head (eae0511). :warning: Report is 78 commits behind head on develop.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| vortex-compute/src/filter/slice/simd_compress.rs | 9.57% | 85 Missing :warning: |
| vortex-compute/src/filter/slice/out/by_mask.rs | 55.55% | 28 Missing :warning: |
| vortex-compute/src/filter/slice/out/avx512.rs | 91.07% | 5 Missing :warning: |
| vortex-compute/src/filter/slice/out/by_bitview.rs | 87.50% | 5 Missing :warning: |
| vortex-compute/src/filter/slice/in_place/avx512.rs | 92.30% | 4 Missing :warning: |
| vortex-compute/src/filter/slice/out/mod.rs | 98.38% | 2 Missing :warning: |
| vortex-compute/src/filter/slice/in_place/mod.rs | 99.08% | 1 Missing :warning: |

codecov[bot] avatar Nov 19 '25 15:11 codecov[bot]

Does GitHub give us CI runners with AVX512 support? My recollection from when I did the AVX2 take kernel is that they don't

a10y avatar Nov 19 '25 15:11 a10y

not really sure how to deal with this, I think sometimes they have it, sometimes not?

connortsui20 avatar Nov 19 '25 16:11 connortsui20

CI runs in AWS, so you can get whatever machine you want. Right now we use m7i/m7a by default, which are Sapphire Rapids/Zen 4 and have AVX512. For benchmarks we run c6i, which is Ice Lake SP and also has AVX512.

robert3005 avatar Nov 19 '25 16:11 robert3005
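
Since availability depends on which machine a job lands on, a kernel like this is typically selected at runtime rather than assumed. The sketch below shows one way to do that with the standard library's `is_x86_feature_detected!` macro. The function name and the returned labels are hypothetical, and this is not necessarily how the PR dispatches.

```rust
/// Report which filter kernel a runtime dispatcher would choose on this CPU.
/// The labels are hypothetical names used only for illustration.
fn select_filter_kernel() -> &'static str {
    // The detection macro only exists on x86 targets, so gate it.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx512f") {
            return "avx512";
        }
        if std::is_x86_feature_detected!("avx2") {
            return "avx2";
        }
    }
    // Fallback for other architectures and for older x86 CPUs.
    "scalar"
}
```

A test that requires AVX512 could check `select_filter_kernel() == "avx512"` first and skip itself otherwise, which keeps the suite green on runners without the feature.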

Heads up that we can't micro-benchmark avx512 with codspeed, as valgrind/callgrind does not support avx512. That said, you'll get numbers for avx2 if it is implemented:

      - name: Build benchmarks (shard 1)
        env:
          RUSTFLAGS: "-C target-feature=+avx2"
      ...

0ax1 avatar Nov 20 '25 09:11 0ax1
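
The `RUSTFLAGS` setting above affects what the compiler may assume at build time, which is separate from what the CPU reports at runtime. As a small sketch of that distinction, `cfg!(target_feature = "avx2")` is resolved during compilation. The function name here is hypothetical.

```rust
/// Describe the SIMD level this binary was *compiled* for.
/// Returns "avx2" only when the build enabled it, for example through
/// RUSTFLAGS="-C target-feature=+avx2"; otherwise returns "baseline".
/// Hypothetical helper for illustration.
fn compiled_simd_level() -> &'static str {
    if cfg!(target_feature = "avx2") {
        "avx2"
    } else {
        "baseline"
    }
}
```

Code that picks a kernel by runtime detection can still reach AVX512 paths in a binary built this way, because the compile-time flag does not restrict what a runtime check finds on the host CPU.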

Given that we're pivoting to batch I'm going to put a pin in this.

connortsui20 avatar Nov 20 '25 18:11 connortsui20

Nooo I like this!

gatesn avatar Nov 20 '25 19:11 gatesn

CodSpeed Performance Report

Merging #5399 will not alter performance

Comparing ct/avx512-filter (eae0511) with develop (c75c8a3)

Summary

✅ 1478 untouched
🆕 56 new
⏩ 214 skipped[^skipped]

Benchmarks breakdown

| | Benchmark | BASE | HEAD | Change |
|---|---|---|---|---|
| 🆕 | `in_place_scalar[(1024, 0.0)]` | N/A | 4.6 µs | N/A |
| 🆕 | `in_place_scalar[(1024, 0.1)]` | N/A | 6.2 µs | N/A |
| 🆕 | `in_place_scalar[(1024, 0.25)]` | N/A | 7 µs | N/A |
| 🆕 | `in_place_scalar[(1024, 0.5)]` | N/A | 7.6 µs | N/A |
| 🆕 | `in_place_scalar[(1024, 0.75)]` | N/A | 8.2 µs | N/A |
| 🆕 | `in_place_scalar[(1024, 0.9)]` | N/A | 8.5 µs | N/A |
| 🆕 | `in_place_scalar[(1024, 1.0)]` | N/A | 8.8 µs | N/A |
| 🆕 | `in_place_scalar[(131072, 0.0)]` | N/A | 553.9 µs | N/A |
| 🆕 | `in_place_scalar[(131072, 0.1)]` | N/A | 784.8 µs | N/A |
| 🆕 | `in_place_scalar[(131072, 0.25)]` | N/A | 866 µs | N/A |
| 🆕 | `in_place_scalar[(131072, 0.5)]` | N/A | 943.8 µs | N/A |
| 🆕 | `in_place_scalar[(131072, 0.75)]` | N/A | 1 ms | N/A |
| 🆕 | `in_place_scalar[(131072, 0.9)]` | N/A | 1.1 ms | N/A |
| 🆕 | `in_place_scalar[(131072, 1.0)]` | N/A | 1.1 ms | N/A |
| 🆕 | `in_place_scalar[(16384, 0.0)]` | N/A | 69.5 µs | N/A |
| 🆕 | `in_place_scalar[(16384, 0.1)]` | N/A | 97.8 µs | N/A |
| 🆕 | `in_place_scalar[(16384, 0.25)]` | N/A | 108.1 µs | N/A |
| 🆕 | `in_place_scalar[(16384, 0.5)]` | N/A | 117.7 µs | N/A |
| 🆕 | `in_place_scalar[(16384, 0.75)]` | N/A | 126.8 µs | N/A |
| 🆕 | `in_place_scalar[(16384, 0.9)]` | N/A | 132.2 µs | N/A |
| ... | ... | ... | ... | ... |

:information_source: Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

[^skipped]: 214 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

codspeed-hq[bot] avatar Nov 24 '25 15:11 codspeed-hq[bot]

@0ax1 is there a way for me to disable the benchmark just for codspeed?

connortsui20 avatar Nov 24 '25 17:11 connortsui20

> @0ax1 is there a way for me to disable the benchmark just for codspeed?

~~You shouldn't need to. We compile for AVX2, so the AVX512 version shouldn't be picked up.~~

~~Ah I see that we're running into failed to execute the benchmark process, exit code: 132 which I assume is picked based on runtime capabilities of the CPU?~~

We discussed this and agreed to exclude the benchmark via `#[cfg(not(codspeed))]`.

0ax1 avatar Nov 24 '25 18:11 0ax1
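
As a sketch of what that exclusion could look like, the snippet below gates a stand-in item on the `codspeed` cfg. It assumes the CodSpeed build passes `--cfg codspeed`, as the attribute above implies, and the function names are hypothetical. It also decodes exit code 132 under the usual shell convention of 128 plus the signal number, which gives signal 4 (SIGILL on Linux, an illegal instruction). That would be consistent with an AVX512 instruction running somewhere that does not support it, though the thread does not confirm the cause.

```rust
/// Stand-in for the AVX512 benchmark: compiled in only when the `codspeed`
/// cfg is absent. Hypothetical name; the real benchmark lives in the PR.
#[cfg(not(codspeed))]
fn avx512_bench_enabled() -> bool {
    true
}

/// Under a CodSpeed build (assumed to set `--cfg codspeed`) the benchmark
/// is compiled out and this stub takes its place.
#[cfg(codspeed)]
fn avx512_bench_enabled() -> bool {
    false
}

/// Decode a shell-style exit status: values in 129..=255 conventionally
/// mean "terminated by signal (status - 128)".
fn signal_from_exit_code(code: i32) -> Option<i32> {
    if (129..=255).contains(&code) {
        Some(code - 128)
    } else {
        None
    }
}
```

Here `signal_from_exit_code(132)` yields `Some(4)`. In an ordinary build without the `codspeed` cfg, `avx512_bench_enabled()` returns `true`.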

hmm it seems I've done something wrong here then: https://github.com/vortex-data/vortex/actions/runs/19644046057/job/56254751182?pr=5399

connortsui20 avatar Nov 24 '25 18:11 connortsui20

I'm going to close this until we start doing e2e tests and we can actually measure how much faster things are.

connortsui20 avatar Dec 10 '25 14:12 connortsui20