
replacing tuples with SIMD - DON'T MERGE - seems to be ~40+% SLOWER

Open heckj opened this issue 1 year ago • 8 comments

Since we were talking about this, I took the time to set it up - but after all the conversions, it turns out that it only HURT performance (based on a benchmark comparison).
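For readers who have not seen the branch, here is a minimal sketch of the kind of conversion being discussed, assuming a 3D dot product as the example operation. The function names `dotTuple` and `dotSIMD` are hypothetical and are not taken from the swift-noise sources; the actual change in this branch may look quite different.

```swift
// Hypothetical helpers for illustration only; not from swift-noise.

/// Tuple-based 3D dot product: three scalar multiplies and two adds.
func dotTuple(_ a: (Double, Double, Double), _ b: (Double, Double, Double)) -> Double {
    a.0 * b.0 + a.1 * b.1 + a.2 * b.2
}

/// SIMD-based equivalent: one lane-wise multiply followed by a horizontal sum.
func dotSIMD(_ a: SIMD3<Double>, _ b: SIMD3<Double>) -> Double {
    (a * b).sum()
}

print(dotTuple((1, 2, 3), (4, 5, 6)))            // 32.0
print(dotSIMD(SIMD3(1, 2, 3), SIMD3(4, 5, 6)))   // 32.0
```

Both forms compute the same value; which one is faster depends on how the compiler lowers each in the surrounding code, which is what the benchmark comparison measures.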

`swift package benchmark baseline compare bdb4ef08 --format markdown`:

Comparing results between 'bdb4ef08' and 'Current_run'

Host 'Sparrow.local' with 8 'arm64' processors with 16 GB memory, running:
Darwin Kernel Version 23.5.0: Wed May  1 20:16:51 PDT 2024; root:xnu-10063.121.3~5/RELEASE_ARM64_T8103

ExternalBenchmarks

cell2d metrics

Time (wall clock): results within specified thresholds, fold down for details.

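| Time (wall clock) (ns) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 458 | 583 | 583 | 584 | 625 | 625 | 38292 | 1048576 |
| Current_run | 708 | 792 | 792 | 833 | 834 | 875 | 54416 | 923477 |
| Δ | 250 | 209 | 209 | 249 | 209 | 250 | 16124 | -125099 |
| Improvement % | -55 | -36 | -36 | -43 | -33 | -40 | -42 | -125099 |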

Throughput (# / s): results within specified thresholds, fold down for details.

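| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 2183 | 1716 | 1716 | 1713 | 1601 | 1601 | 26 | 1048576 |
| Current_run | 1412 | 1264 | 1264 | 1201 | 1199 | 1144 | 18 | 923477 |
| Δ | -771 | -452 | -452 | -512 | -402 | -457 | -8 | -125099 |
| Improvement % | -35 | -26 | -26 | -30 | -25 | -29 | -31 | -125099 |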

cell3d metrics

Time (wall clock): results within specified thresholds, fold down for details.

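| Time (wall clock) (ns) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 458 | 542 | 583 | 583 | 584 | 625 | 32667 | 1048576 |
| Current_run | 708 | 792 | 792 | 833 | 834 | 875 | 51083 | 922510 |
| Δ | 250 | 250 | 209 | 250 | 250 | 250 | 18416 | -126066 |
| Improvement % | -55 | -46 | -36 | -43 | -43 | -40 | -56 | -126066 |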

Throughput (# / s): results within specified thresholds, fold down for details.

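| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 2183 | 1845 | 1716 | 1716 | 1713 | 1601 | 31 | 1048576 |
| Current_run | 1412 | 1264 | 1264 | 1201 | 1199 | 1144 | 20 | 922510 |
| Δ | -771 | -581 | -452 | -515 | -514 | -457 | -11 | -126066 |
| Improvement % | -35 | -31 | -26 | -30 | -30 | -29 | -35 | -126066 |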

cell_tiling3d metrics

Time (wall clock): results within specified thresholds, fold down for details.

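| Time (wall clock) (ns) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 458 | 583 | 583 | 584 | 625 | 666 | 66667 | 1048576 |
| Current_run | 708 | 792 | 792 | 833 | 834 | 875 | 51625 | 916598 |
| Δ | 250 | 209 | 209 | 249 | 209 | 209 | -15042 | -131978 |
| Improvement % | -55 | -36 | -36 | -43 | -33 | -31 | 23 | -131978 |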

Throughput (# / s): results within specified thresholds, fold down for details.

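| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 2183 | 1716 | 1716 | 1713 | 1601 | 1502 | 15 | 1048576 |
| Current_run | 1412 | 1264 | 1264 | 1201 | 1199 | 1144 | 19 | 916598 |
| Δ | -771 | -452 | -452 | -512 | -402 | -358 | 4 | -131978 |
| Improvement % | -35 | -26 | -26 | -30 | -25 | -24 | 27 | -131978 |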

classic3d metrics

Time (wall clock): results within specified thresholds, fold down for details.

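| Time (wall clock) (μs) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 6750 | 6875 | 6919 | 7127 | 7211 | 7543 | 72959 | 138634 |
| Current_run | 9875 | 10047 | 10087 | 10087 | 10127 | 10295 | 60750 | 95852 |
| Δ | 3125 | 3172 | 3168 | 2960 | 2916 | 2752 | -12209 | -42782 |
| Improvement % | -46 | -46 | -46 | -42 | -40 | -36 | 17 | -42782 |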

Throughput (# / s): results within specified thresholds, fold down for details.

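| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 148 | 146 | 145 | 140 | 139 | 133 | 14 | 138634 |
| Current_run | 101 | 100 | 99 | 99 | 99 | 97 | 16 | 95852 |
| Δ | -47 | -46 | -46 | -41 | -40 | -36 | 2 | -42782 |
| Improvement % | -32 | -32 | -32 | -29 | -29 | -27 | 14 | -42782 |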

classic_tiling3d metrics

Time (wall clock): results within specified thresholds, fold down for details.

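| Time (wall clock) (ns) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 708 | 792 | 833 | 833 | 834 | 916 | 35958 | 1009758 |
| Current_run | 1083 | 1208 | 1208 | 1209 | 1250 | 1292 | 47791 | 669209 |
| Δ | 375 | 416 | 375 | 376 | 416 | 376 | 11833 | -340549 |
| Improvement % | -53 | -53 | -45 | -45 | -50 | -41 | -33 | -340549 |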

Throughput (# / s): results within specified thresholds, fold down for details.

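| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 1412 | 1264 | 1201 | 1201 | 1199 | 1093 | 28 | 1009758 |
| Current_run | 923 | 828 | 828 | 827 | 800 | 774 | 21 | 669209 |
| Δ | -489 | -436 | -373 | -374 | -399 | -319 | -7 | -340549 |
| Improvement % | -35 | -34 | -31 | -31 | -33 | -29 | -25 | -340549 |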

classic_tiling_fbm3d metrics

Time (wall clock): results within specified thresholds, fold down for details.

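| Time (wall clock) (μs) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 6833 | 6959 | 7003 | 7003 | 7087 | 7583 | 47125 | 138651 |
| Current_run | 9958 | 10127 | 10167 | 10167 | 10215 | 10335 | 57084 | 95148 |
| Δ | 3125 | 3168 | 3164 | 3164 | 3128 | 2752 | 9959 | -43503 |
| Improvement % | -46 | -46 | -45 | -45 | -44 | -36 | -21 | -43503 |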

Throughput (# / s): results within specified thresholds, fold down for details.

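| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 146 | 144 | 143 | 143 | 141 | 132 | 21 | 138651 |
| Current_run | 100 | 99 | 98 | 98 | 98 | 97 | 18 | 95148 |
| Δ | -46 | -45 | -45 | -45 | -43 | -35 | -3 | -43503 |
| Improvement % | -32 | -31 | -31 | -31 | -30 | -27 | -14 | -43503 |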

disk2d metrics

Time (wall clock): results within specified thresholds, fold down for details.

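| Time (wall clock) (ms) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 9314 | 9339 | 9347 | 9388 | 9486 | 9609 | 9630 | 107 |
| Current_run | 24508 | 24576 | 24707 | 24969 | 30228 | 43058 | 43058 | 39 |
| Δ | 15194 | 15237 | 15360 | 15581 | 20742 | 33449 | 33428 | -68 |
| Improvement % | -163 | -163 | -164 | -166 | -219 | -348 | -347 | -68 |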

Throughput (# / s): results within specified thresholds, fold down for details.

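| Throughput (# / s) (#) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 107 | 107 | 107 | 107 | 105 | 104 | 104 | 107 |
| Current_run | 41 | 41 | 40 | 40 | 33 | 23 | 23 | 39 |
| Δ | -66 | -66 | -67 | -67 | -72 | -81 | -81 | -68 |
| Improvement % | -62 | -62 | -63 | -63 | -69 | -78 | -78 | -68 |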

gradient2d metrics

Time (wall clock): results within specified thresholds, fold down for details.

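| Time (wall clock) (μs) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 6750 | 6875 | 6919 | 6959 | 7003 | 7503 | 42417 | 140146 |
| Current_run | 9916 | 10047 | 10087 | 10127 | 10167 | 10295 | 96958 | 95770 |
| Δ | 3166 | 3172 | 3168 | 3168 | 3164 | 2792 | 54541 | -44376 |
| Improvement % | -47 | -46 | -46 | -46 | -45 | -37 | -129 | -44376 |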

Throughput (# / s): results within specified thresholds, fold down for details.

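| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 148 | 146 | 145 | 144 | 143 | 133 | 24 | 140146 |
| Current_run | 101 | 100 | 99 | 99 | 98 | 97 | 10 | 95770 |
| Δ | -47 | -46 | -46 | -45 | -45 | -36 | -14 | -44376 |
| Improvement % | -32 | -32 | -32 | -31 | -31 | -27 | -58 | -44376 |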

gradient3d metrics

Time (wall clock): results within specified thresholds, fold down for details.

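| Time (wall clock) (μs) * | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 6791 | 6919 | 6919 | 6959 | 7003 | 7503 | 44709 | 139935 |
| Current_run | 9916 | 10047 | 10087 | 10127 | 10167 | 10295 | 75500 | 95903 |
| Δ | 3125 | 3128 | 3168 | 3168 | 3164 | 2792 | 30791 | -44032 |
| Improvement % | -46 | -45 | -46 | -46 | -45 | -37 | -69 | -44032 |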

Throughput (# / s): results within specified thresholds, fold down for details.

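| Throughput (# / s) (K) | p0 | p25 | p50 | p75 | p90 | p99 | p100 | Samples |
|:---|---:|---:|---:|---:|---:|---:|---:|---:|
| bdb4ef08 | 147 | 145 | 145 | 144 | 143 | 133 | 22 | 139935 |
| Current_run | 101 | 100 | 99 | 99 | 98 | 97 | 13 | 95903 |
| Δ | -46 | -45 | -46 | -45 | -45 | -36 | -9 | -44032 |
| Improvement % | -31 | -31 | -32 | -31 | -31 | -27 | -41 | -44032 |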
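For anyone reading the numbers above: the `Improvement %` rows appear to be the change relative to the baseline, signed so that negative means worse and rounded to a whole percent. That reading is inferred from the values in these tables, not from the benchmark tool's source, and the helper `improvementPercent` below is a hypothetical name used only to show the arithmetic.

```swift
/// Signed percentage change relative to the baseline (sketch; inferred from the tables).
/// For time, lower is better, so an increase is reported as negative.
/// For throughput, higher is better, so a decrease is reported as negative.
func improvementPercent(baseline: Double, current: Double, higherIsBetter: Bool) -> Int {
    let change = (current - baseline) / baseline * 100
    return Int((higherIsBetter ? change : -change).rounded())
}

// cell2d, p0 wall clock: 458 ns -> 708 ns
print(improvementPercent(baseline: 458, current: 708, higherIsBetter: false))   // -55
// cell2d, p0 throughput: 2183 K/s -> 1412 K/s
print(improvementPercent(baseline: 2183, current: 1412, higherIsBetter: true))  // -35
```

The same inputs explain why the wall-clock regression (around -36% to -55%) looks larger than the throughput regression (around -25% to -35%): both describe the same slowdown, but each is expressed relative to its own baseline value.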

heckj, Jun 04 '24 20:06