unified-memory-framework
unified-memory-framework copied to clipboard
Postpone freeing a tracker entry, add a reference counter
Description
Postpone freeing a tracker entry until it is really removed from the tracker.
Fixes: #1233
Checklist
- [x] Code compiles without errors locally
- [x] All tests pass locally
- [x] CI workflows execute properly
Compute Benchmarks run (with params: --compare 'Baseline_PVC'): https://github.com/oneapi-src/unified-memory-framework/actions/runs/14473372398
Compute Benchmarks run ( --compare 'Baseline_PVC'): https://github.com/oneapi-src/unified-memory-framework/actions/runs/14473372398 Job status: failure. Test status: failure.
Summary
(Emphasized values are the best results)
Improved 40 (threshold 2.00%)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<fixed_provider> | 1796.990000 ns | 4972.830 ns | 176.73% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 fixed_provider | 1174.930000 ns | 2564.380 ns | 118.26% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 318.467000 ns | 647.568 ns | 103.34% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1508.560000 ns | 3025.960 ns | 100.59% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 395.143000 ns | 767.476 ns | 94.23% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 12691.600000 ns | 24625.100 ns | 94.03% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 323.984000 ns | 624.879 ns | 92.87% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1582.200000 ns | 3037.880 ns | 92.00% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 625.572000 ns | 1194.930 ns | 91.01% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1674.610000 ns | 3150.710 ns | 88.15% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 glibc | 450.721000 ns | 838.137 ns | 85.95% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 13023.400000 ns | 24169.300 ns | 85.58% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 5798.090000 ns | 10613.300 ns | 83.05% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 417.284000 ns | 762.820 ns | 82.81% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 656.280000 ns | 1197.650 ns | 82.49% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 14085.800000 ns | 24818.100 ns | 76.19% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider> | 313.102000 ns | 539.665 ns | 72.36% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy | 330.768000 ns | 568.249 ns | 71.80% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 2655.420000 ns | 4540.970 ns | 71.01% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 340.066000 ns | 580.669 ns | 70.75% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 371.110000 ns | 632.053 ns | 70.31% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 glibc | 504.080000 ns | 852.465 ns | 69.11% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 glibc | 222.220000 ns | 365.370 ns | 64.42% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<os_provider> | 17325.100000 ns | 28297.400 ns | 63.33% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 367.259000 ns | 599.047 ns | 63.11% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 365.715000 ns | 591.936 ns | 61.86% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 os_provider | 16056.600000 ns | 25706.600 ns | 60.10% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 210.554000 ns | 334.039 ns | 58.65% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy | 221.357000 ns | 350.589 ns | 58.38% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider> | 1040.670000 ns | 1637.570 ns | 57.36% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 glibc | 181.914000 ns | 282.643 ns | 55.37% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 23592.200000 ns | 36260.200 ns | 53.70% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 221.747000 ns | 340.160 ns | 53.40% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider> | 355.287000 ns | 543.490 ns | 52.97% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 238.835000 ns | 360.015 ns | 50.74% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider> | 1121.910000 ns | 1670.930 ns | 48.94% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 glibc | 135.488000 ns | 170.436 ns | 25.79% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc | 159.699000 ns | 199.856 ns | 25.15% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 172.138000 ns | 204.384 ns | 18.73% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 glibc | 149.274000 ns | 172.393 ns | 15.49% |
Regressed 6 (threshold 2.00%)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider> | 4677.440 ns | 752.146000 ns | -83.92% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 196.851 ns | 88.994000 ns | -54.79% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc | 189.812 ns | 88.265900 ns | -53.50% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 101.247 ns | 64.327600 ns | -36.46% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 95.715 ns | 63.410600 ns | -33.75% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider> | 35133.300 ns | 30418.400000 ns | -13.42% |
Performance change in benchmark groups
UMF
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 323.984000 ns | 624.879 ns | 92.87% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1582.200000 ns | 3037.880 ns | 92.00% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider> | 313.102000 ns | 539.665 ns | 72.36% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider> | 1040.670000 ns | 1637.570 ns | 57.36% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 glibc | 181.914000 ns | 282.643 ns | 55.37% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc | 189.812 ns | 88.265900 ns | -53.50% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider> | 4677.440 ns | 752.146000 ns | -83.92% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 12691.600000 ns | 24625.100 ns | 94.03% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 371.110000 ns | 632.053 ns | 70.31% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 glibc | 222.220000 ns | 365.370 ns | 64.42% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider> | 355.287000 ns | 543.490 ns | 52.97% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider> | 1121.910000 ns | 1670.930 ns | 48.94% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider> | 35133.300 ns | 30418.400000 ns | -13.42% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 196.851 ns | 88.994000 ns | -54.79% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 625.572000 ns | 1194.930 ns | 91.01% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1674.610000 ns | 3150.710 ns | 88.15% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 glibc | 450.721000 ns | 838.137 ns | 85.95% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy | 330.768000 ns | 568.249 ns | 71.80% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 2655.420000 ns | 4540.970 ns | 71.01% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 340.066000 ns | 580.669 ns | 70.75% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc | 159.699000 ns | 199.856 ns | 25.15% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 656.280000 ns | 1197.650 ns | 82.49% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 14085.800000 ns | 24818.100 ns | 76.19% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 glibc | 504.080000 ns | 852.465 ns | 69.11% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 367.259000 ns | 599.047 ns | 63.11% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 365.715000 ns | 591.936 ns | 61.86% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 23592.200000 ns | 36260.200 ns | 53.70% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 172.138000 ns | 204.384 ns | 18.73% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 318.467000 ns | 647.568 ns | 103.34% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1508.560000 ns | 3025.960 ns | 100.59% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 395.143000 ns | 767.476 ns | 94.23% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 210.554000 ns | 334.039 ns | 58.65% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 221.747000 ns | 340.160 ns | 53.40% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 glibc | 135.488000 ns | 170.436 ns | 25.79% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 95.715 ns | 63.410600 ns | -33.75% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 13023.400000 ns | 24169.300 ns | 85.58% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 5798.090000 ns | 10613.300 ns | 83.05% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 417.284000 ns | 762.820 ns | 82.81% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy | 221.357000 ns | 350.589 ns | 58.38% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 238.835000 ns | 360.015 ns | 50.74% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 glibc | 149.274000 ns | 172.393 ns | 15.49% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 101.247 ns | 64.327600 ns | -36.46% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<fixed_provider> | 1796.990000 ns | 4972.830 ns | 176.73% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 fixed_provider | 1174.930000 ns | 2564.380 ns | 118.26% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<os_provider> | 17325.100000 ns | 28297.400 ns | 63.33% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 os_provider | 16056.600000 ns | 25706.600 ns | 60.10% |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<os_provider> | 0.000000 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 os_provider | 0.000000 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<fixed_provider> | 0.000000 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 fixed_provider | 0.000000 % | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider> | 0.064949 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider> | 30.607300 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider> | 60.801600 % | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider> | 0.016245 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider> | 30.589100 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider> | 55.478400 % | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 26.002600 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 64.508100 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 62.411100 % | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 25.507300 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 60.786800 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 54.893400 % | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 23.636700 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 85.085300 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 85.085300 % | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 20.038400 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 84.818400 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 88.068200 % | - |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 glibc | 185.427000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider> | 4683.950000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider> | 1012.640000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider> | 317.499000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1599.570000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc | 193.304000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 326.985000 ns | - |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 glibc | 230.601000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider> | 35481.000000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider> | 1152.200000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider> | 387.822000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 12994.000000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 204.267000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 389.277000 ns | - |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 glibc | 450.370000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 2627.180000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 622.544000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 329.937000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1645.940000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc | 158.885000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy | 326.537000 ns | - |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 glibc | 508.403000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 24059.100000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 669.748000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 382.320000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 13026.500000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 167.881000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 360.961000 ns | - |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 glibc | 136.707000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 325.845000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 388.318000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 217.319000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1510.160000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 92.599300 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 206.031000 ns | - |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 glibc | 146.831000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 6542.910000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 413.041000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 227.619000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 12405.600000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 104.040000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy | 223.409000 ns | - |
Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider> | 50.000000 % | - | |
| FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider> | 99.990400 % | - | |
| FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider> | 99.960900 % | - |
Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider> | 20.000000 % | - | |
| FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider> | 99.990300 % | - | |
| FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider> | 99.962800 % | - |
Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 97.184200 % | - | |
| FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 99.992500 % | - | |
| FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 99.989800 % | - |
Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 90.346000 % | - | |
| FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 99.991800 % | - | |
| FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 99.987700 % | - |
Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 99.536100 % | - | |
| FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 99.996800 % | - | |
| FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 99.992800 % | - |
Relative perf in group FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 98.763000 % | - | |
| FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 99.996800 % | - | |
| FRAGMENTATION_peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 99.994200 % | - |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 (6)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 glibc | - | 366.197000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 jemalloc_pool<os_provider> | - | 1672.840000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 scalable_pool<os_provider> | - | 549.361000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 umfProxy | - | 53569.500000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 jemalloc | - | 89.618300 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:8 tbbProxy | - | 632.770000 ns |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 (6)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 glibc | - | 366.670000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 jemalloc_pool<os_provider> | - | 1674.080000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 scalable_pool<os_provider> | - | 544.330000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 umfProxy | - | 78951.900000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 jemalloc | - | 89.786300 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:12 tbbProxy | - | 638.970000 ns |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 (6)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 glibc | - | 845.858000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 jemalloc_pool<os_provider> | - | 1198.180000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 scalable_pool<os_provider> | - | 597.241000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 umfProxy | - | 53934.300000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 jemalloc | - | 204.593000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:8 tbbProxy | - | 584.623000 ns |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 (6)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 glibc | - | 855.042000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 jemalloc_pool<os_provider> | - | 1202.610000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 scalable_pool<os_provider> | - | 603.377000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 umfProxy | - | 79284.100000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 jemalloc | - | 203.389000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:12 tbbProxy | - | 594.468000 ns |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 (6)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 glibc | - | 173.740000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 jemalloc_pool<os_provider> | - | 764.419000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 scalable_pool<os_provider> | - | 362.542000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 umfProxy | - | 53422.200000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 jemalloc | - | 64.340700 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:8 tbbProxy | - | 352.649000 ns |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 (6)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 glibc | - | 173.294000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 jemalloc_pool<os_provider> | - | 763.153000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 scalable_pool<os_provider> | - | 359.356000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 umfProxy | - | 78621.600000 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 jemalloc | - | 64.383900 ns | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:12 tbbProxy | - | 354.263000 ns |
Details
Benchmark details contain too many chars to display
"Exception: The directory /home/test-user/bench_workdir_umf exists but is not a benchmark work directory." See: https://github.com/oneapi-src/unified-memory-framework/actions/runs/14473372398/job/40592824780 @lplewa @lukaszstolarczuk Could you fix it?
The last commit https://github.com/oneapi-src/unified-memory-framework/pull/1270/commits/accde787a6132aef25f0116d914710fb0f446cdc is not finished yet !
What is the purpose of this PR, could you please describe the scenario and how these changes help?
What is the purpose of this PR, could you please describe the scenario and how these changes help?
We have a race, between insert and delete of entry in tracker
Top screen in thread one, bottom screen is thread two. Numbers shows order of the operations.
But how is it possible that one thread has some pointer which belongs to the entry in the memory tracker and another thread removes that entry from the tracker? The first thing that came to my mind is the following:
- T1 allocates memory, and the corresponding region is put into the tracker
- T2 frees the memory allocated by T1, and the corresponding region is removed from the tracker
- T1 is still trying to use the memory which is already been freed by T2.
But it is an ill-formed client's code. What is the real scenario?
Pool nesting detection. We are checking if there is region in the tracker that contains region that we are adding. So we are looking for region with address lower then our - we are finding one, and in the exact moment other thread removes this region which we found.
Pool nesting detection. We are checking if there is region in the tracker that contains region that we are adding. So we are looking for region with address lower then our - we are finding one, and in the exact moment other thread removes this region which we found.
That is exactly my question. How such a scenario might happen: if T1 has a pointer P, that belongs to some region R, and T2 removes this region R.
Pool nesting detection. We are checking if there is region in the tracker that contains region that we are adding. So we are looking for region with address lower then our - we are finding one, and in the exact moment other thread removes this region which we found.
That is exactly my question. How such a scenario might happen: if T1 has a pointer P, that belongs to some region R, and T2 removes this region R.
@vinser52 @lplewa The explanation is that the pointer P (of T1) does not belong to the region R (of T2).
This is the real scenario:
a) thread T2 allocates memory region R (for example: 0x100-0x300 - address 0x100 and size 0x200) and adds it to the tracker (this is the only entry in the tracker yet) and gets preempted ...
b) thread T1 allocates memory of size 0x500 and receives pointer P (address 0x400, size 0x500 - range 0x400-0x900) and tries to add it to the tracker, so it calls umfMemoryTrackerAdd(), so critnib_find(FIND_LE) finds and returns the only region R existing in the tracker R: 0x100-0x300 (step 1 on the picture) and T1 gets preempted (step 2 on the picture) ...
c) thread T2 removes the region R from the tracker (step 3 on the picture), frees its memory (step 4 on the picture) and T2 gets preempted ...
d) thread T1 tries to read a size of the already freed region R (step 5 on the picture) and it crashes ...
Pool nesting detection. We are checking if there is region in the tracker that contains region that we are adding. So we are looking for region with address lower then our - we are finding one, and in the exact moment other thread removes this region which we found.
That is exactly my question. How such a scenario might happen: if T1 has a pointer P, that belongs to some region R, and T2 removes this region R.
@vinser52 @lplewa The explanation is that the pointer P (of T1) does not belong to the region R (of T2).
This is the real scenario: a) thread T2 allocates memory region R (for example:
0x100-0x300- address0x100and size0x200) and adds it to the tracker (this is the only entry in the tracker yet) and gets preempted ... b) thread T1 allocates memory of size0x500and receives pointer P (address0x400, size0x500- range0x400-0x900) and tries to add it to the tracker, so it callsumfMemoryTrackerAdd(), socritnib_find(FIND_LE)finds and returns the only region R existing in the tracker R:0x100-0x300(step 1 on the picture) and gets preempted (step 2 on the picture) ... c) thread T2 removes the region R from the tracker (step 3 on the picture), frees its memory (step 4 on the picture) and gets preempted ... d) thread T1 tries to read a size of the already freed region R (step 5 on the picture) and crashes ...
Thank you, @ldorau, for the clear description of the scenario. That makes sense.
Compute Benchmarks run (with params: --compare 'Baseline_PVC'): https://github.com/oneapi-src/unified-memory-framework/actions/runs/14884274180
@bratpiorka please review
Compute Benchmarks run ( --compare 'Baseline_PVC'): https://github.com/oneapi-src/unified-memory-framework/actions/runs/14884274180 Job status: success. Test status: success.
Summary
(Emphasized values are the best results)
Improved 1 (threshold 2.00%)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1500.100000 ns | 1537.380 ns | 2.49% |
Regressed 7 (threshold 2.00%)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1814.700 ns | 1679.220000 ns | -7.47% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 177.683 ns | 168.832000 ns | -4.98% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1649.520 ns | 1594.260000 ns | -3.35% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 399.711 ns | 388.818000 ns | -2.73% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 12876.300 ns | 12587.700000 ns | -2.24% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1513.270 ns | 1480.870000 ns | -2.14% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1705.980 ns | 1670.570000 ns | -2.08% |
Performance change in benchmark groups
UMF
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 323.754000 ns | 327.146 ns | 1.05% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc | 190.542 ns | 190.465000 ns | -0.04% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1649.520 ns | 1594.260000 ns | -3.35% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 glibc | 181.829000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider> | 4769.410000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider> | 984.673000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider> | 317.943000 ns | - |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 208.034 ns | 207.454000 ns | -0.28% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 377.243 ns | 374.876000 ns | -0.63% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 13033.400 ns | 12772.800000 ns | -2.00% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 glibc | 222.408000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider> | 36457.500000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider> | 1109.950000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider> | 354.943000 ns | - |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy | 330.444000 ns | 331.888 ns | 0.44% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc | 163.403000 ns | 163.478 ns | 0.05% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1705.980 ns | 1670.570000 ns | -2.08% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 glibc | 454.668000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 2688.050000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 623.559000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 340.926000 ns | - |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 364.946000 ns | 366.718 ns | 0.49% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 13602.700 ns | 13415.100000 ns | -1.38% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 177.683 ns | 168.832000 ns | -4.98% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 glibc | 503.782000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 24070.400000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 652.252000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 376.140000 ns | - |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1500.100000 ns | 1537.380 ns | 2.49% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 210.666 ns | 210.440000 ns | -0.11% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 95.179 ns | 94.697800 ns | -0.51% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 glibc | 137.044000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 329.170000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 395.680000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 222.466000 ns | - |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 12823.500000 ns | 13057.400 ns | 1.82% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy | 225.882000 ns | 229.769 ns | 1.72% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 97.262600 ns | 98.142 ns | 0.90% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 glibc | 147.664000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 6758.030000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 421.217000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 242.043000 ns | - |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<os_provider> | 17633.500000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 os_provider | 15993.500000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<fixed_provider> | 1958.240000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 fixed_provider | 1185.460000 ns | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<os_provider> | 0.000000 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 os_provider | 0.000000 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<fixed_provider> | 0.000000 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 fixed_provider | 0.000000 % | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider> | 0.064949 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider> | 30.628200 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider> | 60.801600 % | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider> | 0.016245 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider> | 30.604700 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider> | 53.766000 % | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 26.002600 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 65.724900 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 62.411100 % | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 25.356900 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 61.166000 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 57.041300 % | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 23.636700 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 85.085300 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 85.085300 % | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 20.247200 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 84.997400 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 88.068200 % | - |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc | 193.848000 ns | 194.068 ns | 0.11% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1605.900000 ns | 1605.960 ns | 0.00% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 325.046 ns | 323.578000 ns | -0.45% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 glibc | 185.311000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider> | 4675.000000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider> | 1055.190000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider> | 317.534000 ns | - |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 13291.600000 ns | 13500.500 ns | 1.57% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 204.617 ns | 202.321000 ns | -1.12% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 399.711 ns | 388.818000 ns | -2.73% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 glibc | 230.260000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider> | 35667.000000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider> | 1152.270000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider> | 387.022000 ns | - |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc | 159.105000 ns | 160.079 ns | 0.61% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy | 326.705 ns | 322.608000 ns | -1.25% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1814.700 ns | 1679.220000 ns | -7.47% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 glibc | 449.596000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 2676.360000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 620.408000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 340.262000 ns | - |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 357.464000 ns | 361.703 ns | 1.19% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 171.100000 ns | 171.469 ns | 0.22% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 13431.700 ns | 13287.500000 ns | -1.07% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 glibc | 506.426000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 25054.000000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 669.169000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 385.812000 ns | - |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 206.576000 ns | 206.756 ns | 0.09% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 94.366 ns | 93.492100 ns | -0.93% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1513.270 ns | 1480.870000 ns | -2.14% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 glibc | 135.188000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 345.745000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 388.858000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 217.970000 ns | - |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy | 221.907000 ns | 222.278 ns | 0.17% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 101.663 ns | 101.381000 ns | -0.28% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 12876.300 ns | 12587.700000 ns | -2.24% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 glibc | 145.646000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 6819.650000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 410.638000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 229.690000 ns | - |
Details
Benchmark details contain too many chars to display
Compute Benchmarks run (with params: --compare 'Baseline_PVC'): https://github.com/oneapi-src/unified-memory-framework/actions/runs/14901584825
Compute Benchmarks run ( --compare 'Baseline_PVC'): https://github.com/oneapi-src/unified-memory-framework/actions/runs/14901584825 Job status: success. Test status: success.
Failures
| Name | Failure |
|---|---|
| umf-benchmark | Benchmark run failure: Command '['/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark', '--benchmark_format=csv', '--benchmark_filter=^jemalloc_pool<os_provider>/multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4$']' died with <Signals.SIGSEGV: 11>. |
Summary
(Emphasized values are the best results)
Improved 1 (threshold 2.00%)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 93.462400 ns | 95.713 ns | 2.41% |
Regressed 7 (threshold 2.00%)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1809.560 ns | 1671.690000 ns | -7.62% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 12948.500 ns | 12010.500000 ns | -7.24% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 103.285 ns | 97.064700 ns | -6.02% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1682.370 ns | 1603.910000 ns | -4.66% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1553.030 ns | 1498.700000 ns | -3.50% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1634.970 ns | 1579.960000 ns | -3.36% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1634.060 ns | 1581.990000 ns | -3.19% |
Performance change in benchmark groups
UMF
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 324.887000 ns | 326.080 ns | 0.37% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc | 190.627000 ns | 191.058 ns | 0.23% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1634.060 ns | 1581.990000 ns | -3.19% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 376.054 ns | 374.768000 ns | -0.34% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 13305.300 ns | 13104.700000 ns | -1.51% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 206.389 ns | 203.003000 ns | -1.64% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy | 329.243000 ns | 332.738 ns | 1.06% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc | 164.130000 ns | 164.667 ns | 0.33% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1682.370 ns | 1603.910000 ns | -4.66% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 13009.500000 ns | 13259.200 ns | 1.92% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 175.232000 ns | 177.707 ns | 1.41% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 366.805000 ns | 368.133 ns | 0.36% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 93.462400 ns | 95.713 ns | 2.41% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 209.960000 ns | 210.019 ns | 0.03% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1519.520 ns | 1498.210000 ns | -1.40% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy | 227.540000 ns | 228.838 ns | 0.57% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 103.285 ns | 97.064700 ns | -6.02% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 12948.500 ns | 12010.500000 ns | -7.24% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 329.560000 ns | 335.194 ns | 1.71% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc | 193.428000 ns | 193.987 ns | 0.29% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1634.970 ns | 1579.960000 ns | -3.36% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 383.756000 ns | 390.476 ns | 1.75% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 204.525 ns | 203.413000 ns | -0.54% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 13165.300 ns | 12988.000000 ns | -1.35% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy | 323.621000 ns | 324.060 ns | 0.14% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc | 159.990 ns | 159.282000 ns | -0.44% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1809.560 ns | 1671.690000 ns | -7.62% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 362.415000 ns | 363.952 ns | 0.42% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 170.913 ns | 169.790000 ns | -0.66% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 13651.800 ns | 13542.800000 ns | -0.80% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 93.419000 ns | 94.180 ns | 0.81% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 206.371000 ns | 206.982 ns | 0.30% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1553.030 ns | 1498.700000 ns | -3.50% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy | 220.644000 ns | 222.850 ns | 1.00% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 102.371 ns | 102.029000 ns | -0.33% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 12464.400 ns | 12392.100000 ns | -0.58% |
Details
Benchmark details - environment, command...
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
Compute Benchmarks run (with params: --compare 'Baseline_PVC'): https://github.com/oneapi-src/unified-memory-framework/actions/runs/14903508512
Compute Benchmarks run ( --compare 'Baseline_PVC'): https://github.com/oneapi-src/unified-memory-framework/actions/runs/14903508512 Job status: success. Test status: success.
Summary
(Emphasized values are the best results)
Improved 3 (threshold 2.00%)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 167.374000 ns | 177.707 ns | 6.17% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 12633.400000 ns | 12988.000 ns | 2.81% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 355.028000 ns | 363.952 ns | 2.51% |
Regressed 10 (threshold 2.00%)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1796.740 ns | 1671.690000 ns | -6.96% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 12896.000 ns | 12010.500000 ns | -6.87% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 102.511 ns | 97.064700 ns | -5.31% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 13071.600 ns | 12392.100000 ns | -5.20% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1679.070 ns | 1603.910000 ns | -4.48% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1558.050 ns | 1498.210000 ns | -3.84% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1619.210 ns | 1579.960000 ns | -2.42% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 104.553 ns | 102.029000 ns | -2.41% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 13423.000 ns | 13104.700000 ns | -2.37% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1620.080 ns | 1581.990000 ns | -2.35% |
Performance change in benchmark groups
UMF
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc | 190.488000 ns | 191.058 ns | 0.30% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 326.921 ns | 326.080000 ns | -0.26% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1620.080 ns | 1581.990000 ns | -2.35% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 glibc | 182.873000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider> | 4809.570000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider> | 1035.670000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider> | 315.217000 ns | - |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 200.639000 ns | 203.003 ns | 1.18% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 371.290000 ns | 374.768 ns | 0.94% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 13423.000 ns | 13104.700000 ns | -2.37% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 glibc | 222.511000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider> | 36802.800000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider> | 1130.150000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider> | 356.261000 ns | - |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc | 163.133000 ns | 164.667 ns | 0.94% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy | 331.982000 ns | 332.738 ns | 0.23% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1679.070 ns | 1603.910000 ns | -4.48% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 glibc | 455.409000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 2737.940000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 624.840000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 340.572000 ns | - |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 167.374000 ns | 177.707 ns | 6.17% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 367.808000 ns | 368.133 ns | 0.09% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 13378.000 ns | 13259.200000 ns | -0.89% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 glibc | 502.179000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 24712.700000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 660.921000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 370.408000 ns | - |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 95.317100 ns | 95.713 ns | 0.41% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 210.531 ns | 210.019000 ns | -0.24% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1558.050 ns | 1498.210000 ns | -3.84% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 glibc | 137.542000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 339.729000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 394.581000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 220.627000 ns | - |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy | 229.210 ns | 228.838000 ns | -0.16% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 102.511 ns | 97.064700 ns | -5.31% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 12896.000 ns | 12010.500000 ns | -6.87% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 glibc | 147.563000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 6315.920000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 415.993000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 237.223000 ns | - |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<os_provider> | 17697.600000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 os_provider | 16096.300000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<fixed_provider> | 1941.630000 ns | - | |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 fixed_provider | 1171.530000 ns | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 (4)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<os_provider> | 0.000000 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 os_provider | 0.000000 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 proxy_pool<fixed_provider> | 0.000000 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:50000/threads:1 fixed_provider | 0.000000 % | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider> | 0.064949 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider> | 30.628200 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider> | 60.801600 % | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider> | 0.016245 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider> | 30.578600 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider> | 55.478400 % | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 26.002600 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 65.724900 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 62.411100 % | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 25.320600 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 63.405800 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 54.893400 % | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 23.636700 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 85.085300 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 85.085300 % | - |
Relative perf in group FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 18.978000 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 85.078000 % | - | |
| FRAGMENTATION_multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 93.371200 % | - |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 331.774000 ns | 335.194 ns | 1.03% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc | 193.880000 ns | 193.987 ns | 0.06% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1619.210 ns | 1579.960000 ns | -2.42% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 glibc | 185.135000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 disjoint_pool<os_provider> | 4702.300000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc_pool<os_provider> | 1055.060000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 scalable_pool<os_provider> | 318.201000 ns | - |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 12633.400000 ns | 12988.000 ns | 2.81% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 389.279000 ns | 390.476 ns | 0.31% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 203.198000 ns | 203.413 ns | 0.11% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 glibc | 236.251000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 disjoint_pool<os_provider> | 36409.600000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc_pool<os_provider> | 1142.490000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 scalable_pool<os_provider> | 385.033000 ns | - |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy | 321.604000 ns | 324.060 ns | 0.76% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc | 160.350 ns | 159.282000 ns | -0.67% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1796.740 ns | 1671.690000 ns | -6.96% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 glibc | 451.464000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 2706.910000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 621.351000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 340.486000 ns | - |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 355.028000 ns | 363.952 ns | 2.51% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 13451.800000 ns | 13542.800 ns | 0.68% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 170.086 ns | 169.790000 ns | -0.17% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 glibc | 507.437000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 24616.300000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 670.092000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 384.589000 ns | - |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1482.390000 ns | 1498.700 ns | 1.10% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 206.178000 ns | 206.982 ns | 0.39% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 93.995400 ns | 94.180 ns | 0.20% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 glibc | 135.624000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 disjoint_pool<os_provider> | 340.534000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc_pool<os_provider> | 387.611000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 scalable_pool<os_provider> | 216.520000 ns | - |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (7)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy | 221.713000 ns | 222.850 ns | 0.51% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 104.553 ns | 102.029000 ns | -2.41% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 13071.600 ns | 12392.100000 ns | -5.20% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 glibc | 145.625000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 disjoint_pool<os_provider> | 6416.340000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc_pool<os_provider> | 408.984000 ns | - | |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 scalable_pool<os_provider> | 229.668000 ns | - |
Details
Benchmark details contain too many chars to display
Compute Benchmarks run (with params: --compare 'Baseline_PVC'): https://github.com/oneapi-src/unified-memory-framework/actions/runs/14906290892
Compute Benchmarks run ( --compare 'Baseline_PVC'): https://github.com/oneapi-src/unified-memory-framework/actions/runs/14906290892 Job status: success. Test status: success.
Failures
| Name | Failure |
|---|---|
| umf-benchmark | Benchmark run failure: Command '['/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark', '--benchmark_format=csv', '--benchmark_filter=^jemalloc_pool<os_provider>/peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4$']' died with <Signals.SIGSEGV: 11>. |
Summary
(Emphasized values are the best results)
Improved 1 (threshold 2.00%)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 326.104000 ns | 335.194 ns | 2.79% |
Regressed 11 (threshold 2.00%)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1779.480 ns | 1671.690000 ns | -6.06% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1706.700 ns | 1603.910000 ns | -6.02% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 12698.400 ns | 12010.500000 ns | -5.42% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1641.150 ns | 1579.960000 ns | -3.73% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 12807.400 ns | 12392.100000 ns | -3.24% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1545.610 ns | 1498.700000 ns | -3.04% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 13477.200 ns | 13104.700000 ns | -2.76% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 400.992 ns | 390.476000 ns | -2.62% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 208.398 ns | 203.003000 ns | -2.59% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 13315.500 ns | 12988.000000 ns | -2.46% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1618.030 ns | 1581.990000 ns | -2.23% |
Performance change in benchmark groups
UMF
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc | 191.017000 ns | 191.058 ns | 0.02% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 331.356 ns | 326.080000 ns | -1.59% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1618.030 ns | 1581.990000 ns | -2.23% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 380.543 ns | 374.768000 ns | -1.52% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 208.398 ns | 203.003000 ns | -2.59% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 13477.200 ns | 13104.700000 ns | -2.76% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy | 330.076000 ns | 332.738 ns | 0.81% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc | 163.522000 ns | 164.667 ns | 0.70% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1706.700 ns | 1603.910000 ns | -6.02% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 13160.300000 ns | 13259.200 ns | 0.75% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 177.647000 ns | 177.707 ns | 0.03% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 369.627 ns | 368.133000 ns | -0.40% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 94.341600 ns | 95.713 ns | 1.45% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 210.612 ns | 210.019000 ns | -0.28% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1503.200 ns | 1498.210000 ns | -0.33% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy | 229.655 ns | 228.838000 ns | -0.36% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 98.322 ns | 97.064700 ns | -1.28% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 12698.400 ns | 12010.500000 ns | -5.42% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 326.104000 ns | 335.194 ns | 2.79% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc | 193.422000 ns | 193.987 ns | 0.29% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1641.150 ns | 1579.960000 ns | -3.73% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 204.587 ns | 203.413000 ns | -0.57% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 13315.500 ns | 12988.000000 ns | -2.46% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 400.992 ns | 390.476000 ns | -2.62% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc | 160.232 ns | 159.282000 ns | -0.59% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy | 326.768 ns | 324.060000 ns | -0.83% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1779.480 ns | 1671.690000 ns | -6.06% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 358.666000 ns | 363.952 ns | 1.47% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 13376.700000 ns | 13542.800 ns | 1.24% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 168.942000 ns | 169.790 ns | 0.50% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 206.239000 ns | 206.982 ns | 0.36% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 94.343 ns | 94.180200 ns | -0.17% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1545.610 ns | 1498.700000 ns | -3.04% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 101.501000 ns | 102.029 ns | 0.52% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy | 222.729000 ns | 222.850 ns | 0.05% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 12807.400 ns | 12392.100000 ns | -3.24% |
Details
Benchmark details - environment, command...
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
Compute Benchmarks run (with params: --compare 'Baseline_PVC'): https://github.com/oneapi-src/unified-memory-framework/actions/runs/14906989543
Compute Benchmarks run ( --compare 'Baseline_PVC'): https://github.com/oneapi-src/unified-memory-framework/actions/runs/14906989543 Job status: success. Test status: success.
Failures
| Name | Failure |
|---|---|
| umf-benchmark | Benchmark run failure: Command '['/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark', '--benchmark_format=csv', '--benchmark_filter=^jemalloc_pool<os_provider>/peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4$']' died with <Signals.SIGSEGV: 11>. |
Summary
(Emphasized values are the best results)
Improved 2 (threshold 2.00%)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 326.578000 ns | 335.194 ns | 2.64% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 13273.900000 ns | 13542.800 ns | 2.03% |
Regressed 8 (threshold 2.00%)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1751.790 ns | 1603.910000 ns | -8.44% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 12843.300 ns | 12010.500000 ns | -6.48% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1775.380 ns | 1671.690000 ns | -5.84% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 13149.500 ns | 12392.100000 ns | -5.76% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1659.460 ns | 1579.960000 ns | -4.79% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1561.420 ns | 1498.210000 ns | -4.05% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1646.770 ns | 1581.990000 ns | -3.93% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 208.565 ns | 203.003000 ns | -2.67% |
Performance change in benchmark groups
UMF
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 324.642000 ns | 326.080 ns | 0.44% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc | 191.015000 ns | 191.058 ns | 0.02% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1646.770 ns | 1581.990000 ns | -3.93% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 12947.700000 ns | 13104.700 ns | 1.21% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 372.589000 ns | 374.768 ns | 0.58% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 208.565 ns | 203.003000 ns | -2.67% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc | 163.158000 ns | 164.667 ns | 0.92% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy | 329.825000 ns | 332.738 ns | 0.88% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1751.790 ns | 1603.910000 ns | -8.44% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 13146.800000 ns | 13259.200 ns | 0.85% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 177.490000 ns | 177.707 ns | 0.12% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 370.296 ns | 368.133000 ns | -0.58% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 94.234300 ns | 95.713 ns | 1.57% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 210.287 ns | 210.019000 ns | -0.13% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1561.420 ns | 1498.210000 ns | -4.05% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy | 228.860 ns | 228.838000 ns | -0.01% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 98.471 ns | 97.064700 ns | -1.43% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 12843.300 ns | 12010.500000 ns | -6.48% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 326.578000 ns | 335.194 ns | 2.64% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc | 193.976000 ns | 193.987 ns | 0.01% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1659.460 ns | 1579.960000 ns | -4.79% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 387.830000 ns | 390.476 ns | 0.68% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 12973.800000 ns | 12988.000 ns | 0.11% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 206.112 ns | 203.413000 ns | -1.31% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy | 325.383 ns | 324.060000 ns | -0.41% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc | 160.364 ns | 159.282000 ns | -0.67% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1775.380 ns | 1671.690000 ns | -5.84% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 13273.900000 ns | 13542.800 ns | 2.03% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 363.747000 ns | 363.952 ns | 0.06% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 171.856 ns | 169.790000 ns | -1.20% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 206.608000 ns | 206.982 ns | 0.18% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 94.167000 ns | 94.180 ns | 0.01% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1509.660 ns | 1498.700000 ns | -0.73% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy | 220.918000 ns | 222.850 ns | 0.87% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 102.396 ns | 102.029000 ns | -0.36% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 13149.500 ns | 12392.100000 ns | -5.76% |
Details
Benchmark details - environment, command...
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
See looped sanitizers and valgrind builds: https://github.com/ldorau/unified-memory-framework/actions/runs/14905978766 https://github.com/ldorau/unified-memory-framework/actions/runs/14906349518
Compute Benchmarks run (with params: --compare 'Baseline_PVC'): https://github.com/oneapi-src/unified-memory-framework/actions/runs/14923909427
Compute Benchmarks run ( --compare 'Baseline_PVC'): https://github.com/oneapi-src/unified-memory-framework/actions/runs/14923909427 Job status: success. Test status: success.
Failures
| Name | Failure |
|---|---|
| umf-benchmark | Benchmark run failure: Command '['/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark', '--benchmark_format=csv', '--benchmark_filter=^jemalloc_pool<os_provider>/multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4$']' died with <Signals.SIGSEGV: 11>. |
Summary
(Emphasized values are the best results)
Improved 1 (threshold 2.00%)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 12920.600000 ns | 13464.400 ns | 4.21% |
Regressed 10 (threshold 2.00%)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 13365.800 ns | 12139.800000 ns | -9.17% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 13180.400 ns | 12404.200000 ns | -5.89% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 177.729 ns | 170.388000 ns | -4.13% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1744.790 ns | 1674.740000 ns | -4.01% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 218.525 ns | 210.579000 ns | -3.64% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1705.190 ns | 1648.060000 ns | -3.35% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1642.570 ns | 1588.270000 ns | -3.31% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1636.960 ns | 1583.410000 ns | -3.27% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 13470.800 ns | 13097.900000 ns | -2.77% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 205.826 ns | 200.553000 ns | -2.56% |
Performance change in benchmark groups
UMF
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 329.447000 ns | 334.650 ns | 1.58% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc | 190.313000 ns | 190.797 ns | 0.25% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1642.570 ns | 1588.270000 ns | -3.31% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 373.786000 ns | 377.507 ns | 1.00% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 13105.100 ns | 12885.500000 ns | -1.68% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 205.826 ns | 200.553000 ns | -2.56% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy | 330.891000 ns | 332.385 ns | 0.45% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc | 163.429000 ns | 163.816 ns | 0.24% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1705.190 ns | 1648.060000 ns | -3.35% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 366.340000 ns | 367.606 ns | 0.35% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 13470.800 ns | 13097.900000 ns | -2.77% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 177.729 ns | 170.388000 ns | -4.13% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 93.971700 ns | 94.841 ns | 0.92% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1529.520 ns | 1510.930000 ns | -1.22% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 218.525 ns | 210.579000 ns | -3.64% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy | 227.056 ns | 226.168000 ns | -0.39% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 97.548 ns | 96.968300 ns | -0.59% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 13365.800 ns | 12139.800000 ns | -9.17% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 325.387 ns | 325.060000 ns | -0.10% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc | 193.919 ns | 193.220000 ns | -0.36% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1636.960 ns | 1583.410000 ns | -3.27% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 204.280000 ns | 206.070 ns | 0.88% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 387.838000 ns | 388.958 ns | 0.29% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 13178.600 ns | 13053.400000 ns | -0.95% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy | 322.483000 ns | 324.052 ns | 0.49% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc | 159.913000 ns | 160.323 ns | 0.26% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1744.790 ns | 1674.740000 ns | -4.01% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 12920.600000 ns | 13464.400 ns | 4.21% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 361.554000 ns | 362.471 ns | 0.25% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 169.634000 ns | 169.854 ns | 0.13% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 93.784600 ns | 94.763 ns | 1.04% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 206.195000 ns | 206.822 ns | 0.30% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1512.700 ns | 1494.700000 ns | -1.19% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 102.668 ns | 101.792000 ns | -0.85% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy | 223.403 ns | 221.478000 ns | -0.86% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 13180.400 ns | 12404.200000 ns | -5.89% |
Details
Benchmark details - environment, command...
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
Compute Benchmarks run (with params: --compare 'Baseline_PVC'): https://github.com/oneapi-src/unified-memory-framework/actions/runs/14924383289
Compute Benchmarks run ( --compare 'Baseline_PVC'): https://github.com/oneapi-src/unified-memory-framework/actions/runs/14924383289 Job status: success. Test status: success.
Failures
| Name | Failure |
|---|---|
| umf-benchmark | Benchmark run failure: Command '['/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark', '--benchmark_format=csv', '--benchmark_filter=^jemalloc_pool<os_provider>/peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4$']' died with <Signals.SIGSEGV: 11>. |
Summary
(Emphasized values are the best results)
Regressed 11 (threshold 2.00%)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 12868.100 ns | 12139.800000 ns | -5.66% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1765.020 ns | 1674.740000 ns | -5.11% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 13017.400 ns | 12404.200000 ns | -4.71% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1658.280 ns | 1583.410000 ns | -4.51% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 13940.100 ns | 13464.400000 ns | -3.41% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1562.850 ns | 1510.930000 ns | -3.32% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1641.300 ns | 1588.270000 ns | -3.23% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 334.920 ns | 325.060000 ns | -2.94% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 13261.400 ns | 12885.500000 ns | -2.83% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1694.090 ns | 1648.060000 ns | -2.72% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1527.750 ns | 1494.700000 ns | -2.16% |
Performance change in benchmark groups
UMF
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 328.829000 ns | 334.650 ns | 1.77% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc | 190.679000 ns | 190.797 ns | 0.06% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1641.300 ns | 1588.270000 ns | -3.23% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 375.025000 ns | 377.507 ns | 0.66% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 200.676 ns | 200.553000 ns | -0.06% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 13261.400 ns | 12885.500000 ns | -2.83% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy | 329.887000 ns | 332.385 ns | 0.76% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc | 163.796000 ns | 163.816 ns | 0.01% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1694.090 ns | 1648.060000 ns | -2.72% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 366.551000 ns | 367.606 ns | 0.29% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 171.843 ns | 170.388000 ns | -0.85% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 13294.500 ns | 13097.900000 ns | -1.48% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 94.202900 ns | 94.841 ns | 0.68% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 210.465000 ns | 210.579 ns | 0.05% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1562.850 ns | 1510.930000 ns | -3.32% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 97.310 ns | 96.968300 ns | -0.35% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy | 228.570 ns | 226.168000 ns | -1.05% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 12868.100 ns | 12139.800000 ns | -5.66% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc | 194.874 ns | 193.220000 ns | -0.85% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 334.920 ns | 325.060000 ns | -2.94% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1658.280 ns | 1583.410000 ns | -4.51% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 204.068000 ns | 206.070 ns | 0.98% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 386.781000 ns | 388.958 ns | 0.56% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 13052.700000 ns | 13053.400 ns | 0.01% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc | 159.639000 ns | 160.323 ns | 0.43% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy | 324.027000 ns | 324.052 ns | 0.01% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1765.020 ns | 1674.740000 ns | -5.11% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 358.806000 ns | 362.471 ns | 1.02% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 169.835000 ns | 169.854 ns | 0.01% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 13940.100 ns | 13464.400000 ns | -3.41% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 93.610300 ns | 94.763 ns | 1.23% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 206.872 ns | 206.822000 ns | -0.02% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1527.750 ns | 1494.700000 ns | -2.16% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy | 221.224000 ns | 221.478 ns | 0.11% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 103.287 ns | 101.792000 ns | -1.45% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 13017.400 ns | 12404.200000 ns | -4.71% |
Details
Benchmark details - environment, command...
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
Please review :-)
Compute Benchmarks run (with params: --compare 'Baseline_PVC'): https://github.com/oneapi-src/unified-memory-framework/actions/runs/14965460163
Compute Benchmarks run ( --compare 'Baseline_PVC'): https://github.com/oneapi-src/unified-memory-framework/actions/runs/14965460163 Job status: success. Test status: success.
Failures
| Name | Failure |
|---|---|
| umf-benchmark | Benchmark run failure: Command '['/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark', '--benchmark_format=csv', '--benchmark_filter=^jemalloc_pool<os_provider>/peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4$']' died with <Signals.SIGSEGV: 11>. |
Summary
(Emphasized values are the best results)
Improved 3 (threshold 2.00%)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 97.426700 ns | 105.063 ns | 7.84% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 12434.900000 ns | 12733.900 ns | 2.40% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 386.513000 ns | 394.715 ns | 2.12% |
Regressed 8 (threshold 2.00%)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 13437.400 ns | 12151.800000 ns | -9.57% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 178.594 ns | 168.080000 ns | -5.89% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1789.360 ns | 1694.860000 ns | -5.28% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1572.110 ns | 1518.620000 ns | -3.40% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1682.920 ns | 1626.080000 ns | -3.38% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1621.400 ns | 1575.560000 ns | -2.83% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 13281.600 ns | 12951.200000 ns | -2.49% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 12805.800 ns | 12535.400000 ns | -2.11% |
Performance change in benchmark groups
UMF
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1623.020 ns | 1621.170000 ns | -0.11% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc | 191.674 ns | 190.677000 ns | -0.52% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 328.944 ns | 324.198000 ns | -1.44% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 203.079000 ns | 207.004 ns | 1.93% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 371.666000 ns | 376.234 ns | 1.23% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 13281.600 ns | 12951.200000 ns | -2.49% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy | 333.217000 ns | 334.684 ns | 0.44% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc | 163.234000 ns | 163.349 ns | 0.07% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1682.920 ns | 1626.080000 ns | -3.38% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 369.811 ns | 367.375000 ns | -0.66% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 12805.800 ns | 12535.400000 ns | -2.11% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 178.594 ns | 168.080000 ns | -5.89% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 94.195300 ns | 94.221 ns | 0.03% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 210.315000 ns | 210.332 ns | 0.01% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1572.110 ns | 1518.620000 ns | -3.40% |
Relative perf in group multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 97.426700 ns | 105.063 ns | 7.84% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy | 228.144000 ns | 228.845 ns | 0.31% |
| multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 13437.400 ns | 12151.800000 ns | -9.57% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy | 326.673000 ns | 327.892 ns | 0.37% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc | 194.167 ns | 193.668000 ns | -0.26% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy | 1621.400 ns | 1575.560000 ns | -2.83% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy | 386.513000 ns | 394.715 ns | 2.12% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc | 204.052000 ns | 204.660 ns | 0.30% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy | 13357.600 ns | 13163.500000 ns | -1.45% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc | 161.253 ns | 159.729000 ns | -0.95% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy | 325.763 ns | 319.637000 ns | -1.88% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy | 1789.360 ns | 1694.860000 ns | -5.28% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy | 13354.700000 ns | 13569.000 ns | 1.60% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy | 364.081000 ns | 365.530 ns | 0.40% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc | 170.075 ns | 169.294000 ns | -0.46% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy | 206.784000 ns | 207.466 ns | 0.33% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc | 94.213 ns | 93.294700 ns | -0.98% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy | 1548.980 ns | 1524.950000 ns | -1.55% |
Relative perf in group peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 (3)
| Benchmark | This PR | Baseline_PVC | Change |
|---|---|---|---|
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy | 12434.900000 ns | 12733.900 ns | 2.40% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc | 101.827000 ns | 103.152 ns | 1.30% |
| peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy | 222.592 ns | 221.745000 ns | -0.38% |
Details
Benchmark details - environment, command...
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 umfProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/lib/libumf_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 jemalloc
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libjemalloc.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
multiple_malloc_free/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/size:4096/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:4096/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:1 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so
peak_alloc/max_allocs:10000/thread_local_allocations:1/min_size:8/max_size:128/granularity:8/iterations:500000/threads:4 tbbProxy
Command:
/home/test-user/actions-runners/umf-perf-runner/_work/unified-memory-framework/unified-memory-framework/umf-repo/build/benchmark/umf-benchmark --benchmark_format=csv
Environment Variables:
LD_PRELOAD=libtbbmalloc_proxy.so