[L0] Phase 2 of Counter-Based Event Implementation
- Enable counter-based events for regular command lists.
- Counter-based events may be reused even though they are not done.
- When an event's ref count drops to the value that means it is no longer used by external clients, the event may be reused by subsequent calls.
- Move events that are no longer externally visible to a reusable pool and reuse them more aggressively.
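The reuse rule described above can be sketched in a few lines. This is only a toy model in Python, not the adapter's code: `Event`, `EventPool`, `acquire`, and `release` are hypothetical names, and a single integer stands in for the "external clients" ref count.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical stand-in for a counter-based event handle."""
    ident: int
    external_refs: int = 0  # references held by clients outside the runtime


@dataclass
class EventPool:
    """Toy pool: events with no external references go back for reuse."""
    free: list = field(default_factory=list)
    created: int = 0

    def acquire(self) -> Event:
        # Prefer a recycled event. It may not be "done" yet, which the
        # description says is allowed for counter-based events.
        event = self.free.pop() if self.free else self._create()
        event.external_refs = 1
        return event

    def release(self, event: Event) -> None:
        event.external_refs -= 1
        if event.external_refs == 0:
            # No longer externally visible: eligible for reuse right away,
            # without waiting for completion.
            self.free.append(event)

    def _create(self) -> Event:
        self.created += 1
        return Event(ident=self.created)


pool = EventPool()
first = pool.acquire()
pool.release(first)
second = pool.acquire()
print(second is first, pool.created)
```

The point of the sketch is that the second `acquire` hands back the released event instead of creating a new one.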
intel/llvm PR: https://github.com/intel/llvm/pull/14754
This does not compile w/ the L0 adapter enabled. Also, feel free to add a relevant benchmark scenario to https://github.com/oneapi-src/unified-runtime/blob/main/.github/scripts/compute_benchmarks.py, or just run the existing benchmark with whatever env variables are needed. You can run these from: https://github.com/oneapi-src/unified-runtime/actions/workflows/benchmarks_compute.yml
You can reach out to me if you need help or advice.
@pbalcer It should compile now; I'm working through some of the e2e tests that are still failing.
@winstonzhang-intel , please link the intel/llvm PR related to this issue so we can see the full e2e test results.
Compute Benchmarks level_zero run (with params: --env UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 --env UR_L0_USE_DRIVER_INORDER_LISTS=1): https://github.com/oneapi-src/unified-runtime/actions/runs/9694638615 Job status: failure. Test status: skipped.
Compute Benchmarks level_zero run (with params: --env UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 --env UR_L0_USE_DRIVER_INORDER_LISTS=1): https://github.com/oneapi-src/unified-runtime/actions/runs/9780598178 Job status: success. Test status: success.
Benchmark Results
api_overhead_benchmark_sycl, mean execution time per 10 kernels (μs)
| Benchmark | This PR | baseline |
|---|---|---|
| SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0) Imm-CmdLists-OFF | 38.675 | 38.357 |
| SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0) Imm-CmdLists-OFF | 36.082 | 36.972 |
| SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0) | 40.549 | 41.505 |
| SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0) | 40.023 | 41.129 |
Details
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0) Imm-CmdLists-OFF
Environment Variables:
UR_L0_USE_IMMEDIATE_COMMANDLISTS=0 UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1
Command:
/home/test-user/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),38.675,38.403,4.91%,37.600,206.755,[CPU],[us]
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0) Imm-CmdLists-OFF
Environment Variables:
UR_L0_USE_IMMEDIATE_COMMANDLISTS=0 UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1
Command:
/home/test-user/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),36.082,36.040,2.38%,35.332,112.299,[CPU],[us]
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)
Environment Variables:
UR_L0_USE_IMMEDIATE_COMMANDLISTS=1 UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1
Command:
/home/test-user/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),40.549,40.484,2.12%,39.520,109.681,[CPU],[us]
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)
Environment Variables:
UR_L0_USE_IMMEDIATE_COMMANDLISTS=1 UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1
Command:
/home/test-user/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),40.023,39.999,2.41%,38.600,109.795,[CPU],[us]
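For anyone post-processing these CSV rows, here is a minimal parsing sketch. It assumes the column order implied by the header (TestCase, Mean, Median, StdDev, Min, Max, Type) followed by a trailing unit field, as seen in the output above; `parse_result_row` is a hypothetical helper, not part of the benchmark scripts.

```python
def parse_result_row(row: str) -> dict:
    """Parse one data row of the benchmark's --csv output (assumed layout)."""
    # Split from the right so a comma inside the test-case name
    # cannot shift the numeric columns.
    name, mean, median, stddev, minimum, maximum, kind, unit = row.rsplit(",", 7)
    return {
        "name": name,
        "mean": float(mean),
        "median": float(median),
        "stddev_pct": float(stddev.rstrip("%")),
        "min": float(minimum),
        "max": float(maximum),
        "type": kind,
        "unit": unit,
    }


# Row copied from the Ioq=0 run with immediate command lists enabled.
row = ("SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 "
       "KernelExecTime=1 MeasureCompletion=0),40.023,39.999,2.41%,38.600,109.795,[CPU],[us]")
result = parse_result_row(row)
print(result["mean"], result["median"], result["unit"])
```

Comparing `mean` against `median` and `max` is a quick way to spot runs where a few slow iterations skew the average.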
Compute Benchmarks level_zero run (--compare baseline --env UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 --env UR_L0_USE_DRIVER_INORDER_LISTS=1): https://github.com/oneapi-src/unified-runtime/actions/runs/10055105565 Job status: failure. Test status: failure.
easyWave_sycl -grid examples/e2Asean.grd -source examples/BengkuluSept2007.flt -time 120
The easyWave_sycl benchmark hung with this PR.
@pbalcer, how can one get this benchmark and run it locally? That way @winstonzhang-intel can investigate the issue locally.
@pbalcer I'm getting different results on llvm/sycl test-e2e, which I also confirmed locally on a PVC machine. The following tests pass on my machine:
- SYCL :: DiscardEvents/discard_events_mixed_calls.cpp
- SYCL :: ESIMD/BitonicSortKv2.cpp
- SYCL :: ESIMD/kmeans/kmeans.cpp
- SYCL :: Graph/RecordReplay/barrier_multi_queue.cpp
- SYCL :: Graph/RecordReplay/dotp_in_order.cpp
- SYCL :: Graph/RecordReplay/dotp_in_order_pause.cpp
- SYCL :: Graph/RecordReplay/dotp_in_order_with_empty_nodes.cpp
- SYCL :: Graph/RecordReplay/dotp_multiple_queues.cpp
- SYCL :: Graph/RecordReplay/host_task_in_order.cpp
- SYCL :: Graph/RecordReplay/sub_graph_in_order.cpp
- SYCL :: Graph/RecordReplay/usm_copy_in_order.cpp
An example output of one of the tests:
$ LD_LIBRARY_PATH=/iusers/winstonz/lib/driver/:/iusers/winstonz/llvm/build/lib:$LD_LIBRARY_PATH UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 ./build/bin/llvm-lit -vv sycl/test-e2e/Graph/RecordReplay/usm_copy_in_order.cpp
llvm-lit: /localdisk2/winstonz/llvm/sycl/test-e2e/lit.cfg.py:414: note: Targeted devices: all
llvm-lit: /localdisk2/winstonz/llvm/sycl/test-e2e/lit.cfg.py:635: warning: Couldn't find pre-installed AOT device compiler ocloc
llvm-lit: /localdisk2/winstonz/llvm/sycl/test-e2e/lit.cfg.py:635: warning: Couldn't find pre-installed AOT device compiler opencl-aot
llvm-lit: /localdisk2/winstonz/llvm/sycl/test-e2e/lit.cfg.py:733: note: Aspects for level_zero:gpu: ext_oneapi_fixed_size_group, gpu, queue_profiling, ext_oneapi_bindless_images_shared_usm, ext_intel_device_id, usm_atomic_shared_allocations, ext_intel_gpu_subslices_per_slice, ext_oneapi_private_alloca, ext_intel_gpu_eu_simd_width, usm_device_allocations, ext_oneapi_bindless_images_2d_usm, ext_oneapi_graph, ext_oneapi_queue_profiling_tag, ext_oneapi_bindless_images, fp16, ext_intel_gpu_hw_threads_per_eu, online_linker, ext_oneapi_tangle_group, online_compiler, usm_host_allocations, ext_intel_memory_bus_width, ext_intel_gpu_eu_count_per_subslice, fp64, ext_intel_memory_clock_rate, ext_intel_gpu_eu_count, ext_oneapi_mipmap_anisotropy, ext_intel_device_info_uuid, ext_intel_matrix, ext_oneapi_opportunistic_group, ext_intel_pci_address, ext_oneapi_mipmap, ext_oneapi_ballot_group, ext_intel_esimd, atomic64, usm_shared_allocations, ext_oneapi_virtual_mem, ext_intel_gpu_slices, ext_oneapi_limited_graph
llvm-lit: /localdisk2/winstonz/llvm/sycl/test-e2e/lit.cfg.py:745: note: SG sizes for level_zero:gpu: 16, 32
llvm-lit: /localdisk2/winstonz/llvm/sycl/test-e2e/lit.cfg.py:754: note: Architectures for level_zero:gpu: intel_gpu_pvc
-- Testing: 1 tests, 1 workers --
PASS: SYCL :: Graph/RecordReplay/usm_copy_in_order.cpp (1 of 1)
Testing Time: 77.00s
Total Discovered Tests: 1
Passed: 1 (100.00%)
2 warning(s) in tests
https://github.com/oneapi-src/Velocity-Bench/tree/main/easywave
You can also use our automation scripts: https://github.com/oneapi-src/unified-runtime/tree/main/scripts/benchmarks
There's no way to select a single benchmark yet, but for now you can comment out all the benchmarks except easywave: https://github.com/oneapi-src/unified-runtime/blob/main/scripts/benchmarks/main.py#L40
As for the failing E2E tests, please create a PR on intel/llvm if you feel the failures in UR CI are incorrect.
lgtm once all tests are green and the benchmarks are passing.
Just curious, why not base this PR on #1600?
#1600 still has some tests that are not passing, so I didn't rebase on it. Here's the CI on llvm/sycl, which is all passing: https://github.com/intel/llvm/pull/14754
^None of the tests that URT CI claims to be failing are failing on llvm/sycl CI
They don't have a PVC system in CI. Other PRs (see this PR) do not exhibit the same failures as this one (ignoring the address sanitizer problem that popped up yesterday). These failures seem to be unique for this PR:
SYCL :: DiscardEvents/discard_events_mixed_calls.cpp
SYCL :: ESIMD/BitonicSortKv2.cpp
SYCL :: ESIMD/kmeans/kmeans.cpp
SYCL :: Graph/RecordReplay/barrier_multi_queue.cpp
SYCL :: Graph/RecordReplay/dotp_in_order.cpp
SYCL :: Graph/RecordReplay/dotp_in_order_pause.cpp
SYCL :: Graph/RecordReplay/dotp_in_order_with_empty_nodes.cpp
SYCL :: Graph/RecordReplay/dotp_multiple_queues.cpp
SYCL :: Graph/RecordReplay/host_task_in_order.cpp
SYCL :: Graph/RecordReplay/sub_graph_in_order.cpp
SYCL :: Graph/RecordReplay/usm_copy_in_order.cpp
Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/10094246782 Job status: failure. Test status: failure.
I've tried at least 5 PVC machines now and none of them seems to be able to reproduce these failures.
@winstonzhang-intel, PVC runs immediate command lists by default. This functionality is for regular command lists, so you need to test on a GEN12, DG2, or Flex GPU.
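A possible way to exercise the regular command list path from a script is sketched below. The variable names come from the benchmark runs in this thread, where `UR_L0_USE_IMMEDIATE_COMMANDLISTS=0` is labelled Imm-CmdLists-OFF; whether that setting is sufficient on every device is an assumption, and `regular_cmdlist_env` is a hypothetical helper.

```python
import os
import shlex

# Variable names are taken from the benchmark runs in this thread; whether "0"
# forces regular command lists on every device is an assumption.
REGULAR_CMDLIST_ENV = {
    "UR_L0_USE_IMMEDIATE_COMMANDLISTS": "0",
    "UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS": "1",
    "UR_L0_USE_DRIVER_INORDER_LISTS": "1",
}


def regular_cmdlist_env(base=None):
    """Return a copy of the environment with the overrides applied."""
    env = dict(os.environ if base is None else base)
    env.update(REGULAR_CMDLIST_ENV)
    return env


# Test path copied from the llvm-lit invocation quoted earlier in the thread.
cmd = ["./build/bin/llvm-lit", "-vv",
       "sycl/test-e2e/Graph/RecordReplay/usm_copy_in_order.cpp"]
prefix = " ".join(f"{key}={value}" for key, value in REGULAR_CMDLIST_ENV.items())
print(f"{prefix} {shlex.join(cmd)}")
# To actually run it: subprocess.run(cmd, env=regular_cmdlist_env(), check=True)
```

The script only prints the command line, so it can be checked before anything is launched on a test machine.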
@pbalcer Seems like the e2e L0 tests are getting stuck. Could you please check that? I've also tried to run the e2e tests locally, and they all seem to be passing. This is running on gen12, and regular command lists should be in use:
`$ bash ./test.sh
llvm-lit: /home/scss_dev/workspace/llvm/sycl/test-e2e/lit.cfg.py:769: note: Architectures for opencl:gpu: intel_gpu_adl_s
-- Testing: 11 tests, 11 workers --
PASS: SYCL :: Graph/RecordReplay/dotp_in_order.cpp (1 of 11)
PASS: SYCL :: Graph/RecordReplay/usm_copy_in_order.cpp (2 of 11)
PASS: SYCL :: Graph/RecordReplay/dotp_multiple_queues.cpp (3 of 11)
PASS: SYCL :: Graph/RecordReplay/dotp_in_order_with_empty_nodes.cpp (4 of 11)
PASS: SYCL :: Graph/RecordReplay/host_task_in_order.cpp (5 of 11)
PASS: SYCL :: Graph/RecordReplay/dotp_in_order_pause.cpp (6 of 11)
PASS: SYCL :: Graph/RecordReplay/sub_graph_in_order.cpp (7 of 11)
PASS: SYCL :: Graph/RecordReplay/barrier_multi_queue.cpp (8 of 11)
PASS: SYCL :: DiscardEvents/discard_events_mixed_calls.cpp (9 of 11)
PASS: SYCL :: ESIMD/BitonicSortKv2.cpp (10 of 11)
PASS: SYCL :: ESIMD/kmeans/kmeans.cpp (11 of 11)
Testing Time: 21.48s
Total Discovered Tests: 11
Passed: 11 (100.00%)`
@pbalcer Seems like the e2e L0 tests are getting stuck.
The system we used in CI died and we haven't managed to get it back up yet.
Thanks for checking that the e2e tests are now passing. I'm not sure what was wrong with the runs in the CI (maybe a stale commit?).
Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/10195419517 Job status: failure. Test status: failure.
CudaSift benchmark has failed:
Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 87 3142 2.3622% 1 2
Performing data verification
Data verification FAILED.
This is on 1T PVC.
You can run the same benchmark by using the scripts here:
$ ./main.py ~/benchmarks_workdir/ ~/llvm/build/ --filter CudaSift --iterations 1
Here, benchmarks_workdir is the location where the benchmarks will be built, and ~/llvm/build/ is the location of the compiler that was built with the desired UR version. See $ ./main.py --help for more options.
Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/10305771352 Job status: failure. Test status: failure.
Compute Benchmarks level_zero run (--env UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 --env UR_L0_USE_DRIVER_INORDER_LISTS=1): https://github.com/oneapi-src/unified-runtime/actions/runs/10880913609 Job status: success. Test status: success.
Summary
result is better
| Benchmark | This PR | baseline | Unit |
|---|---|---|---|
| api_overhead_benchmark_sycl SubmitKernel out of order | 48.362 | 50.631 | μs |
| api_overhead_benchmark_sycl SubmitKernel in order | 47.024 | 49.385 | μs |
| api_overhead_benchmark_ur SubmitKernel out of order | 31.312 | 31.93 | μs |
| api_overhead_benchmark_ur SubmitKernel in order | 25.546 | 28.586 | μs |
| memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | 424.685 | 423.457 | μs |
| memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | 261.384 | 253.906 | μs |
| memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | 10.089 | 9.179 | μs |
| memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | 3.002 | 1.854 | GB/s |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 | 2.143 | 4.506 | μs |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 | 2.096 | 3.613 | μs |
| miscellaneous_benchmark_sycl VectorSum | 858.416 | 863.651 | GB/s |
| Velocity-Bench Hashtable | 207.852567 | 178.291413 | M keys/sec |
| Velocity-Bench Bitcracker | 35.6076 | 35.8407 | s |
| Velocity-Bench CudaSift | 256.843 | 283.294 | ms |
| Velocity-Bench Easywave | 446 | 457.0 | ms |
| Velocity-Bench QuickSilver | 90.08 | 115.63 | MMS/CTT |
| Velocity-Bench Sobel Filter | 985.857 | 934.963 | ms |
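To read the table as relative differences, a small sketch like the following may help. It assumes (the thread does not state this) that time-based rows are lower-is-better and throughput rows are higher-is-better; `relative_change` and `is_improvement` are hypothetical helper names.

```python
def relative_change(pr: float, baseline: float) -> float:
    """Percent change of the PR value relative to the baseline value."""
    return (pr - baseline) / baseline * 100.0


def is_improvement(pr: float, baseline: float, higher_is_better: bool) -> bool:
    """Assumption: times are lower-is-better, throughputs higher-is-better."""
    return pr > baseline if higher_is_better else pr < baseline


# Values copied from the summary table; the last field is the assumed direction.
rows = [
    ("api_overhead_benchmark_ur SubmitKernel in order", 25.546, 28.586, False),
    ("Velocity-Bench Hashtable", 207.852567, 178.291413, True),
    ("Velocity-Bench Sobel Filter", 985.857, 934.963, False),
]

for name, pr, baseline, higher_is_better in rows:
    change = relative_change(pr, baseline)
    verdict = "better" if is_improvement(pr, baseline, higher_is_better) else "worse"
    print(f"{name}: {change:+.2f}% ({verdict})")
```

Given the standard deviations reported in the Details section (up to 45% for some tests), small percentage differences should be read with caution.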
Charts
One bar chart per row of the summary table, each plotting "This PR" against "baseline" with the same values as the table. Bar labels use μs for the api_overhead, memory, and miscellaneous benchmarks, M keys/sec for Hashtable, s for Bitcracker, ms for CudaSift, Easywave, and Sobel Filter, and MMS/CTT for QuickSilver. The Details output reports StreamMemory and VectorSum in GB/s rather than μs.
Details
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)
Environment Variables:
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),48.362,47.646,7.34%,43.188,547.322,[CPU],[us]
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)
Environment Variables:
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),47.024,46.508,6.65%,44.278,209.617,[CPU],[us]
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)
Environment Variables:
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),31.312,31.050,6.53%,29.597,503.558,[CPU],[us]
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)
Environment Variables:
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),25.546,29.884,27.77%,13.324,230.644,[CPU],[us]
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100)
Environment Variables:
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),424.685,467.871,19.83%,246.890,870.042,[CPU],[us]
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100)
Environment Variables:
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),261.384,238.517,22.09%,230.359,746.004,[CPU],[us]
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB)
Environment Variables:
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),10.089,9.944,18.73%,7.751,150.687,[CPU],[us]
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device)
Environment Variables:
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device),3.002,3.081,6.77%,0.382,3.365,[CPU],[GB/s]
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0)
Environment Variables:
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),2.143,2.101,14.10%,1.894,75.835,[CPU],[us]
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1)
Environment Variables:
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),2.096,1.670,45.10%,1.554,28.530,[CPU],[us]
VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256)
Environment Variables:
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1
Command:
/home/test-user/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256
Output:
TestCase,Mean,Median,StdDev,Min,Max,Type
VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256),858.416,858.902,0.49%,821.607,879.002,[GPU],bw [GB/s]
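For comparing runs, the `--csv --noHeaders` rows above can be split into typed fields. The following is a minimal sketch, not the parser used by the benchmark scripts; the `BenchResult` class and `parse_row` helper are hypothetical names, and the sketch assumes every row has exactly eight comma-separated fields (test case, five statistics, device tag, unit), as the rows in this log do.

```python
from dataclasses import dataclass


@dataclass
class BenchResult:
    """One compute-benchmarks CSV row (hypothetical container, for illustration)."""
    test_case: str
    mean: float
    median: float
    stddev_pct: float  # StdDev is reported as a percentage, e.g. "0.49%"
    minimum: float
    maximum: float
    device: str        # "CPU" or "GPU", taken from the "[CPU]"/"[GPU]" tag
    unit: str          # kept verbatim, e.g. "[us]" or "bw [GB/s]"


def parse_row(row: str) -> BenchResult:
    # Assumes the test-case label itself contains no commas, which holds
    # for the rows shown in this log.
    name, mean, median, stddev, lo, hi, device, unit = row.strip().split(",")
    return BenchResult(
        test_case=name,
        mean=float(mean),
        median=float(median),
        stddev_pct=float(stddev.rstrip("%")),
        minimum=float(lo),
        maximum=float(hi),
        device=device.strip("[]"),
        unit=unit,
    )


row = ("VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 "
       "numberOfElementsZ=256),858.416,858.902,0.49%,821.607,879.002,[GPU],bw [GB/s]")
result = parse_row(row)
print(result.mean, result.device, result.unit)
```

Note that the data rows carry one more field than the `TestCase,...,Type` header names, so the sketch unpacks by position rather than by header.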
hashtable
Environment Variables:
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1
Command:
/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify
Output:
hashtable - total time for whole calculation: 0.645735 s
207.852567 million keys/second
bitcracker
Environment Variables:
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1
Command:
/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000
Output:
---------> BitCracker: BitLocker password cracking tool <---------
================================== Retrieving Info
Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"
Attack
================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes
Iter: 1, num passwords read: 60000 Kernel execution: Effective passwords: 60000 Passwords Range: npknpByH7N2m3OnLNH1X9DJxLrzIFWk ..... dL_7uuf3QCz-c6K3xDu0
================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!
time to subtract from total: 0.0101897 s
bitcracker - total time for whole calculation: 35.6076 s
cudaSift
Environment Variables:
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1
Command:
/home/test-user/bench_workdir/cudaSift/cudaSift
Output:
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1185 1247 32.1749% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1221 1256 33.1523% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1138 1277 30.8987% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1217 1253 33.0437% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1232 1267 33.451% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1140 1265 30.953% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1232 1262 33.451% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1229 1265 33.3695% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1228 1263 33.3424% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1237 1270 33.5868% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1222 1255 33.1795% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1104 1259 29.9756% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1155 1257 31.3603% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1236 1273 33.5596% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1121 1258 30.4371% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1241 1274 33.6954% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1236 1268 33.5596% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1163 1253 31.5775% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1249 1284 33.9126% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1223 1256 33.2066% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1070 1268 29.0524% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1238 1273 33.6139% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1221 1254 33.1523% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1231 1268 33.4238% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1159 1261 31.4689% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1230 1265 33.3967% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1230 1265 33.3967% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1227 1261 33.3152% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1232 1265 33.451% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1220 1260 33.1252% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1230 1267 33.3967% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1221 1256 33.1523% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1236 1275 33.5596% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1222 1255 33.1795% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1099 1259 29.8398% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1231 1267 33.4238% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1229 1264 33.3695% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1108 1276 30.0842% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1110 1249 30.1385% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1226 1263 33.2881% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1205 1271 32.7179% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1131 1264 30.7087% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1233 1266 33.4781% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1134 1274 30.7901% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1238 1270 33.6139% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1131 1263 30.7087% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1219 1250 33.098% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1226 1257 33.2881% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1232 1267 33.451% 1 2
Performing data verification Data verification is SUCCESSFUL.
Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1233 1272 33.4781% 1 2
Performing data verification Data verification is SUCCESSFUL.
Avg workload time = 256.843 ms
easywave
Environment Variables:
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1
Command:
/home/test-user/bench_workdir/easywave/easyWave_sycl -grid /home/test-user/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/test-user/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120
Output:
MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.3.29735+27)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed
QuickSilver
Environment Variables:
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1 QS_DEVICE=GPU
Command:
/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
Output:
Copyright (c) 2016 Lawrence Livermore National Security, LLC
All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1
Loading params
Finished loading params
Simulation:
   dt: 1e-08
   fMax: 0.1
   inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
   energySpectrum:
   boundaryCondition: octant
   loadBalance: 1
   cycleTimers: 0
   debugThreads: 0
   lx: 100
   ly: 100
   lz: 100
   nParticles: 10000000
   batchSize: 0
   nBatches: 10
   nSteps: 10
   nx: 10
   ny: 10
   nz: 10
   seed: 1029384756
   xDom: 0
   yDom: 0
   zDom: 0
   eMax: 20
   eMin: 1e-09
   nGroups: 230
   lowWeightCutoff: 0.001
   bTally: 1
   fTally: 1
   cTally: 1
   coralBenchmark: 0
   crossSectionsOut:
Geometry:
   material: sourceMaterial
   shape: brick
   xMax: 100
   xMin: 0
   yMax: 100
   yMin: 0
   zMax: 100
   zMin: 0
Material:
   name: sourceMaterial
   mass: 1000
   nIsotopes: 10
   nReactions: 9
   sourceRate: 1e+10
   totalCrossSection: 0.1
   absorptionCrossSection: flat
   fissionCrossSection: flat
   scatteringCrossSection: flat
   absorptionCrossSectionRatio: 0
   fissionCrossSectionRatio: 0
   scatteringCrossSectionRatio: 1
CrossSection:
   name: flat
   A: 0
   B: 0
   C: 0
   D: 0
   E: 1
   nuBar: 2.4
setting GPU setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice
cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.411710e-01 8.249170e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.726020e-01 9.738420e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 5.810500e-01 1.006878e+00 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 6.018250e-01 1.105585e+00 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 5.608290e-01 1.040724e+00 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.749500e-01 9.924050e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 5.696560e-01 1.000601e+00 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 5.518340e-01 1.028976e+00 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 5.396320e-01 1.035437e+00 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 5.596030e-01 9.911010e-01 0.000000e+00
Timer                      Cumulative   Cumulative   Cumulative   Cumulative   Cumulative   Cumulative
Name                           number    microSecs    microSecs    microSecs    microSecs   Efficiency
                             of calls          min          avg          max       stddev       Rating
main                                1    1.516e+07    1.516e+07    1.516e+07    0.000e+00       100.00
cycleInit                          10    5.153e+06    5.153e+06    5.153e+06    0.000e+00       100.00
cycleTracking                      10    1.000e+07    1.000e+07    1.000e+07    0.000e+00       100.00
cycleTracking_Kernel              104    4.942e+06    4.942e+06    4.942e+06    0.000e+00       100.00
cycleTracking_MPI                 117    2.556e+05    2.556e+05    2.556e+05    0.000e+00       100.00
cycleTracking_Test_Done             0    0.000e+00    0.000e+00    0.000e+00    0.000e+00         0.00
cycleFinalize                      20    7.140e+02    7.140e+02    7.140e+02    0.000e+00       100.00
Figure Of Merit 90.08 [Num Mega Segments / Cycle Tracking Time]
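The reported Figure Of Merit can be cross-checked against the per-cycle table. The sketch below assumes (this is not confirmed by the log itself) that "Num Mega Segments" is the sum of the `num_seg` column divided by 1e6, and that "Cycle Tracking Time" is the sum of the per-cycle `cycleTracking` column interpreted as seconds. The `figure_of_merit` helper is a hypothetical name used only for illustration.

```python
# Per-cycle values copied from the Quicksilver cycle table in this log.
NUM_SEG = [
    55527935, 94633679, 95010375, 94953591, 94599337,
    94148236, 93689264, 93216931, 92768273, 92324678,
]
# Assumption: the cycleTracking column is in seconds.
CYCLE_TRACKING_S = [
    8.249170e-01, 9.738420e-01, 1.006878e+00, 1.105585e+00, 1.040724e+00,
    9.924050e-01, 1.000601e+00, 1.028976e+00, 1.035437e+00, 9.911010e-01,
]


def figure_of_merit(num_seg, tracking_seconds):
    """Mega segments processed per second of cycle-tracking time."""
    mega_segments = sum(num_seg) / 1e6
    return mega_segments / sum(tracking_seconds)


fom = figure_of_merit(NUM_SEG, CYCLE_TRACKING_S)
print(f"{fom:.2f}")  # agrees with the reported 90.08 under the assumptions above
```

The summed tracking time (about 10.0005 s) also agrees with the `cycleTracking` row of the timer table (1.000e+07 microseconds), which supports reading the column as seconds.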
sobel_filter
Environment Variables:
UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1 OPENCV_IO_MAX_IMAGE_PIXELS=1677721600
Command:
/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5
Output:
SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 14.9964 s
sobelfilter - total time for whole calculation: 0.985857 s
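The hashtable, bitcracker, and sobel_filter workloads all end with a `<name> - total time for whole calculation: <seconds> s` line, optionally preceded by a `time to subtract from total` line. A minimal sketch for extracting both values follows; the regular expressions and the `parse_times` helper are assumptions for illustration, not the code the benchmark scripts use, and the sketch makes no claim about whether the total already excludes the subtracted time.

```python
import re
from typing import Optional, Tuple

TOTAL_RE = re.compile(r"(\w+) - total time for whole calculation: ([0-9.]+) s")
SUBTRACT_RE = re.compile(r"time to subtract from total: ([0-9.]+) s")


def parse_times(output: str) -> Tuple[str, float, Optional[float]]:
    """Return (workload name, total seconds, seconds to subtract or None)."""
    total = TOTAL_RE.search(output)
    if total is None:
        raise ValueError("no 'total time for whole calculation' line found")
    subtract = SUBTRACT_RE.search(output)
    return (
        total.group(1),
        float(total.group(2)),
        float(subtract.group(1)) if subtract else None,
    )


sample = ("time to subtract from total: 14.9964 s "
          "sobelfilter - total time for whole calculation: 0.985857 s")
print(parse_times(sample))
```

Using `search` rather than `match` lets the same helper work whether the two values sit on one line, as in the flattened log, or on separate lines.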