unified-runtime

[L0] Phase 2 of Counter-Based Event Implementation

Open winstonzhang-intel opened this issue 1 year ago • 26 comments

- Enable counter-based events for regular command lists.
- Counter-based events may be reused even if they have not completed.
- When an event's reference count drops to the value that means it is no longer used by external clients, the event may be reused by subsequent calls.
- Move events that are no longer externally visible to a reusable pool and reuse them more aggressively.

intel/llvm PR: https://github.com/intel/llvm/pull/14754

winstonzhang-intel avatar May 31 '24 23:05 winstonzhang-intel
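
The reuse policy in the description can be sketched roughly as follows. This is a minimal illustration in Python, not the adapter's implementation: the `CounterEvent` and `EventPool` names, the `external_refs` field, and the release threshold of zero are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CounterEvent:
    """Hypothetical stand-in for a counter-based event."""
    ident: int
    external_refs: int = 0   # references held by external clients
    completed: bool = False  # reuse below does not depend on this flag


@dataclass
class EventPool:
    """Hypothetical pool that recycles events once they are no longer externally visible."""
    next_id: int = 0
    reusable: List[CounterEvent] = field(default_factory=list)

    def acquire(self) -> CounterEvent:
        # Prefer a recycled event; create a new one only when the pool is empty.
        if self.reusable:
            event = self.reusable.pop()
            event.completed = False
        else:
            event = CounterEvent(ident=self.next_id)
            self.next_id += 1
        event.external_refs = 1
        return event

    def release(self, event: CounterEvent) -> None:
        # Drop one external reference; at zero the event goes back to the pool,
        # whether or not it has completed.
        event.external_refs -= 1
        if event.external_refs == 0:
            self.reusable.append(event)


pool = EventPool()
first = pool.acquire()
pool.release(first)      # not completed, but no longer externally visible
second = pool.acquire()  # the same object is handed out again
reused = second is first
```

The point of the sketch is that `release` does not check `completed`, which mirrors the statement that counter-based events may be reused before they are done.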

This does not compile with the L0 adapter enabled. Also, feel free to add a relevant benchmark scenario to https://github.com/oneapi-src/unified-runtime/blob/main/.github/scripts/compute_benchmarks.py, or just run the existing benchmark with whatever env variables are needed. You can run these from: https://github.com/oneapi-src/unified-runtime/actions/workflows/benchmarks_compute.yml

You can reach out to me if you need help or advice.

pbalcer avatar Jun 06 '24 15:06 pbalcer

@pbalcer It should compile now. I'm still working through some of the e2e tests that are failing.

winstonzhang-intel avatar Jun 10 '24 19:06 winstonzhang-intel

@winstonzhang-intel , please link the intel/llvm PR related to this issue so we can see the full e2e test results.

nrspruit avatar Jun 14 '24 14:06 nrspruit

Compute Benchmarks level_zero run (with params: --env UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 --env UR_L0_USE_DRIVER_INORDER_LISTS=1): https://github.com/oneapi-src/unified-runtime/actions/runs/9694638615

github-actions[bot] avatar Jun 27 '24 10:06 github-actions[bot]

Compute Benchmarks level_zero run (with params: --env UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 --env UR_L0_USE_DRIVER_INORDER_LISTS=1): https://github.com/oneapi-src/unified-runtime/actions/runs/9694638615 Job status: failure. Test status: skipped.

github-actions[bot] avatar Jun 27 '24 10:06 github-actions[bot]

Compute Benchmarks level_zero run (with params: --env UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 --env UR_L0_USE_DRIVER_INORDER_LISTS=1): https://github.com/oneapi-src/unified-runtime/actions/runs/9780598178

github-actions[bot] avatar Jul 03 '24 15:07 github-actions[bot]

Compute Benchmarks level_zero run (with params: --env UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 --env UR_L0_USE_DRIVER_INORDER_LISTS=1): https://github.com/oneapi-src/unified-runtime/actions/runs/9780598178 Job status: success. Test status: success.

Benchmark Results


api_overhead_benchmark_sycl, mean execution time per 10 kernels (μs)

| SubmitKernel configuration | This PR (μs) | baseline (μs) |
| --- | --- | --- |
| Ioq=1, Imm-CmdLists-OFF | 38.675 | 38.357 |
| Ioq=0, Imm-CmdLists-OFF | 36.082 | 36.972 |
| Ioq=1 | 40.549 | 41.505 |
| Ioq=0 | 40.023 | 41.129 |

All four configurations use api=sycl, Profiling=0, DiscardEvents=0, NumKernels=10, KernelExecTime=1, and MeasureCompletion=0.

Details

SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0) Imm-CmdLists-OFF

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0 UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),38.675,38.403,4.91%,37.600,206.755,[CPU],[us]

SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0) Imm-CmdLists-OFF

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=0 UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),36.082,36.040,2.38%,35.332,112.299,[CPU],[us]

SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1 UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),40.549,40.484,2.12%,39.520,109.681,[CPU],[us]

SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_L0_USE_IMMEDIATE_COMMANDLISTS=1 UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/actions-runner/_work/unified-runtime/unified-runtime/compute-benchmarks-build/bin//api_overhead_benchmark_sycl --test=SubmitKernel --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=10000 --Profiling=0 --NumKernels=10 --KernelExecTime=1 --csv --noHeaders

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),40.023,39.999,2.41%,38.600,109.795,[CPU],[us]

github-actions[bot] avatar Jul 03 '24 15:07 github-actions[bot]
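
Each Output entry in the report above is a CSV row, so it can be loaded for further comparison with a few lines of Python. The sketch below is only illustrative: `parse_result` is a hypothetical helper, and the column name `Unit` is an assumption, since the printed header names seven columns while each row carries eight fields.

```python
import csv
import io

# Column names follow the header printed in the report; the trailing unit column
# has no header in the output, so the name "Unit" is an assumption of this sketch.
COLUMNS = ["TestCase", "Mean", "Median", "StdDev", "Min", "Max", "Type", "Unit"]


def parse_result(row_text: str) -> dict:
    """Parse one benchmark result row (hypothetical helper)."""
    fields = next(csv.reader(io.StringIO(row_text)))
    result = dict(zip(COLUMNS, fields))
    for key in ("Mean", "Median", "Min", "Max"):
        result[key] = float(result[key])
    # StdDev is printed as a percentage such as "4.91%".
    result["StdDev"] = float(result["StdDev"].rstrip("%"))
    result["Unit"] = result["Unit"].strip("[]")
    return result


row = ("SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 "
       "KernelExecTime=1 MeasureCompletion=0),38.675,38.403,4.91%,37.600,206.755,[CPU],[us]")
parsed = parse_result(row)
```

Keeping `Max` alongside `Mean` is useful here because the rows show occasional large outliers (for example a maximum of 206.755 against a mean of 38.675).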

Compute Benchmarks level_zero run (with params: --compare baseline --env UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 --env UR_L0_USE_DRIVER_INORDER_LISTS=1): https://github.com/oneapi-src/unified-runtime/actions/runs/10055105565

github-actions[bot] avatar Jul 23 '24 08:07 github-actions[bot]

Compute Benchmarks level_zero run (--compare baseline --env UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 --env UR_L0_USE_DRIVER_INORDER_LISTS=1): https://github.com/oneapi-src/unified-runtime/actions/runs/10055105565 Job status: failure. Test status: failure.

github-actions[bot] avatar Jul 23 '24 08:07 github-actions[bot]

easyWave_sycl -grid examples/e2Asean.grd -source examples/BengkuluSept2007.flt -time 120

The easyWave_sycl benchmark hung with this PR.

pbalcer avatar Jul 23 '24 08:07 pbalcer

> easyWave_sycl -grid examples/e2Asean.grd -source examples/BengkuluSept2007.flt -time 120
>
> The easyWave_sycl benchmark hung with this PR.

@pbalcer, how can one get this benchmark and run it locally? That way @winstonzhang-intel can investigate the issue locally.

nrspruit avatar Jul 23 '24 23:07 nrspruit

@pbalcer I'm getting different results on llvm/sycl test-e2e, which I also confirmed locally on a PVC machine. The following tests pass on my machine:

  • SYCL :: DiscardEvents/discard_events_mixed_calls.cpp
  • SYCL :: ESIMD/BitonicSortKv2.cpp
  • SYCL :: ESIMD/kmeans/kmeans.cpp
  • SYCL :: Graph/RecordReplay/barrier_multi_queue.cpp
  • SYCL :: Graph/RecordReplay/dotp_in_order.cpp
  • SYCL :: Graph/RecordReplay/dotp_in_order_pause.cpp
  • SYCL :: Graph/RecordReplay/dotp_in_order_with_empty_nodes.cpp
  • SYCL :: Graph/RecordReplay/dotp_multiple_queues.cpp
  • SYCL :: Graph/RecordReplay/host_task_in_order.cpp
  • SYCL :: Graph/RecordReplay/sub_graph_in_order.cpp
  • SYCL :: Graph/RecordReplay/usm_copy_in_order.cpp

An example output of one of the tests:

$ LD_LIBRARY_PATH=/iusers/winstonz/lib/driver/:/iusers/winstonz/llvm/build/lib:$LD_LIBRARY_PATH UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 ./build/bin/llvm-lit -vv sycl/test-e2e/Graph/RecordReplay/usm_copy_in_order.cpp
llvm-lit: /localdisk2/winstonz/llvm/sycl/test-e2e/lit.cfg.py:414: note: Targeted devices: all
llvm-lit: /localdisk2/winstonz/llvm/sycl/test-e2e/lit.cfg.py:635: warning: Couldn't find pre-installed AOT device compiler ocloc
llvm-lit: /localdisk2/winstonz/llvm/sycl/test-e2e/lit.cfg.py:635: warning: Couldn't find pre-installed AOT device compiler opencl-aot
llvm-lit: /localdisk2/winstonz/llvm/sycl/test-e2e/lit.cfg.py:733: note: Aspects for level_zero:gpu: ext_oneapi_fixed_size_group, gpu, queue_profiling, ext_oneapi_bindless_images_shared_usm, ext_intel_device_id, usm_atomic_shared_allocations, ext_intel_gpu_subslices_per_slice, ext_oneapi_private_alloca, ext_intel_gpu_eu_simd_width, usm_device_allocations, ext_oneapi_bindless_images_2d_usm, ext_oneapi_graph, ext_oneapi_queue_profiling_tag, ext_oneapi_bindless_images, fp16, ext_intel_gpu_hw_threads_per_eu, online_linker, ext_oneapi_tangle_group, online_compiler, usm_host_allocations, ext_intel_memory_bus_width, ext_intel_gpu_eu_count_per_subslice, fp64, ext_intel_memory_clock_rate, ext_intel_gpu_eu_count, ext_oneapi_mipmap_anisotropy, ext_intel_device_info_uuid, ext_intel_matrix, ext_oneapi_opportunistic_group, ext_intel_pci_address, ext_oneapi_mipmap, ext_oneapi_ballot_group, ext_intel_esimd, atomic64, usm_shared_allocations, ext_oneapi_virtual_mem, ext_intel_gpu_slices, ext_oneapi_limited_graph
llvm-lit: /localdisk2/winstonz/llvm/sycl/test-e2e/lit.cfg.py:745: note: SG sizes for level_zero:gpu: 16, 32
llvm-lit: /localdisk2/winstonz/llvm/sycl/test-e2e/lit.cfg.py:754: note: Architectures for level_zero:gpu: intel_gpu_pvc
-- Testing: 1 tests, 1 workers --
PASS: SYCL :: Graph/RecordReplay/usm_copy_in_order.cpp (1 of 1)

Testing Time: 77.00s

Total Discovered Tests: 1 Passed: 1 (100.00%)

2 warning(s) in tests

winstonzhang-intel avatar Jul 24 '24 03:07 winstonzhang-intel

https://github.com/oneapi-src/Velocity-Bench/tree/main/easywave

You can also use our automation scripts: https://github.com/oneapi-src/unified-runtime/tree/main/scripts/benchmarks

There's no way to select a single benchmark yet, but for now you can comment out all the benchmarks but easywave: https://github.com/oneapi-src/unified-runtime/blob/main/scripts/benchmarks/main.py#L40

As for the failing E2E tests, please create a PR on intel/llvm if you feel the failures in UR CI are incorrect.

pbalcer avatar Jul 24 '24 09:07 pbalcer

> lgtm once all tests are green and the benchmarks are passing.
>
> Just curious, why not base this PR on #1600?

#1600 still has some tests that are not passing, so I didn't rebase onto it. Here's the CI on llvm/sycl, which is all passing: https://github.com/intel/llvm/pull/14754

^None of the tests that URT CI claims to be failing are failing on llvm/sycl CI

winstonzhang-intel avatar Jul 24 '24 22:07 winstonzhang-intel

> ^None of the tests that URT CI claims to be failing are failing on llvm/sycl CI

They don't have a PVC system in CI. Other PRs (see this PR) do not exhibit the same failures as this one (ignoring the address sanitizer problem that popped up yesterday). These failures seem to be unique for this PR:

  SYCL :: DiscardEvents/discard_events_mixed_calls.cpp
  SYCL :: ESIMD/BitonicSortKv2.cpp
  SYCL :: ESIMD/kmeans/kmeans.cpp
  SYCL :: Graph/RecordReplay/barrier_multi_queue.cpp
  SYCL :: Graph/RecordReplay/dotp_in_order.cpp
  SYCL :: Graph/RecordReplay/dotp_in_order_pause.cpp
  SYCL :: Graph/RecordReplay/dotp_in_order_with_empty_nodes.cpp
  SYCL :: Graph/RecordReplay/dotp_multiple_queues.cpp
  SYCL :: Graph/RecordReplay/host_task_in_order.cpp
  SYCL :: Graph/RecordReplay/sub_graph_in_order.cpp
  SYCL :: Graph/RecordReplay/usm_copy_in_order.cpp

pbalcer avatar Jul 25 '24 06:07 pbalcer

Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/10094246782

github-actions[bot] avatar Jul 25 '24 12:07 github-actions[bot]

Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/10094246782 Job status: failure. Test status: failure.

github-actions[bot] avatar Jul 25 '24 12:07 github-actions[bot]

> > ^None of the tests that URT CI claims to be failing are failing on llvm/sycl CI
>
> They don't have a PVC system in CI. Other PRs (see this PR) do not exhibit the same failures as this one (ignoring the address sanitizer problem that popped up yesterday). These failures seem to be unique for this PR:
>
>   SYCL :: DiscardEvents/discard_events_mixed_calls.cpp
>   SYCL :: ESIMD/BitonicSortKv2.cpp
>   SYCL :: ESIMD/kmeans/kmeans.cpp
>   SYCL :: Graph/RecordReplay/barrier_multi_queue.cpp
>   SYCL :: Graph/RecordReplay/dotp_in_order.cpp
>   SYCL :: Graph/RecordReplay/dotp_in_order_pause.cpp
>   SYCL :: Graph/RecordReplay/dotp_in_order_with_empty_nodes.cpp
>   SYCL :: Graph/RecordReplay/dotp_multiple_queues.cpp
>   SYCL :: Graph/RecordReplay/host_task_in_order.cpp
>   SYCL :: Graph/RecordReplay/sub_graph_in_order.cpp
>   SYCL :: Graph/RecordReplay/usm_copy_in_order.cpp

I've tried at least 5 PVC machines now and none of them seems to be able to reproduce these failures.

winstonzhang-intel avatar Jul 26 '24 22:07 winstonzhang-intel

> > > ^None of the tests that URT CI claims to be failing are failing on llvm/sycl CI
> >
> > They don't have a PVC system in CI. Other PRs (see this PR) do not exhibit the same failures as this one (ignoring the address sanitizer problem that popped up yesterday). These failures seem to be unique for this PR:
> >
> >   SYCL :: DiscardEvents/discard_events_mixed_calls.cpp
> >   SYCL :: ESIMD/BitonicSortKv2.cpp
> >   SYCL :: ESIMD/kmeans/kmeans.cpp
> >   SYCL :: Graph/RecordReplay/barrier_multi_queue.cpp
> >   SYCL :: Graph/RecordReplay/dotp_in_order.cpp
> >   SYCL :: Graph/RecordReplay/dotp_in_order_pause.cpp
> >   SYCL :: Graph/RecordReplay/dotp_in_order_with_empty_nodes.cpp
> >   SYCL :: Graph/RecordReplay/dotp_multiple_queues.cpp
> >   SYCL :: Graph/RecordReplay/host_task_in_order.cpp
> >   SYCL :: Graph/RecordReplay/sub_graph_in_order.cpp
> >   SYCL :: Graph/RecordReplay/usm_copy_in_order.cpp
>
> I've tried at least 5 PVC machines now and none of them seems to be able to reproduce these failures.

@winstonzhang-intel, PVC runs immediate command lists by default. This functionality is for regular command lists, so you need to test on a GEN12, DG2, or Flex GPU.

nrspruit avatar Jul 26 '24 22:07 nrspruit
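
The Imm-CmdLists-OFF benchmark configuration earlier in this thread sets `UR_L0_USE_IMMEDIATE_COMMANDLISTS=0` together with the two counter-based event variables. A small sketch of preparing that environment for a local test run is shown below. The helper name `regular_cmdlist_env` is hypothetical, and whether this setting is a full substitute for testing on GEN12, DG2, or Flex hardware is not established in this thread.

```python
import os


def regular_cmdlist_env(base=None):
    """Return a copy of the environment with the variables from the
    Imm-CmdLists-OFF benchmark configuration applied (hypothetical helper)."""
    env = dict(os.environ if base is None else base)
    env.update({
        "UR_L0_USE_IMMEDIATE_COMMANDLISTS": "0",
        "UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS": "1",
        "UR_L0_USE_DRIVER_INORDER_LISTS": "1",
    })
    return env


# A tiny base environment keeps the example deterministic.
env = regular_cmdlist_env(base={"PATH": "/usr/bin"})
```

The resulting dictionary can be passed as the `env` argument of `subprocess.run` when launching a test command such as llvm-lit.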

@pbalcer Seems like the e2e L0 tests are getting stuck. Could you please check that? I've also tried to run the e2e tests locally, and they all seem to be passing. This is running on gen12, and regular command lists should be in use:

$ bash ./test.sh
llvm-lit: /home/scss_dev/workspace/llvm/sycl/test-e2e/lit.cfg.py:769: note: Architectures for opencl:gpu: intel_gpu_adl_s
-- Testing: 11 tests, 11 workers --
PASS: SYCL :: Graph/RecordReplay/dotp_in_order.cpp (1 of 11)
PASS: SYCL :: Graph/RecordReplay/usm_copy_in_order.cpp (2 of 11)
PASS: SYCL :: Graph/RecordReplay/dotp_multiple_queues.cpp (3 of 11)
PASS: SYCL :: Graph/RecordReplay/dotp_in_order_with_empty_nodes.cpp (4 of 11)
PASS: SYCL :: Graph/RecordReplay/host_task_in_order.cpp (5 of 11)
PASS: SYCL :: Graph/RecordReplay/dotp_in_order_pause.cpp (6 of 11)
PASS: SYCL :: Graph/RecordReplay/sub_graph_in_order.cpp (7 of 11)
PASS: SYCL :: Graph/RecordReplay/barrier_multi_queue.cpp (8 of 11)
PASS: SYCL :: DiscardEvents/discard_events_mixed_calls.cpp (9 of 11)
PASS: SYCL :: ESIMD/BitonicSortKv2.cpp (10 of 11)
PASS: SYCL :: ESIMD/kmeans/kmeans.cpp (11 of 11)

Testing Time: 21.48s

Total Discovered Tests: 11 Passed: 11 (100.00%)

winstonzhang-intel avatar Jul 30 '24 23:07 winstonzhang-intel

> @pbalcer Seems like the e2e L0 tests are getting stuck.

The system we used in CI died and we haven't managed to get it back up yet.

Thanks for checking that the e2e tests are now passing. I'm not sure what was wrong with the runs in the CI (maybe a stale commit?).

pbalcer avatar Jul 31 '24 06:07 pbalcer

Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/10195419517

github-actions[bot] avatar Aug 01 '24 09:08 github-actions[bot]

Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/10195419517 Job status: failure. Test status: failure.

github-actions[bot] avatar Aug 01 '24 09:08 github-actions[bot]

CudaSift benchmark has failed:

Image size = (1920,1080)
Initializing data...
Number of original features: 3683 3933
Number of matching features: 87 3142 2.3622% 1 2

Performing data verification 
Data verification FAILED. 

This is on 1T PVC.

You can run the same benchmark by using the scripts here:

$ ./main.py ~/benchmarks_workdir/ ~/llvm/build/ --filter CudaSift --iterations 1

Where benchmarks_workdir is the location where the benchmarks will be built and ~/llvm/build/ is the location of the compiler that was built with the desired UR version. Run

$ ./main.py --help

for more options.

pbalcer avatar Aug 01 '24 09:08 pbalcer
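
For anyone scripting repeated runs, the invocation above can be assembled programmatically. This is only a sketch: `benchmark_command` is a hypothetical helper, the paths are placeholders, and only the positional arguments and the `--filter` and `--iterations` options shown in the comment above are used.

```python
import shlex
from pathlib import Path


def benchmark_command(workdir, compiler_build, benchmark, iterations=1):
    """Build the argument list for the main.py invocation (hypothetical helper)."""
    return [
        "./main.py",
        str(Path(workdir).expanduser()),
        str(Path(compiler_build).expanduser()),
        "--filter", benchmark,
        "--iterations", str(iterations),
    ]


# Placeholder paths; substitute the real benchmark workdir and compiler build.
cmd = benchmark_command("/tmp/benchmarks_workdir", "/tmp/llvm/build", "CudaSift")
printable = shlex.join(cmd)
```

The list form can be handed to `subprocess.run(cmd, cwd=..., check=True)` with `cwd` pointing at the directory that contains `main.py`, which avoids shell quoting issues.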

Compute Benchmarks level_zero run (with params: ): https://github.com/oneapi-src/unified-runtime/actions/runs/10305771352

github-actions[bot] avatar Aug 08 '24 16:08 github-actions[bot]

Compute Benchmarks level_zero run (): https://github.com/oneapi-src/unified-runtime/actions/runs/10305771352 Job status: failure. Test status: failure.

github-actions[bot] avatar Aug 08 '24 16:08 github-actions[bot]

Compute Benchmarks level_zero run (with params: --env UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 --env UR_L0_USE_DRIVER_INORDER_LISTS=1): https://github.com/oneapi-src/unified-runtime/actions/runs/10880913609

github-actions[bot] avatar Sep 16 '24 09:09 github-actions[bot]

Compute Benchmarks level_zero run (--env UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 --env UR_L0_USE_DRIVER_INORDER_LISTS=1): https://github.com/oneapi-src/unified-runtime/actions/runs/10880913609 Job status: success. Test status: success.

Summary

result is better

| Benchmark | This PR | baseline |
| --- | --- | --- |
| api_overhead_benchmark_sycl SubmitKernel out of order | 48.362 | 50.631 |
| api_overhead_benchmark_sycl SubmitKernel in order | 47.024 | 49.385 |
| api_overhead_benchmark_ur SubmitKernel out of order | 31.312 | 31.93 |
| api_overhead_benchmark_ur SubmitKernel in order | 25.546 | 28.586 |
| memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | 424.685 | 423.457 |
| memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | 261.384 | 253.906 |
| memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | 10.089 | 9.179 |
| memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | 3.002 | 1.854 |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 | 2.143 | 4.506 |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 | 2.096 | 3.613 |
| miscellaneous_benchmark_sycl VectorSum | 858.416 | 863.651 |
| Velocity-Bench Hashtable | 207.852567 | 178.291413 |
| Velocity-Bench Bitcracker | 35.6076 | 35.8407 |
| Velocity-Bench CudaSift | 256.843 | 283.294 |
| Velocity-Bench Easywave | 446 | 457.0 |
| Velocity-Bench QuickSilver | 90.08 | 115.63 |
| Velocity-Bench Sobel Filter | 985.857 | 934.963 |
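
To read the summary as relative differences, the percentage change of each PR value against its baseline can be computed as below. This is an illustrative sketch, not part of the benchmark scripts: `relative_change` is a hypothetical helper and only three rows are copied in. Whether a negative change is an improvement depends on the benchmark's unit (a time versus a throughput figure).

```python
def relative_change(pr: float, baseline: float) -> float:
    """Percentage change of the PR value relative to the baseline."""
    return (pr - baseline) / baseline * 100.0


# (this PR, baseline) pairs copied from the summary above.
results = {
    "api_overhead_benchmark_ur SubmitKernel in order": (25.546, 28.586),
    "Velocity-Bench QuickSilver": (90.08, 115.63),
    "Velocity-Bench Hashtable": (207.852567, 178.291413),
}

changes = {
    name: round(relative_change(pr, base), 1)
    for name, (pr, base) in results.items()
}
```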

Charts

| Chart | Section | This PR | baseline |
| --- | --- | --- | --- |
| api_overhead_benchmark_sycl SubmitKernel out of order | SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0) | 48.362 μs | 50.631 μs |
| api_overhead_benchmark_sycl SubmitKernel in order | SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0) | 47.024 μs | 49.385 μs |
| api_overhead_benchmark_ur SubmitKernel out of order | SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0) | 31.312 μs | 31.93 μs |
| api_overhead_benchmark_ur SubmitKernel in order | SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0) | 25.546 μs | 28.586 μs |
| memory_benchmark_sycl QueueInOrderMemcpy from Device to Device, size 1024 | QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100) | 424.685 μs | 423.457 μs |
| memory_benchmark_sycl QueueInOrderMemcpy from Host to Device, size 1024 | QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100) | 261.384 μs | 253.906 μs |
| memory_benchmark_sycl QueueMemcpy from Device to Device, size 1024 | QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB) | 10.089 μs | 9.179 μs |
| memory_benchmark_sycl StreamMemory, placement Device, type Triad, size 10240 | StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device) | 3.002 μs | 1.854 μs |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue out of order from Device to Device, size 1024 | ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0) | 2.143 μs | 4.506 μs |
| api_overhead_benchmark_sycl ExecImmediateCopyQueue in order from Device to Host, size 1024 | ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1) | 2.096 μs | 3.613 μs |
| miscellaneous_benchmark_sycl VectorSum | VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256) | 858.416 μs | 863.651 μs |
| Velocity-Bench Hashtable | hashtable | 207.852567 M keys/sec | 178.291413 M keys/sec |
| Velocity-Bench Bitcracker | bitcracker | 35.6076 s | 35.8407 s |
| Velocity-Bench CudaSift | cudaSift | 256.843 ms | 283.294 ms |
| Velocity-Bench Easywave | easywave | 446 ms | 457.0 ms |
| Velocity-Bench QuickSilver | QuickSilver | 90.08 MMS/CTT | 115.63 MMS/CTT |
| Velocity-Bench Sobel Filter | sobel_filter | 985.857 ms | 934.963 ms |

Units are as labeled in the charts. The StreamMemory chart labels its values in μs, but the raw output under Details reports them in GB/s.

Details

SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),48.362,47.646,7.34%,43.188,547.322,[CPU],[us]

SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=sycl Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),47.024,46.508,6.65%,44.278,209.617,[CPU],[us]

SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=0 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=0 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),31.312,31.050,6.53%,29.597,503.558,[CPU],[us]

SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_ur --test=SubmitKernel --csv --noHeaders --Ioq=1 --DiscardEvents=0 --MeasureCompletion=0 --iterations=100000 --Profiling=0 --NumKernels=10 --KernelExecTime=1

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
SubmitKernel(api=ur Profiling=0 Ioq=1 DiscardEvents=0 NumKernels=10 KernelExecTime=1 MeasureCompletion=0),25.546,29.884,27.77%,13.324,230.644,[CPU],[us]

QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Device --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Device destinationPlacement=Device size=1KB count=100),424.685,467.871,19.83%,246.890,870.042,[CPU],[us]

QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueInOrderMemcpy --csv --noHeaders --iterations=10000 --IsCopyOnly=0 --sourcePlacement=Host --destinationPlacement=Device --size=1024 --count=100

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueInOrderMemcpy(api=sycl IsCopyOnly=0 sourcePlacement=Host destinationPlacement=Device size=1KB count=100),261.384,238.517,22.09%,230.359,746.004,[CPU],[us]

QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=QueueMemcpy --csv --noHeaders --iterations=10000 --sourcePlacement=Device --destinationPlacement=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
QueueMemcpy(api=sycl sourcePlacement=Device destinationPlacement=Device size=1KB),10.089,9.944,18.73%,7.751,150.687,[CPU],[us]

StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/memory_benchmark_sycl --test=StreamMemory --csv --noHeaders --iterations=10000 --type=Triad --size=10240 --memoryPlacement=Device --useEvents=0 --contents=Zeros

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
StreamMemory(api=sycl type=Triad size=10KB useEvents=0 contents=Zeros memoryPlacement=Device),3.002,3.081,6.77%,0.382,3.365,[CPU],[GB/s]

ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=0 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Device --dst=Device --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Device dst=Device size=1KB ioq=0),2.143,2.101,14.10%,1.894,75.835,[CPU],[us]

ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/api_overhead_benchmark_sycl --test=ExecImmediateCopyQueue --csv --noHeaders --iterations=100000 --ioq=1 --IsCopyOnly=1 --MeasureCompletionTime=0 --src=Host --dst=Host --size=1024

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
ExecImmediateCopyQueue(api=sycl IsCopyOnly=1 MeasureCompletionTime=0 src=Host dst=Host size=1KB ioq=1),2.096,1.670,45.10%,1.554,28.530,[CPU],[us]

VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256)

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/compute-benchmarks-build/bin/miscellaneous_benchmark_sycl --test=VectorSum --csv --noHeaders --iterations=1000 --numberOfElementsX=512 --numberOfElementsY=256 --numberOfElementsZ=256

Output:

TestCase,Mean,Median,StdDev,Min,Max,Type
VectorSum(api=sycl numberOfElementsX=512 numberOfElementsY=256 numberOfElementsZ=256),858.416,858.902,0.49%,821.607,879.002,[GPU],bw [GB/s]

hashtable

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/hashtable/hashtable_sycl --no-verify

Output:

hashtable - total time for whole calculation: 0.645735 s
207.852567 million keys/second

bitcracker

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/bitcracker/bitcracker -f /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt -d /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt -b 60000

Output:

---------> BitCracker: BitLocker password cracking tool <---------

================================== Retrieving Info

Reading hash file "/home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/img_win8_user_hash.txt"

              Attack

================================================
Type of attack: User Password
Psw per thread: 1
max_num_pswd_per_read: 60000
Dictionary: /home/test-user/bench_workdir/velocity-bench-repo/bitcracker/hash_pass/user_passwords_60000.txt
MAC Comparison (-m): Yes

Iter: 1, num passwords read: 60000
Kernel execution:
Effective passwords: 60000
Passwords Range: npknpByH7N2m3OnLNH1X9DJxLrzIFWk ..... dL_7uuf3QCz-c6K3xDu0

================================================
Bitcracker attack completed
Total passwords evaluated: 60000
Password not found!

time to subtract from total: 0.0101897 s
bitcracker - total time for whole calculation: 35.6076 s

cudaSift

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/cudaSift/cudaSift

Output:

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1185 1247 32.1749% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1221 1256 33.1523% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1138 1277 30.8987% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1217 1253 33.0437% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1232 1267 33.451% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1140 1265 30.953% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1232 1262 33.451% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1229 1265 33.3695% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1228 1263 33.3424% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1237 1270 33.5868% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1222 1255 33.1795% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1104 1259 29.9756% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1155 1257 31.3603% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1236 1273 33.5596% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1121 1258 30.4371% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1241 1274 33.6954% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1236 1268 33.5596% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1163 1253 31.5775% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1249 1284 33.9126% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1223 1256 33.2066% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1070 1268 29.0524% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1238 1273 33.6139% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1221 1254 33.1523% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1231 1268 33.4238% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1159 1261 31.4689% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1230 1265 33.3967% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1230 1265 33.3967% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1227 1261 33.3152% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1232 1265 33.451% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1220 1260 33.1252% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1230 1267 33.3967% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1221 1256 33.1523% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1236 1275 33.5596% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1222 1255 33.1795% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1099 1259 29.8398% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1231 1267 33.4238% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1229 1264 33.3695% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1108 1276 30.0842% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1110 1249 30.1385% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1226 1263 33.2881% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1205 1271 32.7179% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1131 1264 30.7087% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1233 1266 33.4781% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1134 1274 30.7901% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1238 1270 33.6139% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1131 1263 30.7087% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1219 1250 33.098% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1226 1257 33.2881% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1232 1267 33.451% 1 2

Performing data verification Data verification is SUCCESSFUL.

Image size = (1920,1080) Initializing data... Number of original features: 3683 3933 Number of matching features: 1233 1272 33.4781% 1 2

Performing data verification Data verification is SUCCESSFUL.

Avg workload time = 256.843 ms
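The percentage printed on each cudaSift line appears to be the first matching-feature count divided by the first original-feature count. A small sketch that checks this against one line copied from the log above (the regular expression and variable names are assumptions made for illustration, not part of cudaSift):

```python
import re

# One result line copied verbatim from the cudaSift output above.
LINE = (
    "Image size = (1920,1080) Initializing data... "
    "Number of original features: 3683 3933 "
    "Number of matching features: 1185 1247 32.1749% 1 2"
)

# Hypothetical pattern: two original-feature counts, two matching-feature
# counts, then the reported percentage.
PATTERN = re.compile(
    r"Number of original features: (\d+) (\d+) "
    r"Number of matching features: (\d+) (\d+) ([\d.]+)%"
)

match = PATTERN.search(LINE)
orig_a, orig_b, match_a, match_b = (int(g) for g in match.groups()[:4])
reported_pct = float(match.group(5))

# Assumed relationship: percentage = first matching count / first original count.
computed_pct = 100.0 * match_a / orig_a
print(f"reported {reported_pct}%, computed {computed_pct:.4f}%")
```

Under that assumption the run-to-run spread in the log (roughly 29% to 34%) comes entirely from the first matching-feature count, since the original-feature counts are identical in every iteration.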

easywave

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1

Command:

/home/test-user/bench_workdir/easywave/easyWave_sycl -grid /home/test-user/bench_workdir/data/easywave/examples/e2Asean.grd -source /home/test-user/bench_workdir/data/easywave/examples/BengkuluSept2007.flt -time 120

Output:

MAIN: Starting SYCL main program
MAIN: Attempting to clean up previous eWave tsunami files
MAIN: Clean up completed
SYCL: SYCL Queue initialization successful
SYCL: Using SYCL device : Intel(R) Data Center GPU Max 1100 (Driver version 1.3.29735+27)
SYCL: Platform : Intel(R) oneAPI Unified Runtime over Level-Zero
MAIN: Program successfully completed

QuickSilver

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1 QS_DEVICE=GPU

Command:

/home/test-user/bench_workdir/QuickSilver/qs -i /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp

Output:

Copyright (c) 2016 Lawrence Livermore National Security, LLC All Rights Reserved
Quicksilver Version :
Quicksilver Git Hash :
MPI Version : 3.0
Number of MPI ranks : 1
Number of OpenMP Threads: 1
Number of OpenMP CPUs : 1

Loading params
Finished loading params

Simulation:
   dt: 1e-08
   fMax: 0.1
   inputFile: /home/test-user/bench_workdir/velocity-bench-repo/QuickSilver/Examples/AllScattering/scatteringOnly.inp
   energySpectrum:
   boundaryCondition: octant
   loadBalance: 1
   cycleTimers: 0
   debugThreads: 0
   lx: 100
   ly: 100
   lz: 100
   nParticles: 10000000
   batchSize: 0
   nBatches: 10
   nSteps: 10
   nx: 10
   ny: 10
   nz: 10
   seed: 1029384756
   xDom: 0
   yDom: 0
   zDom: 0
   eMax: 20
   eMin: 1e-09
   nGroups: 230
   lowWeightCutoff: 0.001
   bTally: 1
   fTally: 1
   cTally: 1
   coralBenchmark: 0
   crossSectionsOut:

Geometry:
   material: sourceMaterial
   shape: brick
   xMax: 100
   xMin: 0
   yMax: 100
   yMin: 0
   zMax: 100
   zMin: 0

Material:
   name: sourceMaterial
   mass: 1000
   nIsotopes: 10
   nReactions: 9
   sourceRate: 1e+10
   totalCrossSection: 0.1
   absorptionCrossSection: flat
   fissionCrossSection: flat
   scatteringCrossSection: flat
   absorptionCrossSectionRatio: 0
   fissionCrossSectionRatio: 0
   scatteringCrossSectionRatio: 1

CrossSection:
   name: flat
   A: 0
   B: 0
   C: 0
   D: 0
   E: 1
   nuBar: 2.4

setting GPU
setting parameters
Building partition 0
Building partition 1
Building partition 2
Building partition 3
Building MC_Domain 0
Building MC_Domain 1
Building MC_Domain 2
Building MC_Domain 3
Starting Consistency Check
Finished Consistency Check
Finished initMesh
Started copyMaterialDatabase_device
Finished copyMaterialDatabase_device
Finished copyNuclearData_device
Finished copyDomainDevice

cycle start source rr split absorb scatter fission produce collisn escape census num_seg scalar_flux cycleInit cycleTracking cycleFinalize
0 0 1000000 0 9000000 0 18533189 0 0 18533189 1151780 8848220 55527935 1.854923e+09 4.411710e-01 8.249170e-01 0.000000e+00
1 8848220 1000000 0 151478 0 34281997 0 0 34281997 1664159 8335539 94633679 5.047651e+09 3.726020e-01 9.738420e-01 0.000000e+00
2 8335539 1000000 0 663717 0 34354432 0 0 34354432 1366771 8632485 95010375 7.705930e+09 5.810500e-01 1.006878e+00 0.000000e+00
3 8632485 1000000 0 367978 0 34302727 0 0 34302727 1242216 8758247 94953591 9.992076e+09 6.018250e-01 1.105585e+00 0.000000e+00
4 8758247 1000000 0 242076 0 34141236 0 0 34141236 1168452 8831871 94599337 1.199834e+10 5.608290e-01 1.040724e+00 0.000000e+00
5 8831871 1000000 0 168070 0 33948724 0 0 33948724 1121156 8878785 94148236 1.377636e+10 3.749500e-01 9.924050e-01 0.000000e+00
6 8878785 1000000 0 120572 0 33760567 0 0 33760567 1089103 8910254 93689264 1.535668e+10 5.696560e-01 1.000601e+00 0.000000e+00
7 8910254 1000000 0 89810 0 33552179 0 0 33552179 1065203 8934861 93216931 1.676993e+10 5.518340e-01 1.028976e+00 0.000000e+00
8 8934861 1000000 0 65491 0 33384605 0 0 33384605 1047720 8952632 92768273 1.804559e+10 5.396320e-01 1.035437e+00 0.000000e+00
9 8952632 1000000 0 47165 0 33198494 0 0 33198494 1033968 8965829 92324678 1.920208e+10 5.596030e-01 9.911010e-01 0.000000e+00

Timer Name, Cumulative number of calls, Cumulative microSecs min, Cumulative microSecs avg, Cumulative microSecs max, Cumulative microSecs stddev, Cumulative Efficiency Rating
main 1 1.516e+07 1.516e+07 1.516e+07 0.000e+00 100.00
cycleInit 10 5.153e+06 5.153e+06 5.153e+06 0.000e+00 100.00
cycleTracking 10 1.000e+07 1.000e+07 1.000e+07 0.000e+00 100.00
cycleTracking_Kernel 104 4.942e+06 4.942e+06 4.942e+06 0.000e+00 100.00
cycleTracking_MPI 117 2.556e+05 2.556e+05 2.556e+05 0.000e+00 100.00
cycleTracking_Test_Done 0 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.00
cycleFinalize 20 7.140e+02 7.140e+02 7.140e+02 0.000e+00 100.00

Figure Of Merit 90.08 [Num Mega Segments / Cycle Tracking Time]
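The Figure Of Merit can be cross-checked against the per-cycle data in the QuickSilver output. The sketch below assumes it is the sum of the `num_seg` column, in millions, divided by the sum of the `cycleTracking` column, in seconds; that reading is inferred from the label in the log, not taken from the QuickSilver source.

```python
# num_seg per cycle (cycles 0..9), copied from the cycle table in the log.
NUM_SEG = [
    55527935, 94633679, 95010375, 94953591, 94599337,
    94148236, 93689264, 93216931, 92768273, 92324678,
]

# cycleTracking time per cycle, assumed to be in seconds.
CYCLE_TRACKING_S = [
    8.249170e-01, 9.738420e-01, 1.006878e+00, 1.105585e+00, 1.040724e+00,
    9.924050e-01, 1.000601e+00, 1.028976e+00, 1.035437e+00, 9.911010e-01,
]

mega_segments = sum(NUM_SEG) / 1e6
tracking_time_s = sum(CYCLE_TRACKING_S)

# Assumed definition: mega segments per second of cycle-tracking time.
figure_of_merit = mega_segments / tracking_time_s

print(f"mega segments:      {mega_segments:.3f}")
print(f"cycle tracking [s]: {tracking_time_s:.3f}")
print(f"figure of merit:    {figure_of_merit:.2f}")
```

The summed tracking time of about 10.0 s also agrees with the 1.000e+07 microseconds reported for `cycleTracking` in the timer table, which supports reading the per-cycle column as seconds.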

sobel_filter

Environment Variables:

UR_L0_USE_DRIVER_COUNTER_BASED_EVENTS=1 UR_L0_USE_DRIVER_INORDER_LISTS=1 OPENCV_IO_MAX_IMAGE_PIXELS=1677721600

Command:

/home/test-user/bench_workdir/sobel_filter/sobel_filter -i /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png -n 5

Output:

SYMN: Welcome to the SYCL version of Sobel filter workload.
SYMN: Input image file: /home/test-user/bench_workdir/data/sobel_filter/sobel_filter_data/silverfalls_32Kx32K.png
SYMN: Launching SYCL kernel with # of iterations: 5
time to subtract from total: 14.9964 s
sobelfilter - total time for whole calculation: 0.985857 s

github-actions[bot] avatar Sep 16 '24 09:09 github-actions[bot]