openvino [CPU] Enable cpu_convert utility function with AVX_512 FP16

Details:

Augmented the cpu_convert function with support for AVX512 FP16 load/store instructions.
Added tests.

Tickets:

Closes #21809

Jan 10 '24 18:01 siddhant-0707

Hey @ceciliapeng2011 could you guide me a little bit on where and how to create the required tests for this? Please review my changes as well, thanks!

Jan 10 '24 18:01 siddhant-0707

Hey @ceciliapeng2011, please review, thanks!

Jan 18 '24 08:01 siddhant-0707

Thanks for the contribution, @siddhant-0707. For the unit test, you could refer to this one: https://github.com/openvinotoolkit/openvino/pull/22174/files#diff-68058ac8f7dc6ca18caffd8e7ff762035dcd7ee15f5ccf4b78856a9a50639adf

Jan 19 '24 07:01 ceciliapeng2011

Hey, on my machine I am able to run the test (it uses avx2)

// Using AVX2
[ RUN      ] cpu_convert.AVX512_fp16_load_store
size 1000: 44 microseconds
size 10000: 4 microseconds
size 100000: 33 microseconds
size 1000000: 596 microseconds
[       OK ] cpu_convert.AVX512_fp16_load_store (5 ms)
[----------] 1 test from cpu_convert (5 ms total)

I will have to run it on an Intel Xeon to see the AVX512 performance. What is the CI configuration?

Jan 27 '24 09:01 siddhant-0707

This PR will be closed in a week because of 2 weeks of no activity.

Feb 12 '24 00:02 github-actions[bot]

This PR was closed because it has been stalled for 2 week with no activity.

Feb 19 '24 00:02 github-actions[bot]

CI probably cannot benchmark the performance across platforms. Do you have a local machine with AVX512?

Feb 22 '24 01:02 ceciliapeng2011

No, unfortunately I don't

Feb 22 '24 05:02 siddhant-0707

This PR will be closed in a week because of 2 weeks of no activity.

Mar 08 '24 00:03 github-actions[bot]

This PR was closed because it has been stalled for 2 week with no activity.

Mar 16 '24 00:03 github-actions[bot]

Hey @ceciliapeng2011 finally got a machine with AVX512 capability. Here are the results I collected after changing the lines you indicated to:

constexpr size_t vlen = 16u;
constexpr size_t vlen_log2 = 4;

[ RUN      ] cpu_convert.AVX512_fp16_load_store
size 1000: 645 microseconds
size 10000: 1055 microseconds
size 100000: 48 microseconds
size 1000000: 122 microseconds
size 10000000: 1398 microseconds
[       OK ] cpu_convert.AVX512_fp16_load_store (17 ms)


[ RUN      ] cpu_convert.AVX512_fp16_load_store
size 1000: 674 microseconds
size 10000: 1112 microseconds
size 100000: 41 microseconds
size 1000000: 143 microseconds
size 10000000: 1293 microseconds
[       OK ] cpu_convert.AVX512_fp16_load_store (17 ms)
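
For context on the two constants changed above, here is a minimal sketch of how such a pair is typically used. It assumes that vlen is the number of 32-bit float lanes per vector register (16 for a 512-bit register, 8 for a 256-bit one) and that vlen_log2 exists so the element count can be split with a shift and a mask. The names LoopSplit and split_loop are hypothetical and are not taken from the OpenVINO sources.

```cpp
#include <cstddef>

// Assumption: lane count for a 512-bit register holding 32-bit floats.
constexpr std::size_t vlen = 16u;
constexpr std::size_t vlen_log2 = 4;

// The two constants must stay in sync: vlen == 2^vlen_log2.
static_assert(vlen == (std::size_t{1} << vlen_log2),
              "vlen must equal 2^vlen_log2");

// Hypothetical result type: how a vectorized loop divides its work.
struct LoopSplit {
    std::size_t full_vectors;  // iterations that each process vlen elements
    std::size_t tail;          // leftover elements handled separately
};

// Hypothetical helper: split an element count into vector body and tail.
inline LoopSplit split_loop(std::size_t size) {
    LoopSplit s;
    s.full_vectors = size >> vlen_log2;  // size / vlen
    s.tail = size & (vlen - 1);          // size % vlen
    return s;
}
```

With these values, a buffer of 1000 elements would be processed as 62 full vectors plus a tail of 8 elements.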

Mar 19 '24 10:03 siddhant-0707

Glad you have an AVX512 machine and are continuing the work! Great! Could you please benchmark the workload with both AVX2 and AVX512 on the same machine?

Before benchmarking, please make sure the CPU scaling governor on your machine is set to performance (the default is powersave). You can set it with this Linux command: echo "performance" | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
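
Besides the governor setting, one-off costs in the first call (cold caches, or kernel generation if the kernel is JIT-compiled) may distort single-shot timings. The sketch below shows one possible way to reduce that noise: run a few untimed warm-up calls, then report the best of several timed runs. The helper name best_time_us and its default counts are hypothetical, and this is not the timing code used in the PR's test.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical helper: call `fn` `warmup` times untimed, then return the
// fastest of `repeats` timed calls, in microseconds.
// `repeats` is expected to be at least 1.
template <typename Fn>
std::int64_t best_time_us(Fn&& fn, int warmup = 3, int repeats = 10) {
    for (int i = 0; i < warmup; ++i) {
        fn();  // untimed: absorbs first-call overhead
    }
    std::int64_t best = std::numeric_limits<std::int64_t>::max();
    for (int i = 0; i < repeats; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        fn();
        const auto t1 = std::chrono::steady_clock::now();
        const std::int64_t us =
            std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0).count();
        best = std::min(best, us);
    }
    return best;
}
```

The same callable can then be timed once per instruction set and per buffer size, so that the AVX2 and AVX512 numbers come from identical measurement conditions.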

Mar 24 '24 06:03 ceciliapeng2011

This PR will be closed in a week because of 2 weeks of no activity.

Apr 12 '24 00:04 github-actions[bot]

@siddhant-0707 From my perspective, this PR still needs the following two unit tests:

cross-compare the performance of the AVX2 and AVX512 FP16 conversions across different workload sizes;
validate the output.
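
As a rough illustration of the output-validation test, the sketch below decodes IEEE 754 half-precision bit patterns with plain scalar code and compares a kernel's float output against that reference. It assumes the fp16-to-fp32 direction, where the conversion is exact and values can be compared for equality. The function names half_bits_to_float and matches_reference are hypothetical and do not come from the OpenVINO test suite.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Scalar reference: decode one IEEE 754 binary16 bit pattern to float.
inline float half_bits_to_float(std::uint16_t h) {
    const std::uint32_t sign = (h >> 15) & 0x1u;
    const std::uint32_t exp = (h >> 10) & 0x1Fu;
    const std::uint32_t mant = h & 0x3FFu;
    float value;
    if (exp == 0) {
        // Zero or subnormal: mant * 2^-24.
        value = std::ldexp(static_cast<float>(mant), -24);
    } else if (exp == 31) {
        // All-ones exponent: infinity if mantissa is zero, otherwise NaN.
        value = mant ? std::numeric_limits<float>::quiet_NaN()
                     : std::numeric_limits<float>::infinity();
    } else {
        // Normal: (1024 + mant) * 2^(exp - 15 - 10).
        value = std::ldexp(static_cast<float>(mant | 0x400u),
                           static_cast<int>(exp) - 25);
    }
    return sign ? -value : value;
}

// Hypothetical check: true if every output element equals the reference.
// NaN inputs only require a NaN output; signed zeros compare equal.
inline bool matches_reference(const std::vector<std::uint16_t>& src,
                              const std::vector<float>& dst) {
    if (src.size() != dst.size()) return false;
    for (std::size_t i = 0; i < src.size(); ++i) {
        const float ref = half_bits_to_float(src[i]);
        if (std::isnan(ref)) {
            if (!std::isnan(dst[i])) return false;
        } else if (ref != dst[i]) {
            return false;
        }
    }
    return true;
}
```

A test could fill src with a mix of normal, subnormal, infinite and NaN patterns, run the AVX2 and AVX512 paths over it, and require matches_reference to hold for both outputs.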

Apr 19 '24 03:04 ceciliapeng2011

Hey @siddhant-0707, will you have time to finish this PR?

Jun 18 '24 08:06 mlukasze
