hyperscan
hsbench shows lower performance with the AVX512 build than with the SSSE3 build
My CPU is an Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz. I built Hyperscan twice on this machine, once with the AVX512 instruction set and once with SSSE3 (-march=core2), then used hsbench with the officially provided data to compare performance.
The commands I used are as follows:
For AVX512:
cmake -DBUILD_AVX512=on -DCMAKE_C_FLAGS="-march=native" -DCMAKE_CXX_FLAGS="-march=native" -DFAT_RUNTIME=0 ..
make -j80
# run command
taskset 1 hsbench -e pcre/snort_literals -c corpora/alexa200.db -V
For SSSE3 (on the same CPU):
cmake -DCMAKE_C_FLAGS="-march=core2" -DCMAKE_CXX_FLAGS="-march=core2" -DFAT_RUNTIME=0 ..
make -j80
# run command
taskset 1 hsbench -e pcre/snort_literals -c corpora/alexa200.db -V
My gcc version is 7.3.0 and the operating system is Ubuntu 18.04.
The SSSE3 build runs noticeably faster than the AVX512 build (I estimated nearly 10%). Is this result reasonable?
These are my results, AVX512 first and SSSE3 second:
*** Snort literals against HTTP traffic, block mode.
Signatures: pcre/snort_literals
Hyperscan info: Version: 5.2.1 Features: AVX512 Mode: VECTORED
Expression count: 3,116
Bytecode size: 695,608 bytes
Database CRC: 0xe4f2719
Scratch size: 5,479 bytes
Compile time: 0.083 seconds
Peak heap usage: 192,765,952 bytes
Time spent scanning: 7.906 seconds
Corpus size: 177,087,567 bytes (130,957 blocks in 5,400 vectors)
Matches per iteration: 81,963 (0.474 matches/kilobyte)
Overall block rate: 331,268.29 blocks/sec
Mean throughput (overall): 3,583.68 Mbit/sec
Max throughput (per core): 3,767.96 Mbit/sec
*** Snort literals against HTTP traffic, block mode.
Signatures: pcre/snort_literals
Hyperscan info: Version: 5.2.1 Features: Mode: VECTORED
Expression count: 3,116
Bytecode size: 695,608 bytes
Database CRC: 0xe4f2719
Scratch size: 5,479 bytes
Compile time: 0.085 seconds
Peak heap usage: 193,003,520 bytes
Time spent scanning: 6.730 seconds
Corpus size: 177,087,567 bytes (130,957 blocks in 5,400 vectors)
Matches per iteration: 81,963 (0.474 matches/kilobyte)
Overall block rate: 389,196.47 blocks/sec
Mean throughput (overall): 4,210.35 Mbit/sec
Max throughput (per core): 4,438.81 Mbit/sec
Hi, your result under AVX512 shows a nearly 15% performance drop against SSSE3, which seems too much to me. This case only exercises the large-scale multi-literal matching part of Hyperscan, which currently has no AVX2/AVX512 optimizations. As you can see, the bytecode sizes and CRCs are exactly the same: both builds compile and run the same engines with the same runtime implementations, so the same performance is expected. AVX512 may cause a small performance drop due to CPU frequency reduction, but 15% is too much. I ran your commands on my server and saw a 0.7% performance drop for AVX512 against SSSE3, which seems reasonable to me. I suggest you run the test again.
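For reference, the size of the gap can be recomputed from the "Mean throughput (overall)" lines of the two runs posted above. This is a minimal sketch; `rel_drop` is a hypothetical helper name, not part of hsbench, and it only does the arithmetic.

```shell
# rel_drop BASE NEW: print the percentage by which NEW falls short of BASE.
# (Hypothetical helper for illustration; not an hsbench feature.)
rel_drop() {
    LC_ALL=C awk -v b="$1" -v n="$2" 'BEGIN { printf "%.1f\n", (b - n) / b * 100 }'
}

# Mean throughput in Mbit/sec, copied from the two hsbench outputs above:
# SSSE3 build = 4210.35, AVX512 build = 3583.68
rel_drop 4210.35 3583.68   # prints 14.9
```

Measured this way, the AVX512 build is about 14.9% slower than the SSSE3 build, which matches the "nearly 15%" figure rather than 10%.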
Hello, I retested, but I cannot reproduce your difference of about 0.7%; my result is still a drop of around 15%.
My code is downloaded from the master branch, and the use case comes from data
Are we using the same source code and data set?
I believe we're using the same code, rules and corpus, because the bytecode CRC, corpus size and match rate are all the same. My result under AVX512 is as follows:
Signatures: ../signatures/HSBench/pcre/snort_literals
Hyperscan info: Version: 5.2.1 Features: AVX512 Mode: VECTORED
Expression count: 3,116
Bytecode size: 695,608 bytes
Database CRC: 0xe4f2719
Scratch size: 5,479 bytes
Compile time: 0.075 seconds
Peak heap usage: 192,999,424 bytes
Time spent scanning: 5.645 seconds
Corpus size: 177,087,567 bytes (130,957 blocks in 5,400 vectors)
Matches per iteration: 81,963 (0.474 matches/kilobyte)
Overall block rate: 463,951.77 blocks/sec
Mean throughput (overall): 5,019.06 Mbit/sec
Max throughput (per core): 5,148.03 Mbit/sec
At the moment I have no idea how this could happen; we will investigate further. A 15% drop looks like the overhead of assertions, but they shouldn't be enabled here.
I don't think it's a matter of assertions,
because we did not add the debug option.
In addition, when I trace my program with gdb, the assert statements are not executed.
OK, so the problem should be elsewhere.
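One way to double-check the assertion question without gdb is to look at what cmake actually recorded for the build. C assertions are compiled out when NDEBUG is defined, so the cached build type and compiler flags are a reasonable first thing to inspect. The sketch below assumes the build directory is named `build`; `cmake_cache_value` is a hypothetical helper, and whether a given build type defines NDEBUG depends on the project's CMake configuration.

```shell
# cmake_cache_value BUILD_DIR KEY: print the cached value of KEY from
# BUILD_DIR/CMakeCache.txt, whose entries have the form KEY:TYPE=VALUE.
# (Hypothetical helper for illustration.)
cmake_cache_value() {
    sed -n "s/^$2:[^=]*=//p" "$1/CMakeCache.txt"
}

# "build" is an assumed directory name; use the directory where cmake was run.
if [ -f build/CMakeCache.txt ]; then
    cmake_cache_value build CMAKE_BUILD_TYPE
    cmake_cache_value build CMAKE_C_FLAGS
    cmake_cache_value build CMAKE_CXX_FLAGS
fi
```

Comparing this output between the AVX512 and SSSE3 build directories also confirms that the two builds differ only in the intended flags.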
@fatchanghao I would be very grateful if you could share the CPU, gcc version and other related details of your test setup; I want to check whether some other interfering factor is the cause.
Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz, gcc 7.2.0, Ubuntu 17.10, Hyperscan 5.2.1. The build commands, rules and corpus are the same as yours.
I found an Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz with gcc 7.4.0 and Ubuntu 18.04. The benchmark on this platform showed a 1.8% performance drop for AVX512.