zlib Add optimized longest_match for Power processors

Open mscastanho opened this issue 5 years ago • 1 comment

Hello again,

This optimization uses VSX vector (SIMD) instructions to compare multiple bytes at once while searching for the longest match. A 16-byte vector load and comparison adds only a small overhead compared with the scalar equivalents, so the optimized longest_match tries to match as many bytes as possible with each comparison.
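To illustrate the idea, here is a minimal portable sketch of a chunked match-length loop. It is not the code from this PR: `match_length_chunked` and `CHUNK` are hypothetical names, and `memcmp` on 16-byte blocks stands in for the VSX vector load and compare, which only builds on Power.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 16 /* bytes covered by one vector load + compare (assumed) */

/* Return how many leading bytes of scan and match are equal, up to max_len.
 * Hypothetical helper; memcmp stands in for a 16-byte vector compare. */
size_t match_length_chunked(const unsigned char *scan,
                            const unsigned char *match,
                            size_t max_len)
{
    size_t len = 0;

    assert(scan != NULL && match != NULL);

    /* Fast path: advance 16 bytes at a time while whole chunks are equal. */
    while (len + CHUNK <= max_len &&
           memcmp(scan + len, match + len, CHUNK) == 0)
        len += CHUNK;

    /* Slow path: find the exact mismatch inside the last chunk or the tail. */
    while (len < max_len && scan[len] == match[len])
        len++;

    return len;
}
```

With long matches most of the work happens in the fast path, which is where a vector compare would pay off; short matches fall through to the byte loop almost immediately.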

This PR shares one commit with #457 and #458; that commit can be dropped from this PR if either of those gets merged first. It also uses GNU indirect functions (ifunc) to choose at runtime, on the first call to longest_match, which version of the function (optimized or default) to run.
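For readers unfamiliar with GNU indirect functions, the dispatch could look roughly like the sketch below. All names (`match_len`, `match_default`, `match_optimized`, `cpu_has_vsx`, `resolve_match`) are hypothetical, the "optimized" variant is only a placeholder, and the `__builtin_cpu_supports("vsx")` probe is an assumption about how the feature check might be written, not necessarily what this PR does. Where ifunc is unavailable, the sketch falls back to a function pointer resolved on the first call.

```c
#include <assert.h> /* on glibc this also defines __GLIBC__ */
#include <stddef.h>

typedef size_t (*match_fn)(const unsigned char *, const unsigned char *, size_t);

/* Baseline: byte-by-byte comparison. */
static size_t match_default(const unsigned char *a, const unsigned char *b,
                            size_t max)
{
    size_t n = 0;
    while (n < max && a[n] == b[n])
        n++;
    return n;
}

/* Placeholder for the vectorized variant; here it just forwards. */
static size_t match_optimized(const unsigned char *a, const unsigned char *b,
                              size_t max)
{
    return match_default(a, b, max);
}

/* Hypothetical feature probe: only meaningful on Power with GCC. */
static int cpu_has_vsx(void)
{
#if defined(__GNUC__) && defined(__powerpc64__) && defined(__linux__)
    return __builtin_cpu_supports("vsx");
#else
    return 0;
#endif
}

/* Resolver: picks the implementation once. */
static match_fn resolve_match(void)
{
    return cpu_has_vsx() ? match_optimized : match_default;
}

#if defined(__GNUC__) && defined(__ELF__) && defined(__GLIBC__)
/* GNU ifunc: the dynamic linker runs resolve_match() and binds the symbol. */
size_t match_len(const unsigned char *, const unsigned char *, size_t)
    __attribute__((ifunc("resolve_match")));
#else
/* Fallback: resolve lazily through a function pointer on the first call. */
size_t match_len(const unsigned char *a, const unsigned char *b, size_t max)
{
    static match_fn impl;
    if (!impl)
        impl = resolve_match();
    return impl(a, b, max);
}
#endif
```

Callers simply call `match_len(...)`; after the symbol is bound, the ifunc path adds no per-call branch, which is the main appeal over a hand-written check in the hot loop.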

To test the performance improvement, we used Chromium's zlib_bench.cc with input files from jsnell/zlib-bench.

The results below show compression throughput in MB/s using RAW deflate, for all compression levels:

pngpixels

comp lvl	default	optimized	gain
1	67.5	73.0	+8.15%
2	59.0	65.3	+10.68%
3	38.8	45.2	+16.49%
4	42.0	46.0	+9.52%
5	26.7	31.6	+18.35%
6	13.8	16.5	+19.57%
7	8.9	10.6	+19.10%
8	2.8	3.4	+21.43%
9	1.3	1.5	+15.38%

jpeg

comp lvl	default	optimized	gain
1	20.0	20.5	+2.50%
2	20.2	20.3	+0.50%
3	20.2	20.3	+0.50%
4	20.3	20.4	+0.49%
5	20.3	20.4	+0.49%
6	20.3	20.4	+0.49%
7	20.3	20.4	+0.49%
8	19.9	20.4	+2.51%
9	20.3	20.4	+0.49%

executable

comp lvl	default	optimized	gain
1	41.2	43.1	+4.61%
2	37.8	39.2	+3.70%
3	28.9	29.9	+3.46%
4	28.3	28.9	+2.12%
5	20.2	21.4	+5.94%
6	12.5	13.1	+4.80%
7	9.5	9.9	+4.21%
8	5.4	5.6	+3.70%
9	4.1	4.2	+2.44%

html

comp lvl	default	optimized	gain
1	43.0	46.2	+7.44%
2	38.5	42.2	+9.61%
3	27.8	30.8	+10.79%
4	28.3	30.8	+8.83%
5	18.1	20.1	+11.05%
6	12.2	13.2	+8.20%
7	10.6	11.4	+7.55%
8	8.0	8.7	+8.75%
9	7.9	8.6	+8.86%
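The gain column in the tables above is consistent with the relative throughput increase, (optimized / default - 1) × 100. A small sketch of that arithmetic, with `gain_percent` as a hypothetical helper name:

```c
#include <assert.h>

/* Relative throughput gain in percent, given two throughputs in MB/s. */
double gain_percent(double default_mbps, double optimized_mbps)
{
    assert(default_mbps > 0.0);
    return (optimized_mbps / default_mbps - 1.0) * 100.0;
}
```

For example, the pngpixels level 1 row (67.5 and 73.0 MB/s) gives roughly 8.15%, matching the table.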

Dec 12 '19 13:12 mscastanho

Force-pushed to add changes to the feature detection in configure.

Mar 10 '20 20:03 mscastanho
