zlib Add optimized slide_hash for Power processors


Open mscastanho opened this issue 5 years ago • 1 comment

Hi,

During performance tests, we noticed that slide_hash consumes a considerable share of CPU time during compression on Power processors. This PR introduces an optimized version that uses VSX vector instructions. The main difference is that it slides 8 elements at a time, instead of one at a time as the standard code does.
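To illustrate the idea, here is a minimal portable C sketch, not the code in this PR. It assumes 16-bit hash entries and uses hypothetical names (`pos_t`, `slide_scalar`, `slide_by8`). The blocked loop processes 8 entries per iteration with a branch-free saturating subtract, which is the shape that maps onto one 128-bit vector operation (8 x 16 bits); the actual VSX intrinsics are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t pos_t; /* hypothetical stand-in for a 16-bit hash entry */

/* Reference version: slide one entry per iteration.
 * Entries that would fall before the new window start become 0. */
void slide_scalar(pos_t *table, size_t n, uint16_t wsize) {
    assert(table != NULL);
    for (size_t i = 0; i < n; i++) {
        pos_t m = table[i];
        table[i] = (pos_t)(m >= wsize ? m - wsize : 0);
    }
}

/* Blocked version: 8 entries per outer iteration.
 * m - min(m, wsize) is an unsigned saturating subtract, so the inner
 * loop has no data-dependent branch and is easy to vectorize. */
void slide_by8(pos_t *table, size_t n, uint16_t wsize) {
    assert(table != NULL);
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        for (size_t j = 0; j < 8; j++) {
            pos_t m = table[i + j];
            table[i + j] = (pos_t)(m - (m < wsize ? m : wsize));
        }
    }
    /* Tail: handle any leftover entries when n is not a multiple of 8. */
    for (; i < n; i++) {
        pos_t m = table[i];
        table[i] = (pos_t)(m - (m < wsize ? m : wsize));
    }
}
```

Both functions produce the same table contents; only the loop structure differs.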

The implementation uses the GNU indirect function (ifunc) feature to select the appropriate function version at runtime, on the first call. Later calls all go directly to the selected function, so the same binary can be used on all Power processor versions. The ifunc helper code is not limited to Power and can be reused by other architectures if desired, so it was placed under contrib/gcc.
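As a rough sketch of how ifunc dispatch works (again, not the helper code from this PR), the example below assumes a GCC-compatible compiler on a glibc-style ELF target. All names (`slide`, `resolve_slide`, `slide_generic`, `slide_fast`, `cpu_has_fast_path`) are hypothetical, and the CPU probe is a stub. On targets without ifunc support, the sketch falls back to a function pointer that is set on the first call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*slide_fn)(uint16_t *table, size_t n, uint16_t wsize);

/* Baseline implementation, valid on every CPU. */
static void slide_generic(uint16_t *table, size_t n, uint16_t wsize) {
    for (size_t i = 0; i < n; i++)
        table[i] = (uint16_t)(table[i] >= wsize ? table[i] - wsize : 0);
}

/* Placeholder for a vectorized implementation; same results,
 * written branch-free as a saturating subtract. */
static void slide_fast(uint16_t *table, size_t n, uint16_t wsize) {
    for (size_t i = 0; i < n; i++) {
        uint16_t m = table[i];
        table[i] = (uint16_t)(m - (m < wsize ? m : wsize));
    }
}

/* Hypothetical capability probe. A real resolver would query the CPU
 * here (on Linux, for example, via getauxval(AT_HWCAP2)). This stub
 * always reports that the fast path is unavailable. */
static int cpu_has_fast_path(void) {
    return 0;
}

/* Resolver: returns the address of the implementation to bind. */
static slide_fn resolve_slide(void) {
    return cpu_has_fast_path() ? slide_fast : slide_generic;
}

#if defined(__GNUC__) && defined(__ELF__)
/* With ifunc, the dynamic loader calls resolve_slide() once and binds
 * the symbol `slide` directly to the function it returns. */
void slide(uint16_t *table, size_t n, uint16_t wsize)
    __attribute__((ifunc("resolve_slide")));
#else
/* Fallback without ifunc: choose on the first call, then reuse. */
void slide(uint16_t *table, size_t n, uint16_t wsize) {
    static slide_fn chosen = NULL;
    if (chosen == NULL)
        chosen = resolve_slide();
    chosen(table, n, wsize);
}
#endif
```

Callers simply call `slide(table, n, wsize)`; they do not need to know which implementation was selected.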

I tried to make as few changes as possible to top-level files (deflate.c), and instead placed most of the Power-specific code under contrib/power.

To measure the performance improvement, we used Chromium's zlib_bench.cc with input files from jsnell/zlib-bench.

The results below show compression throughput in MB/s using RAW deflate, for all compression levels:

jpeg

comp lvl	default	optimized	gain
1	20.4	27.4	+34.31%
2	20.2	26.4	+30.69%
3	20.2	27.1	+34.16%
4	20.3	27.3	+34.48%
5	20.3	27.3	+34.48%
6	20.3	27.3	+34.48%
7	20.3	27.3	+34.48%
8	20.3	27.3	+34.48%
9	20.3	27.3	+34.48%

pngpixels

comp lvl	default	optimized	gain
1	67.0	98.6	+47.16%
2	58.7	79.8	+35.95%
3	38.8	46.7	+20.36%
4	42.1	48.8	+15.91%
5	26.6	29.2	+9.77%
6	13.8	14.5	+5.07%
7	8.9	9.2	+3.37%
8	2.8	2.8	+0.00%
9	1.3	1.3	+0.00%

executable

comp lvl	default	optimized	gain
1	41.3	57.6	+39.47%
2	37.9	50.9	+34.30%
3	29.0	36.1	+24.48%
4	28.4	34.8	+22.54%
5	20.2	23.2	+14.85%
6	12.5	13.7	+9.60%
7	9.5	10.1	+6.32%
8	5.4	5.6	+3.70%
9	4.1	4.2	+2.44%

html

comp lvl	default	optimized	gain
1	43.1	59.3	+37.59%
2	38.6	50.7	+31.35%
3	27.8	33.8	+21.58%
4	28.3	33.1	+16.96%
5	18.1	20.1	+11.05%
6	12.2	13.0	+6.56%
7	10.6	11.2	+5.66%
8	8.0	8.4	+5.00%
9	7.9	8.3	+5.06%

Dec 10 '19 14:12 mscastanho

Force-pushed to add changes to the feature detection in configure.

Mar 10 '20 20:03 mscastanho
