Add optimized slide_hash for Power processors

Open mscastanho opened this issue 5 years ago • 1 comments

Hi,

During performance tests, we noticed that slide_hash consumes a considerable share of CPU time during compression on Power processors. This PR introduces an optimized version that uses VSX vector instructions to speed it up. The main difference is that it slides 8 elements at a time, instead of one at a time as the standard code does.
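To illustrate the idea of sliding 8 entries per step, here is a minimal portable sketch. It is not the code from this PR: the function names (`slide_scalar`, `slide_by8`, `check_slide_variants`) are hypothetical, the table is assumed to hold 16-bit entries, and the inner 8-lane loop only stands in for what a vector unit would do with one load, one saturating subtract (for example `vec_subs` from `altivec.h`) and one store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference: one entry per iteration. Entries that would point
 * before the start of the slid window are reset to 0 (NIL in zlib). */
void slide_scalar(uint16_t *table, size_t n, uint16_t wsize)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t m = table[i];
        table[i] = (uint16_t)(m >= wsize ? m - wsize : 0);
    }
}

/* Chunked variant: 8 entries (8 x 16 bits = 128 bits) per outer iteration.
 * The inner loop is written lane by lane so that it maps naturally onto a
 * single saturating vector subtract. This is an illustrative assumption,
 * not the PR's actual VSX code. */
void slide_by8(uint16_t *table, size_t n, uint16_t wsize)
{
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {
        uint16_t lane[8];
        memcpy(lane, table + i, sizeof lane);   /* stands in for one vector load */
        for (int k = 0; k < 8; k++)             /* stands in for one saturating subtract */
            lane[k] = (uint16_t)(lane[k] >= wsize ? lane[k] - wsize : 0);
        memcpy(table + i, lane, sizeof lane);   /* stands in for one vector store */
    }

    /* Tail: fewer than 8 entries remain, handle them one at a time. */
    for (; i < n; i++)
        table[i] = (uint16_t)(table[i] >= wsize ? table[i] - wsize : 0);
}

/* Self-check: both variants must produce identical tables. */
void check_slide_variants(void)
{
    uint16_t a[19], b[19];

    for (size_t i = 0; i < 19; i++)
        a[i] = b[i] = (uint16_t)(i * 4000u);

    slide_scalar(a, 19, 32768u);
    slide_by8(b, 19, 32768u);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

Calling `check_slide_variants()` confirms that the chunked loop matches the scalar one, including the tail when the table length is not a multiple of 8.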

The implementation uses the GNU indirect function (ifunc) feature to choose the correct function version at runtime, on the first call. Later calls all go directly to the selected function. This way, the same binary can be used on all Power processor versions. The ifunc helper code, however, is not limited to Power and can be reused by other architectures if desired, so it was placed under contrib/gcc.
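As a rough sketch of how ifunc dispatch can look, the example below binds one public symbol to either a baseline or a "vector" implementation. All names (`slide_hash_table`, `pick_slide`, `slide_generic`, `slide_vector`) are hypothetical and do not come from this PR, `slide_vector` is only a placeholder for real VSX code, and the `arch_2_07` check is an assumption about which CPU feature one might test. A function-pointer fallback is included for toolchains without the `ifunc` attribute.

```c
#include <stddef.h>
#include <stdint.h>

typedef void slide_fn(uint16_t *table, size_t n, uint16_t wsize);

/* Portable baseline, valid on every processor. */
static void slide_generic(uint16_t *table, size_t n, uint16_t wsize)
{
    for (size_t i = 0; i < n; i++)
        table[i] = (uint16_t)(table[i] >= wsize ? table[i] - wsize : 0);
}

/* Placeholder for a vectorized version; real code would use VSX here. */
static void slide_vector(uint16_t *table, size_t n, uint16_t wsize)
{
    slide_generic(table, n, wsize);
}

/* Resolver: decides which implementation fits the running CPU. */
static slide_fn *pick_slide(void)
{
    int have_vector = 0;
#if defined(__powerpc64__) && defined(__GNUC__)
    /* Assumption: ISA 2.07 (POWER8) is the feature level worth testing. */
    have_vector = __builtin_cpu_supports("arch_2_07");
#endif
    return have_vector ? slide_vector : slide_generic;
}

#if defined(__has_attribute)
#  if __has_attribute(ifunc)
#    define HAVE_IFUNC 1
#  endif
#endif

#ifdef HAVE_IFUNC
/* The dynamic loader runs pick_slide() once and binds the symbol to the
 * function it returns; later calls jump straight to that function. */
void slide_hash_table(uint16_t *table, size_t n, uint16_t wsize)
    __attribute__((ifunc("pick_slide")));
#else
/* Fallback without ifunc: resolve on the first call via a function pointer. */
static void slide_first_call(uint16_t *table, size_t n, uint16_t wsize);
static slide_fn *slide_impl = slide_first_call;

static void slide_first_call(uint16_t *table, size_t n, uint16_t wsize)
{
    slide_impl = pick_slide();
    slide_impl(table, n, wsize);
}

void slide_hash_table(uint16_t *table, size_t n, uint16_t wsize)
{
    slide_impl(table, n, wsize);
}
#endif
```

Callers simply invoke `slide_hash_table(table, n, wsize)`. The selection logic stays in one resolver, which is what lets a single binary serve several processor generations.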

I tried to make as few changes as possible to top-level files (deflate.c), and instead place most Power-specific code under contrib/power.

To measure the performance improvement, we used Chromium's zlib_bench.cc with input files from jsnell/zlib-bench.

The results below show compression throughput in MB/s using raw deflate, for all compression levels (1 to 9):

  • jpeg

    comp lvl   default (MB/s)   optimized (MB/s)   gain
    1          20.4             27.4               +34.31%
    2          20.2             26.4               +30.69%
    3          20.2             27.1               +34.16%
    4          20.3             27.3               +34.48%
    5          20.3             27.3               +34.48%
    6          20.3             27.3               +34.48%
    7          20.3             27.3               +34.48%
    8          20.3             27.3               +34.48%
    9          20.3             27.3               +34.48%
  • pngpixels

    comp lvl   default (MB/s)   optimized (MB/s)   gain
    1          67.0             98.6               +47.16%
    2          58.7             79.8               +35.95%
    3          38.8             46.7               +20.36%
    4          42.1             48.8               +15.91%
    5          26.6             29.2               +9.77%
    6          13.8             14.5               +5.07%
    7          8.9              9.2                +3.37%
    8          2.8              2.8                +0.00%
    9          1.3              1.3                +0.00%
  • executable

    comp lvl   default (MB/s)   optimized (MB/s)   gain
    1          41.3             57.6               +39.47%
    2          37.9             50.9               +34.30%
    3          29.0             36.1               +24.48%
    4          28.4             34.8               +22.54%
    5          20.2             23.2               +14.85%
    6          12.5             13.7               +9.60%
    7          9.5              10.1               +6.32%
    8          5.4              5.6                +3.70%
    9          4.1              4.2                +2.44%
  • html

    comp lvl   default (MB/s)   optimized (MB/s)   gain
    1          43.1             59.3               +37.59%
    2          38.6             50.7               +31.35%
    3          27.8             33.8               +21.58%
    4          28.3             33.1               +16.96%
    5          18.1             20.1               +11.05%
    6          12.2             13.0               +6.56%
    7          10.6             11.2               +5.66%
    8          8.0              8.4                +5.00%
    9          7.9              8.3                +5.06%

mscastanho, Dec 10 '19 14:12

Force-pushed to add changes to feature detection in configure.

mscastanho, Mar 10 '20 20:03