python-snappy
Performance improvements
- stream compressors: use Intel SSE4.2 CRC32C instruction when crc32c is available
- StreamDecompressor: faster decompression via a lot less memcpy
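For readers unfamiliar with the first item, here is a minimal sketch of the idea, not the code from this PR. It assumes the optional third-party `crc32c` package exposes `crc32c.crc32c(data)` and falls back to a slow pure-Python table lookup when it is not installed. The helper names (`crc32c_pure`, `checksum`, `masked_crc32c`) are made up for illustration; the masking step is the one described in the snappy framing format.

```python
# Prefer the optional `crc32c` package (hardware-accelerated where the CPU
# supports it); otherwise fall back to a slow pure-Python implementation.
try:
    from crc32c import crc32c as _fast_crc32c  # optional dependency (assumption)
except ImportError:
    _fast_crc32c = None

_POLY = 0x82F63B78  # reflected Castagnoli polynomial used by CRC-32C


def _make_table():
    """Build the 256-entry lookup table for byte-at-a-time CRC-32C."""
    table = []
    for i in range(256):
        c = i
        for _ in range(8):
            c = (c >> 1) ^ _POLY if c & 1 else c >> 1
        table.append(c)
    return table


_TABLE = _make_table()


def crc32c_pure(data: bytes) -> int:
    """Pure-Python CRC-32C; correct but far slower than the SSE4.2 path."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def checksum(data: bytes) -> int:
    """Use the accelerated implementation when it is importable."""
    if _fast_crc32c is not None:
        return _fast_crc32c(data)
    return crc32c_pure(data)


def masked_crc32c(data: bytes) -> int:
    """Apply the rotate-and-add mask the snappy framing format specifies."""
    crc = checksum(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF
```

Both paths compute the same value, so a stream written with one can be verified with the other; only the time spent per chunk differs.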
I am not able to review the changes here. Is anyone else around? @paulwouters, it may be enough to see that all tests pass, but I would appreciate it if you could post some benchmarks.
Do you know if snappy in cramjam can benefit from similar changes? (cc @milesgranger )
For us, the difference is that without these two patches, decompressing ~50 MB takes 13 s and ~100 MB takes 60 s (this is somewhat data-dependent).
With the patches applied on top of 0.6.0, decompressing that same 100 MB takes 1 s.
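Time that more than doubles when the input doubles is what one would expect if the decompressor re-copies its remaining buffer after every chunk. That is an assumption about the cause, suggested by the "a lot less memcpy" wording, not something measured here. The sketch below contrasts the two patterns on a stream with the framing format's chunk header (1 type byte, 3-byte little-endian length). The function names are hypothetical and this is not the PR's code.

```python
def make_stream(n_chunks: int, payload_size: int) -> bytes:
    """Build a toy stream of chunks: type byte + 3-byte LE length + payload."""
    payload = b"x" * payload_size
    header = bytes([0x01]) + len(payload).to_bytes(3, "little")
    return (header + payload) * n_chunks


def iter_chunks_copying(stream: bytes):
    """Re-slices the unread tail after each chunk, so the work is quadratic."""
    buf = stream
    while len(buf) >= 4:
        chunk_type = buf[0]
        size = int.from_bytes(buf[1:4], "little")
        if len(buf) < 4 + size:
            break  # incomplete chunk; a real decoder would wait for more data
        yield chunk_type, buf[4:4 + size]
        buf = buf[4 + size:]  # copies everything that is still unread


def iter_chunks_view(stream: bytes):
    """Walks the same stream with an offset and a memoryview: no tail copies."""
    view = memoryview(stream)
    pos = 0
    while len(view) - pos >= 4:
        chunk_type = view[pos]
        size = int.from_bytes(view[pos + 1:pos + 4], "little")
        if len(view) - pos < 4 + size:
            break
        yield chunk_type, view[pos + 4:pos + 4 + size]  # zero-copy slice
        pos += 4 + size
```

Both generators yield the same chunks. The second only advances an offset, so its cost grows linearly with the input size.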
Wow! Such a difference doesn't entirely seem plausible :| Is this perhaps data where no real compression happened, i.e., what was the compression ratio?
Do you know if snappy in cramjam can benefit from similar changes? (cc @milesgranger )
I don't know; I suppose the improvements would have to be in the upstream snappy crate.
However, I installed this branch from source and re-ran the benchmarks on my end w/ crc32c installed, and I get very similar results to the existing benchmarks.
It could be that I didn't try hard enough, or that the optimizations aren't triggered on my machine, although my system appears to meet the requirements for them. :man_shrugging:
Snappy raw:
-------------------------------------------------------------------------------------------------------------- benchmark: 28 tests ---------------------------------------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_snappy_raw[Mark.Twain-Tom.Sawyer.txt-cramjam] 55.0391 (3.66) 133.2110 (2.34) 60.0523 (3.57) 8.4785 (2.79) 57.3170 (3.58) 3.0180 (4.68) 556;821 16,652.1528 (0.28) 5700 1
test_snappy_raw[Mark.Twain-Tom.Sawyer.txt-snappy] 52.7520 (3.51) 130.7940 (2.30) 57.7148 (3.43) 8.3499 (2.75) 55.0130 (3.43) 3.1963 (4.96) 930;1470 17,326.5634 (0.29) 9549 1
test_snappy_raw[alice29.txt-cramjam] 592.3768 (39.43) 966.6169 (16.99) 641.3070 (38.17) 49.4937 (16.28) 629.7166 (39.31) 37.7613 (58.61) 136;98 1,559.3156 (0.03) 1240 1
test_snappy_raw[alice29.txt-snappy] 601.7310 (40.05) 1,152.2900 (20.25) 652.7039 (38.85) 45.8487 (15.08) 641.4631 (40.04) 41.3470 (64.18) 229;100 1,532.0883 (0.03) 1558 1
test_snappy_raw[asyoulik.txt-cramjam] 525.0110 (34.94) 935.9650 (16.45) 562.4942 (33.48) 45.9160 (15.10) 549.8989 (34.32) 35.7465 (55.49) 169;116 1,777.7962 (0.03) 1752 1
test_snappy_raw[asyoulik.txt-snappy] 533.2129 (35.49) 801.3120 (14.09) 569.6283 (33.90) 36.6472 (12.05) 562.5240 (35.11) 26.6250 (41.33) 183;139 1,755.5307 (0.03) 1601 1
test_snappy_raw[fifty-four-mb-random-cramjam] 39,131.7350 (>1000.0) 45,471.8969 (799.28) 40,510.9683 (>1000.0) 1,457.7244 (479.48) 40,028.9460 (>1000.0) 2,177.5730 (>1000.0) 3;1 24.6847 (0.00) 26 1
test_snappy_raw[fifty-four-mb-random-snappy] 54,635.2209 (>1000.0) 72,615.1259 (>1000.0) 58,051.6920 (>1000.0) 4,926.5579 (>1000.0) 56,003.4390 (>1000.0) 3,226.2232 (>1000.0) 2;2 17.2260 (0.00) 19 1
test_snappy_raw[fifty-four-mb-repeating-cramjam] 17,660.2071 (>1000.0) 19,295.4130 (339.16) 18,270.7281 (>1000.0) 508.6797 (167.32) 18,170.9731 (>1000.0) 776.3330 (>1000.0) 10;0 54.7324 (0.00) 29 1
test_snappy_raw[fifty-four-mb-repeating-snappy] 31,336.1469 (>1000.0) 36,540.1430 (642.28) 33,227.9230 (>1000.0) 1,257.2886 (413.55) 33,241.5774 (>1000.0) 1,470.1985 (>1000.0) 7;2 30.0952 (0.00) 28 1
test_snappy_raw[fireworks.jpeg-cramjam] 30.2079 (2.01) 98.0159 (1.72) 33.8984 (2.02) 6.3499 (2.09) 31.3870 (1.96) 1.8030 (2.80) 1021;1401 29,499.8806 (0.50) 6830 1
test_snappy_raw[fireworks.jpeg-snappy] 15.0250 (1.0) 56.8910 (1.0) 16.8028 (1.0) 3.0402 (1.0) 16.0208 (1.0) 0.6442 (1.0) 2652;2892 59,514.0088 (1.0) 31101 1
test_snappy_raw[geo.protodata-cramjam] 154.2701 (10.27) 384.5550 (6.76) 170.6810 (10.16) 22.4152 (7.37) 160.4795 (10.02) 14.0951 (21.88) 404;385 5,858.8837 (0.10) 3338 1
test_snappy_raw[geo.protodata-snappy] 143.4439 (9.55) 338.1791 (5.94) 163.7641 (9.75) 22.5338 (7.41) 153.9169 (9.61) 18.9022 (29.34) 880;704 6,106.3450 (0.10) 5735 1
test_snappy_raw[html-cramjam] 163.8639 (10.91) 7,342.4571 (129.06) 195.1516 (11.61) 141.1143 (46.42) 175.2310 (10.94) 30.8362 (47.86) 23;259 5,124.2202 (0.09) 3916 1
test_snappy_raw[html-snappy] 156.4531 (10.41) 286.1598 (5.03) 172.0366 (10.24) 18.8565 (6.20) 163.5781 (10.21) 13.6626 (21.21) 701;545 5,812.7166 (0.10) 5380 1
test_snappy_raw[html_x_4-cramjam] 650.8362 (43.32) 1,473.4219 (25.90) 702.9023 (41.83) 90.3964 (29.73) 681.7840 (42.56) 51.0682 (79.27) 46;51 1,422.6727 (0.02) 729 1
test_snappy_raw[html_x_4-snappy] 632.6071 (42.10) 973.6570 (17.11) 673.5266 (40.08) 42.6296 (14.02) 661.2611 (41.28) 39.7667 (61.73) 128;64 1,484.7224 (0.02) 1089 1
test_snappy_raw[kppkn.gtb-cramjam] 494.2932 (32.90) 815.5340 (14.34) 529.8498 (31.53) 44.5572 (14.66) 516.8610 (32.26) 28.5008 (44.24) 123;107 1,887.3273 (0.03) 1285 1
test_snappy_raw[kppkn.gtb-snappy] 503.6981 (33.52) 1,076.1879 (18.92) 537.1296 (31.97) 39.4368 (12.97) 529.4709 (33.05) 38.3689 (59.56) 177;88 1,861.7480 (0.03) 1791 1
test_snappy_raw[lcet10.txt-cramjam] 1,558.4170 (103.72) 2,066.4569 (36.32) 1,658.8938 (98.73) 80.1077 (26.35) 1,641.2655 (102.45) 75.2069 (116.74) 104;33 602.8113 (0.01) 518 1
test_snappy_raw[lcet10.txt-snappy] 1,590.8261 (105.88) 2,150.9789 (37.81) 1,690.4985 (100.61) 79.6666 (26.20) 1,674.0509 (104.49) 67.9813 (105.52) 107;32 591.5415 (0.01) 511 1
test_snappy_raw[paper-100k.pdf-cramjam] 30.3620 (2.02) 78.7128 (1.38) 33.3714 (1.99) 5.1835 (1.70) 31.5530 (1.97) 1.8696 (2.90) 1041;1123 29,965.8108 (0.50) 7987 1
test_snappy_raw[paper-100k.pdf-snappy] 20.7000 (1.38) 131.4280 (2.31) 23.0795 (1.37) 4.1453 (1.36) 21.8560 (1.36) 0.6862 (1.07) 2480;2750 43,328.4893 (0.73) 24137 1
test_snappy_raw[plrabn12.txt-cramjam] 2,139.7951 (142.42) 4,277.7660 (75.19) 2,293.2170 (136.48) 191.9226 (63.13) 2,253.5495 (140.66) 105.8890 (164.36) 30;33 436.0686 (0.01) 346 1
test_snappy_raw[plrabn12.txt-snappy] 2,186.6821 (145.54) 2,964.7450 (52.11) 2,309.1957 (137.43) 103.5236 (34.05) 2,300.2855 (143.58) 122.5951 (190.29) 83;11 433.0512 (0.01) 364 1
test_snappy_raw[urls.10K-cramjam] 1,823.0381 (121.33) 2,950.2229 (51.86) 1,968.6821 (117.16) 159.5285 (52.47) 1,930.7047 (120.51) 106.4558 (165.24) 35;35 507.9540 (0.01) 454 1
test_snappy_raw[urls.10K-snappy] 1,815.5721 (120.84) 2,426.3519 (42.65) 1,930.2900 (114.88) 94.4775 (31.08) 1,910.3684 (119.24) 94.8999 (147.30) 90;24 518.0569 (0.01) 402 1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Snappy framed:
------------------------------------------------------------------------------------------------------------------ benchmark: 28 tests -------------------------------------------------------------------------------------------------------------------
Name (time in us) Min Max Mean StdDev Median IQR Outliers OPS Rounds Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_snappy_framed[Mark.Twain-Tom.Sawyer.txt-cramjam] 71.6608 (1.0) 143.2849 (1.0) 77.3467 (1.0) 8.5645 (1.0) 74.5370 (1.0) 4.2152 (1.0) 443;673 12,928.7923 (1.0) 4581 1
test_snappy_framed[Mark.Twain-Tom.Sawyer.txt-snappy] 127.6799 (1.78) 242.7050 (1.69) 139.5094 (1.80) 14.9579 (1.75) 132.6599 (1.78) 8.6003 (2.04) 548;574 7,167.9778 (0.55) 4239 1
test_snappy_framed[alice29.txt-cramjam] 678.1621 (9.46) 1,185.1690 (8.27) 769.2867 (9.95) 97.6036 (11.40) 730.4728 (9.80) 66.7355 (15.83) 147;142 1,299.9056 (0.10) 1005 1
test_snappy_framed[alice29.txt-snappy] 1,362.9030 (19.02) 1,981.1080 (13.83) 1,477.0172 (19.10) 101.6660 (11.87) 1,446.0909 (19.40) 80.4238 (19.08) 92;51 677.0402 (0.05) 519 1
test_snappy_framed[asyoulik.txt-cramjam] 593.6739 (8.28) 1,012.5430 (7.07) 652.7332 (8.44) 64.0722 (7.48) 634.3831 (8.51) 49.8961 (11.84) 157;119 1,532.0195 (0.12) 1482 1
test_snappy_framed[asyoulik.txt-snappy] 1,155.4169 (16.12) 1,769.6789 (12.35) 1,270.6764 (16.43) 96.3240 (11.25) 1,239.2481 (16.63) 85.9072 (20.38) 99;48 786.9824 (0.06) 633 1
test_snappy_framed[fifty-four-mb-random-cramjam] 97,205.5991 (>1000.0) 101,710.5600 (709.85) 99,088.4910 (>1000.0) 1,483.4868 (173.21) 98,838.2000 (>1000.0) 2,404.6390 (570.47) 4;0 10.0920 (0.00) 10 1
test_snappy_framed[fifty-four-mb-random-snappy] 408,440.2199 (>1000.0) 424,345.1031 (>1000.0) 415,347.1177 (>1000.0) 6,626.5087 (773.71) 413,118.3329 (>1000.0) 10,849.8005 (>1000.0) 2;0 2.4076 (0.00) 5 1
test_snappy_framed[fifty-four-mb-repeating-cramjam] 65,814.4800 (918.42) 76,099.0751 (531.10) 69,209.2635 (894.79) 2,883.4202 (336.67) 68,925.1389 (924.71) 3,195.6249 (758.12) 4;1 14.4489 (0.00) 12 1
test_snappy_framed[fifty-four-mb-repeating-snappy] 336,508.3761 (>1000.0) 345,176.6630 (>1000.0) 340,673.9440 (>1000.0) 3,197.8086 (373.38) 340,007.7540 (>1000.0) 3,935.3999 (933.62) 2;0 2.9354 (0.00) 5 1
test_snappy_framed[fireworks.jpeg-cramjam] 97.2459 (1.36) 221.1900 (1.54) 106.3462 (1.37) 12.6363 (1.48) 101.7749 (1.37) 5.5988 (1.33) 833;1063 9,403.2489 (0.73) 6955 1
test_snappy_framed[fireworks.jpeg-snappy] 623.9191 (8.71) 934.7361 (6.52) 669.8641 (8.66) 48.3012 (5.64) 657.3130 (8.82) 42.4999 (10.08) 190;123 1,492.8401 (0.12) 1367 1
test_snappy_framed[geo.protodata-cramjam] 207.5629 (2.90) 404.3670 (2.82) 234.1518 (3.03) 32.5965 (3.81) 221.0444 (2.97) 24.9587 (5.92) 493;427 4,270.7341 (0.33) 3370 1
test_snappy_framed[geo.protodata-snappy] 725.7231 (10.13) 1,467.0279 (10.24) 803.0772 (10.38) 70.5311 (8.24) 778.2201 (10.44) 80.2568 (19.04) 247;39 1,245.2103 (0.10) 1069 1
test_snappy_framed[html-cramjam] 213.3292 (2.98) 442.1149 (3.09) 238.8942 (3.09) 30.8552 (3.60) 227.3140 (3.05) 24.7393 (5.87) 405;313 4,185.9537 (0.32) 3091 1
test_snappy_framed[html-snappy] 662.9000 (9.25) 1,024.9370 (7.15) 723.7951 (9.36) 53.6350 (6.26) 709.4492 (9.52) 48.3200 (11.46) 194;74 1,381.6064 (0.11) 1145 1
test_snappy_framed[html_x_4-cramjam] 818.3981 (11.42) 1,299.3729 (9.07) 877.5121 (11.35) 59.0900 (6.90) 864.0790 (11.59) 49.1450 (11.66) 100;62 1,139.5855 (0.09) 890 1
test_snappy_framed[html_x_4-snappy] 2,645.5622 (36.92) 3,548.0030 (24.76) 2,809.9779 (36.33) 122.5578 (14.31) 2,786.5430 (37.38) 132.4743 (31.43) 56;11 355.8747 (0.03) 260 1
test_snappy_framed[kppkn.gtb-cramjam] 585.1740 (8.17) 957.6699 (6.68) 639.6644 (8.27) 55.9915 (6.54) 624.6235 (8.38) 46.6530 (11.07) 156;101 1,563.3199 (0.12) 1394 1
test_snappy_framed[kppkn.gtb-snappy] 1,410.2019 (19.68) 2,017.8140 (14.08) 1,514.5751 (19.58) 82.1202 (9.59) 1,496.3031 (20.07) 80.2497 (19.04) 112;27 660.2512 (0.05) 575 1
test_snappy_framed[lcet10.txt-cramjam] 1,767.5080 (24.66) 2,496.6882 (17.42) 1,907.8104 (24.67) 122.0831 (14.25) 1,876.1901 (25.17) 112.6686 (26.73) 74;26 524.1611 (0.04) 373 1
test_snappy_framed[lcet10.txt-snappy] 3,762.0859 (52.50) 5,547.6481 (38.72) 4,205.9388 (54.38) 361.9042 (42.26) 4,077.6699 (54.71) 389.2800 (92.35) 34;11 237.7590 (0.02) 226 1
test_snappy_framed[paper-100k.pdf-cramjam] 90.9080 (1.27) 229.5701 (1.60) 101.0037 (1.31) 14.7422 (1.72) 94.9772 (1.27) 5.6181 (1.33) 882;1156 9,900.6244 (0.77) 7287 1
test_snappy_framed[paper-100k.pdf-snappy] 530.4210 (7.40) 832.6750 (5.81) 577.7526 (7.47) 42.9770 (5.02) 566.1936 (7.60) 43.2790 (10.27) 349;92 1,730.8447 (0.13) 1550 1
test_snappy_framed[plrabn12.txt-cramjam] 2,416.2000 (33.72) 3,320.1890 (23.17) 2,600.3834 (33.62) 115.0221 (13.43) 2,574.2790 (34.54) 141.9618 (33.68) 73;6 384.5587 (0.03) 329 1
test_snappy_framed[plrabn12.txt-snappy] 5,021.6871 (70.08) 6,882.1320 (48.03) 5,425.4279 (70.14) 269.9260 (31.52) 5,372.5401 (72.08) 274.9174 (65.22) 29;6 184.3173 (0.01) 175 1
test_snappy_framed[urls.10K-cramjam] 2,163.2011 (30.19) 3,356.4130 (23.42) 2,426.1733 (31.37) 195.9144 (22.88) 2,369.0879 (31.78) 234.0465 (55.52) 82;9 412.1717 (0.03) 339 1
test_snappy_framed[urls.10K-snappy] 5,931.8449 (82.78) 7,607.4291 (53.09) 6,361.8082 (82.25) 290.8592 (33.96) 6,288.9315 (84.37) 339.7919 (80.61) 38;5 157.1880 (0.01) 142 1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Actually, it (python-snappy) does quite a bit better in the framed format with the test_snappy_framed[fifty-four-mb-random-snappy] case than previously (and maybe in some others; in some cases it is hard to tell what is just microbenchmark noise).
These are the benchmarks we are seeing with and without the patch using snappy's own benchmarking tools:
Without patch:
--------------------------------------------------------------------------------------- benchmark: 4 tests ------------------------------------------------------------------------------------
Name (time in ms)                                       Min                Max               Mean         StdDev             Median            IQR  Outliers            OPS  Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_snappy_framed[100MB_random-cramjam]     139.5540 (1.0)     149.4682 (1.0)     144.3223 (1.0)  3.8449 (1.45)     145.1978 (1.0)  6.5880 (1.84)       2;0   6.9289 (1.0)       7           1
test_snappy_framed[100MB_random-snappy]   1,037.5058 (7.43)  1,051.2743 (7.03)  1,045.1163 (7.24)  5.1236 (1.93)  1,045.3530 (7.20)  6.5106 (1.82)       2;0  0.9568 (0.14)       5           1
test_snappy_framed[100MB_zeroes-cramjam]    141.6173 (1.01)    149.6211 (1.00)    145.6500 (1.01)   2.6524 (1.0)    145.7977 (1.00)   3.5852 (1.0)       3;0  6.8658 (0.99)       8           1
test_snappy_framed[100MB_zeroes-snappy]   1,046.7271 (7.50)  1,057.7755 (7.08)  1,051.4539 (7.29)  4.4143 (1.66)  1,051.8373 (7.24)  6.6645 (1.86)       2;0  0.9511 (0.14)       5           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
With patch:
------------------------------------------------------------------------------------ benchmark: 4 tests ---------------------------------------------------------------------------------
Name (time in ms)                                     Min              Max             Mean          StdDev           Median             IQR  Outliers            OPS  Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_snappy_framed[100MB_random-cramjam]   132.7121 (1.0)   145.2343 (1.0)   141.2591 (1.0)   4.3443 (1.04)  143.4440 (1.00)    4.4397 (1.0)       1;1   7.0792 (1.0)       7           1
test_snappy_framed[100MB_random-snappy]   149.3139 (1.13)  177.8449 (1.22)  164.5416 (1.16)  10.4677 (2.52)  166.1130 (1.16)  16.8984 (3.81)       2;0  6.0775 (0.86)       7           1
test_snappy_framed[100MB_zeroes-cramjam]  138.2131 (1.04)  150.1087 (1.03)  143.1957 (1.01)    4.1573 (1.0)   143.0023 (1.0)   5.9779 (1.35)       2;0  6.9834 (0.99)       7           1
test_snappy_framed[100MB_zeroes-snappy]   162.4972 (1.22)  216.4342 (1.49)  173.9976 (1.23)  18.9340 (4.55)  168.5015 (1.18)   5.4640 (1.23)       1;1  5.7472 (0.81)       7           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
That is a significant difference.
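To put a number on it, here is a small calculation using only the Mean column of the python-snappy rows in the two tables above. The dictionary layout and the `speedup` helper are just for illustration.

```python
# Mean times in milliseconds, copied from the python-snappy ("-snappy") rows
# of the "without patch" and "with patch" tables.
mean_ms = {
    "100MB_random": {"without_patch": 1045.1163, "with_patch": 164.5416},
    "100MB_zeroes": {"without_patch": 1051.4539, "with_patch": 173.9976},
}


def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster the patched run is than the unpatched run."""
    return before_ms / after_ms


for name, times in mean_ms.items():
    factor = speedup(times["without_patch"], times["with_patch"])
    print(f"{name}: {factor:.2f}x faster")
```

This works out to roughly 6.35x for the random input and 6.04x for the zeroes input, which leaves patched python-snappy within about 25% of cramjam on these two cases.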
We would really prefer not to have to maintain a separate patch indefinitely. But the performance increase is essential for our use.
Have you had a chance to examine the CRC error showing up in the tests?
CRC errors? I am not sure what this is referring to.
See lines following https://github.com/andrix/python-snappy/runs/4255192362?check_suite_focus=true#step:7:312 (and also for one more of the test runs)
Those CRC errors were due to the Python 2 tests being run. A separate PR was filed to make it work with Python 2: https://github.com/andrix/python-snappy/pull/111 although there I got feedback that Python 2 is not supported. So I'm stuck in a catch-22 here.
Since this has sat a long time, and #111 failed to solve the py2 issue, let's drop py2 right now, right here, so that we can get this improvement in.
Did you close this by mistake? This PR was fine, but it caused failures in testing related to your Python 2 test cases, which were still run despite Python 2 not being officially supported. I see you closed the fix for that issue, which is fine. But I don't understand why this one was closed.
In the latest version of this package (now in dev release on PyPI), we no longer use the C library of snappy at all, but defer to cramjam's version. As @milesgranger commented above, it is plausible that something similar could help there, but I think you would certainly need to rework some things. Happy to be proved wrong if these changes are still useful.
I think these are the relevant changes in the Rust crate cramjam uses for snappy: https://github.com/BurntSushi/rust-snappy/commit/204215ca011cd9d9ed613a53a476ef0eb6baa4ea