python-snappy

Performance improvements

Open paulwouters opened this issue 4 years ago • 11 comments

  • Stream compressors: use the Intel SSE4.2 CRC32C instruction when the crc32c package is available
  • StreamDecompressor: faster decompression through far fewer memcpy calls

paulwouters avatar Nov 18 '21 18:11 paulwouters

I am not able to review the changes here. Is anyone else around? @paulwouters, it may be enough to see that all tests pass, but I would appreciate it if you posted some benchmarks.

Do you know if snappy in cramjam can benefit from similar changes? (cc @milesgranger)

martindurant avatar Nov 18 '21 18:11 martindurant

For us, the difference is that without these two patches, decompressing ~50 MB takes 13 s and ~100 MB takes 60 s (this is somewhat data-dependent).

With the patches applied on top of 0.6.0, decompressing that same 100 MB takes 1 s.
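Timings that grow faster than linearly with input size are consistent with a buffer being re-copied for every chunk. The following is a hypothetical illustration of that general pattern, not the code from the patch: `consume_by_slicing` copies the unread tail on each step, while `consume_by_view` walks a `memoryview` and copies each chunk only once. Both function names are made up for this sketch.

```python
def consume_by_slicing(data, chunk_size):
    """Split data into chunks, re-copying the unread tail every step (quadratic)."""
    chunks = []
    buf = bytes(data)
    while buf:
        chunks.append(buf[:chunk_size])
        buf = buf[chunk_size:]  # copies everything that is still unread
    return chunks


def consume_by_view(data, chunk_size):
    """Split data into chunks through a memoryview; slicing the view copies nothing."""
    view = memoryview(data)
    chunks = []
    for start in range(0, len(view), chunk_size):
        # Only the chunk itself is copied, when it is materialised as bytes.
        chunks.append(bytes(view[start:start + chunk_size]))
    return chunks
```

Both functions return the same chunks; the difference is the total number of bytes copied along the way.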

paulwouters avatar Nov 18 '21 18:11 paulwouters

Wow! Such a difference doesn't entirely seem plausible :| Is this perhaps data where no real compression happened, i.e., what was the compression ratio?
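The compression ratio asked about here is the uncompressed size divided by the compressed size. A minimal sketch of measuring it follows; `compression_ratio` is a hypothetical helper, and `zlib` from the standard library stands in for snappy only so that the example runs without extra packages.

```python
import os
import zlib


def compression_ratio(data, compress):
    """Return len(original) / len(compressed) for the given compress callable."""
    compressed = compress(data)
    return len(data) / len(compressed)


# Random bytes are essentially incompressible; a run of zeroes compresses very well.
random_ratio = compression_ratio(os.urandom(1 << 16), zlib.compress)
zeroes_ratio = compression_ratio(bytes(1 << 16), zlib.compress)
print(f"random: {random_ratio:.2f}, zeroes: {zeroes_ratio:.2f}")
```

A ratio close to 1 would indicate that little real compression happened.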

martindurant avatar Nov 18 '21 18:11 martindurant

Do you know if snappy in cramjam can benefit from similar changes? (cc @milesgranger)

I don't know; I suppose the improvements would have to be made in the upstream snappy crate.

However, I installed this branch from source and re-ran the benchmarks on my end with crc32c installed, and I get results very similar to the existing benchmarks.

It could be that I didn't try hard enough, or that the optimizations aren't triggered on my machine, although my system appears to meet the requirements for them. :man_shrugging:

Snappy raw:

-------------------------------------------------------------------------------------------------------------- benchmark: 28 tests ---------------------------------------------------------------------------------------------------------------
Name (time in us)                                              Min                    Max                   Mean                StdDev                 Median                   IQR            Outliers          OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_snappy_raw[Mark.Twain-Tom.Sawyer.txt-cramjam]         55.0391 (3.66)        133.2110 (2.34)         60.0523 (3.57)         8.4785 (2.79)         57.3170 (3.58)         3.0180 (4.68)      556;821  16,652.1528 (0.28)       5700           1
test_snappy_raw[Mark.Twain-Tom.Sawyer.txt-snappy]          52.7520 (3.51)        130.7940 (2.30)         57.7148 (3.43)         8.3499 (2.75)         55.0130 (3.43)         3.1963 (4.96)     930;1470  17,326.5634 (0.29)       9549           1
test_snappy_raw[alice29.txt-cramjam]                      592.3768 (39.43)       966.6169 (16.99)       641.3070 (38.17)       49.4937 (16.28)       629.7166 (39.31)       37.7613 (58.61)      136;98   1,559.3156 (0.03)       1240           1
test_snappy_raw[alice29.txt-snappy]                       601.7310 (40.05)     1,152.2900 (20.25)       652.7039 (38.85)       45.8487 (15.08)       641.4631 (40.04)       41.3470 (64.18)     229;100   1,532.0883 (0.03)       1558           1
test_snappy_raw[asyoulik.txt-cramjam]                     525.0110 (34.94)       935.9650 (16.45)       562.4942 (33.48)       45.9160 (15.10)       549.8989 (34.32)       35.7465 (55.49)     169;116   1,777.7962 (0.03)       1752           1
test_snappy_raw[asyoulik.txt-snappy]                      533.2129 (35.49)       801.3120 (14.09)       569.6283 (33.90)       36.6472 (12.05)       562.5240 (35.11)       26.6250 (41.33)     183;139   1,755.5307 (0.03)       1601           1
test_snappy_raw[fifty-four-mb-random-cramjam]          39,131.7350 (>1000.0)  45,471.8969 (799.28)   40,510.9683 (>1000.0)  1,457.7244 (479.48)   40,028.9460 (>1000.0)  2,177.5730 (>1000.0)       3;1      24.6847 (0.00)         26           1
test_snappy_raw[fifty-four-mb-random-snappy]           54,635.2209 (>1000.0)  72,615.1259 (>1000.0)  58,051.6920 (>1000.0)  4,926.5579 (>1000.0)  56,003.4390 (>1000.0)  3,226.2232 (>1000.0)       2;2      17.2260 (0.00)         19           1
test_snappy_raw[fifty-four-mb-repeating-cramjam]       17,660.2071 (>1000.0)  19,295.4130 (339.16)   18,270.7281 (>1000.0)    508.6797 (167.32)   18,170.9731 (>1000.0)    776.3330 (>1000.0)      10;0      54.7324 (0.00)         29           1
test_snappy_raw[fifty-four-mb-repeating-snappy]        31,336.1469 (>1000.0)  36,540.1430 (642.28)   33,227.9230 (>1000.0)  1,257.2886 (413.55)   33,241.5774 (>1000.0)  1,470.1985 (>1000.0)       7;2      30.0952 (0.00)         28           1
test_snappy_raw[fireworks.jpeg-cramjam]                    30.2079 (2.01)         98.0159 (1.72)         33.8984 (2.02)         6.3499 (2.09)         31.3870 (1.96)         1.8030 (2.80)    1021;1401  29,499.8806 (0.50)       6830           1
test_snappy_raw[fireworks.jpeg-snappy]                     15.0250 (1.0)          56.8910 (1.0)          16.8028 (1.0)          3.0402 (1.0)          16.0208 (1.0)          0.6442 (1.0)     2652;2892  59,514.0088 (1.0)       31101           1
test_snappy_raw[geo.protodata-cramjam]                    154.2701 (10.27)       384.5550 (6.76)        170.6810 (10.16)       22.4152 (7.37)        160.4795 (10.02)       14.0951 (21.88)     404;385   5,858.8837 (0.10)       3338           1
test_snappy_raw[geo.protodata-snappy]                     143.4439 (9.55)        338.1791 (5.94)        163.7641 (9.75)        22.5338 (7.41)        153.9169 (9.61)        18.9022 (29.34)     880;704   6,106.3450 (0.10)       5735           1
test_snappy_raw[html-cramjam]                             163.8639 (10.91)     7,342.4571 (129.06)      195.1516 (11.61)      141.1143 (46.42)       175.2310 (10.94)       30.8362 (47.86)      23;259   5,124.2202 (0.09)       3916           1
test_snappy_raw[html-snappy]                              156.4531 (10.41)       286.1598 (5.03)        172.0366 (10.24)       18.8565 (6.20)        163.5781 (10.21)       13.6626 (21.21)     701;545   5,812.7166 (0.10)       5380           1
test_snappy_raw[html_x_4-cramjam]                         650.8362 (43.32)     1,473.4219 (25.90)       702.9023 (41.83)       90.3964 (29.73)       681.7840 (42.56)       51.0682 (79.27)       46;51   1,422.6727 (0.02)        729           1
test_snappy_raw[html_x_4-snappy]                          632.6071 (42.10)       973.6570 (17.11)       673.5266 (40.08)       42.6296 (14.02)       661.2611 (41.28)       39.7667 (61.73)      128;64   1,484.7224 (0.02)       1089           1
test_snappy_raw[kppkn.gtb-cramjam]                        494.2932 (32.90)       815.5340 (14.34)       529.8498 (31.53)       44.5572 (14.66)       516.8610 (32.26)       28.5008 (44.24)     123;107   1,887.3273 (0.03)       1285           1
test_snappy_raw[kppkn.gtb-snappy]                         503.6981 (33.52)     1,076.1879 (18.92)       537.1296 (31.97)       39.4368 (12.97)       529.4709 (33.05)       38.3689 (59.56)      177;88   1,861.7480 (0.03)       1791           1
test_snappy_raw[lcet10.txt-cramjam]                     1,558.4170 (103.72)    2,066.4569 (36.32)     1,658.8938 (98.73)       80.1077 (26.35)     1,641.2655 (102.45)      75.2069 (116.74)     104;33     602.8113 (0.01)        518           1
test_snappy_raw[lcet10.txt-snappy]                      1,590.8261 (105.88)    2,150.9789 (37.81)     1,690.4985 (100.61)      79.6666 (26.20)     1,674.0509 (104.49)      67.9813 (105.52)     107;32     591.5415 (0.01)        511           1
test_snappy_raw[paper-100k.pdf-cramjam]                    30.3620 (2.02)         78.7128 (1.38)         33.3714 (1.99)         5.1835 (1.70)         31.5530 (1.97)         1.8696 (2.90)    1041;1123  29,965.8108 (0.50)       7987           1
test_snappy_raw[paper-100k.pdf-snappy]                     20.7000 (1.38)        131.4280 (2.31)         23.0795 (1.37)         4.1453 (1.36)         21.8560 (1.36)         0.6862 (1.07)    2480;2750  43,328.4893 (0.73)      24137           1
test_snappy_raw[plrabn12.txt-cramjam]                   2,139.7951 (142.42)    4,277.7660 (75.19)     2,293.2170 (136.48)     191.9226 (63.13)     2,253.5495 (140.66)     105.8890 (164.36)      30;33     436.0686 (0.01)        346           1
test_snappy_raw[plrabn12.txt-snappy]                    2,186.6821 (145.54)    2,964.7450 (52.11)     2,309.1957 (137.43)     103.5236 (34.05)     2,300.2855 (143.58)     122.5951 (190.29)      83;11     433.0512 (0.01)        364           1
test_snappy_raw[urls.10K-cramjam]                       1,823.0381 (121.33)    2,950.2229 (51.86)     1,968.6821 (117.16)     159.5285 (52.47)     1,930.7047 (120.51)     106.4558 (165.24)      35;35     507.9540 (0.01)        454           1
test_snappy_raw[urls.10K-snappy]                        1,815.5721 (120.84)    2,426.3519 (42.65)     1,930.2900 (114.88)      94.4775 (31.08)     1,910.3684 (119.24)      94.8999 (147.30)      90;24     518.0569 (0.01)        402           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Snappy framed:

------------------------------------------------------------------------------------------------------------------ benchmark: 28 tests -------------------------------------------------------------------------------------------------------------------
Name (time in us)                                                  Min                     Max                    Mean                StdDev                  Median                    IQR            Outliers          OPS            Rounds  Iterations
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_snappy_framed[Mark.Twain-Tom.Sawyer.txt-cramjam]          71.6608 (1.0)          143.2849 (1.0)           77.3467 (1.0)          8.5645 (1.0)           74.5370 (1.0)           4.2152 (1.0)       443;673  12,928.7923 (1.0)        4581           1
test_snappy_framed[Mark.Twain-Tom.Sawyer.txt-snappy]          127.6799 (1.78)         242.7050 (1.69)         139.5094 (1.80)        14.9579 (1.75)         132.6599 (1.78)          8.6003 (2.04)      548;574   7,167.9778 (0.55)       4239           1
test_snappy_framed[alice29.txt-cramjam]                       678.1621 (9.46)       1,185.1690 (8.27)         769.2867 (9.95)        97.6036 (11.40)        730.4728 (9.80)         66.7355 (15.83)     147;142   1,299.9056 (0.10)       1005           1
test_snappy_framed[alice29.txt-snappy]                      1,362.9030 (19.02)      1,981.1080 (13.83)      1,477.0172 (19.10)      101.6660 (11.87)      1,446.0909 (19.40)        80.4238 (19.08)       92;51     677.0402 (0.05)        519           1
test_snappy_framed[asyoulik.txt-cramjam]                      593.6739 (8.28)       1,012.5430 (7.07)         652.7332 (8.44)        64.0722 (7.48)         634.3831 (8.51)         49.8961 (11.84)     157;119   1,532.0195 (0.12)       1482           1
test_snappy_framed[asyoulik.txt-snappy]                     1,155.4169 (16.12)      1,769.6789 (12.35)      1,270.6764 (16.43)       96.3240 (11.25)      1,239.2481 (16.63)        85.9072 (20.38)       99;48     786.9824 (0.06)        633           1
test_snappy_framed[fifty-four-mb-random-cramjam]           97,205.5991 (>1000.0)  101,710.5600 (709.85)    99,088.4910 (>1000.0)  1,483.4868 (173.21)    98,838.2000 (>1000.0)   2,404.6390 (570.47)        4;0      10.0920 (0.00)         10           1
test_snappy_framed[fifty-four-mb-random-snappy]           408,440.2199 (>1000.0)  424,345.1031 (>1000.0)  415,347.1177 (>1000.0)  6,626.5087 (773.71)   413,118.3329 (>1000.0)  10,849.8005 (>1000.0)       2;0       2.4076 (0.00)          5           1
test_snappy_framed[fifty-four-mb-repeating-cramjam]        65,814.4800 (918.42)    76,099.0751 (531.10)    69,209.2635 (894.79)   2,883.4202 (336.67)    68,925.1389 (924.71)    3,195.6249 (758.12)        4;1      14.4489 (0.00)         12           1
test_snappy_framed[fifty-four-mb-repeating-snappy]        336,508.3761 (>1000.0)  345,176.6630 (>1000.0)  340,673.9440 (>1000.0)  3,197.8086 (373.38)   340,007.7540 (>1000.0)   3,935.3999 (933.62)        2;0       2.9354 (0.00)          5           1
test_snappy_framed[fireworks.jpeg-cramjam]                     97.2459 (1.36)         221.1900 (1.54)         106.3462 (1.37)        12.6363 (1.48)         101.7749 (1.37)          5.5988 (1.33)     833;1063   9,403.2489 (0.73)       6955           1
test_snappy_framed[fireworks.jpeg-snappy]                     623.9191 (8.71)         934.7361 (6.52)         669.8641 (8.66)        48.3012 (5.64)         657.3130 (8.82)         42.4999 (10.08)     190;123   1,492.8401 (0.12)       1367           1
test_snappy_framed[geo.protodata-cramjam]                     207.5629 (2.90)         404.3670 (2.82)         234.1518 (3.03)        32.5965 (3.81)         221.0444 (2.97)         24.9587 (5.92)      493;427   4,270.7341 (0.33)       3370           1
test_snappy_framed[geo.protodata-snappy]                      725.7231 (10.13)      1,467.0279 (10.24)        803.0772 (10.38)       70.5311 (8.24)         778.2201 (10.44)        80.2568 (19.04)      247;39   1,245.2103 (0.10)       1069           1
test_snappy_framed[html-cramjam]                              213.3292 (2.98)         442.1149 (3.09)         238.8942 (3.09)        30.8552 (3.60)         227.3140 (3.05)         24.7393 (5.87)      405;313   4,185.9537 (0.32)       3091           1
test_snappy_framed[html-snappy]                               662.9000 (9.25)       1,024.9370 (7.15)         723.7951 (9.36)        53.6350 (6.26)         709.4492 (9.52)         48.3200 (11.46)      194;74   1,381.6064 (0.11)       1145           1
test_snappy_framed[html_x_4-cramjam]                          818.3981 (11.42)      1,299.3729 (9.07)         877.5121 (11.35)       59.0900 (6.90)         864.0790 (11.59)        49.1450 (11.66)      100;62   1,139.5855 (0.09)        890           1
test_snappy_framed[html_x_4-snappy]                         2,645.5622 (36.92)      3,548.0030 (24.76)      2,809.9779 (36.33)      122.5578 (14.31)      2,786.5430 (37.38)       132.4743 (31.43)       56;11     355.8747 (0.03)        260           1
test_snappy_framed[kppkn.gtb-cramjam]                         585.1740 (8.17)         957.6699 (6.68)         639.6644 (8.27)        55.9915 (6.54)         624.6235 (8.38)         46.6530 (11.07)     156;101   1,563.3199 (0.12)       1394           1
test_snappy_framed[kppkn.gtb-snappy]                        1,410.2019 (19.68)      2,017.8140 (14.08)      1,514.5751 (19.58)       82.1202 (9.59)       1,496.3031 (20.07)        80.2497 (19.04)      112;27     660.2512 (0.05)        575           1
test_snappy_framed[lcet10.txt-cramjam]                      1,767.5080 (24.66)      2,496.6882 (17.42)      1,907.8104 (24.67)      122.0831 (14.25)      1,876.1901 (25.17)       112.6686 (26.73)       74;26     524.1611 (0.04)        373           1
test_snappy_framed[lcet10.txt-snappy]                       3,762.0859 (52.50)      5,547.6481 (38.72)      4,205.9388 (54.38)      361.9042 (42.26)      4,077.6699 (54.71)       389.2800 (92.35)       34;11     237.7590 (0.02)        226           1
test_snappy_framed[paper-100k.pdf-cramjam]                     90.9080 (1.27)         229.5701 (1.60)         101.0037 (1.31)        14.7422 (1.72)          94.9772 (1.27)          5.6181 (1.33)     882;1156   9,900.6244 (0.77)       7287           1
test_snappy_framed[paper-100k.pdf-snappy]                     530.4210 (7.40)         832.6750 (5.81)         577.7526 (7.47)        42.9770 (5.02)         566.1936 (7.60)         43.2790 (10.27)      349;92   1,730.8447 (0.13)       1550           1
test_snappy_framed[plrabn12.txt-cramjam]                    2,416.2000 (33.72)      3,320.1890 (23.17)      2,600.3834 (33.62)      115.0221 (13.43)      2,574.2790 (34.54)       141.9618 (33.68)        73;6     384.5587 (0.03)        329           1
test_snappy_framed[plrabn12.txt-snappy]                     5,021.6871 (70.08)      6,882.1320 (48.03)      5,425.4279 (70.14)      269.9260 (31.52)      5,372.5401 (72.08)       274.9174 (65.22)        29;6     184.3173 (0.01)        175           1
test_snappy_framed[urls.10K-cramjam]                        2,163.2011 (30.19)      3,356.4130 (23.42)      2,426.1733 (31.37)      195.9144 (22.88)      2,369.0879 (31.78)       234.0465 (55.52)        82;9     412.1717 (0.03)        339           1
test_snappy_framed[urls.10K-snappy]                         5,931.8449 (82.78)      7,607.4291 (53.09)      6,361.8082 (82.25)      290.8592 (33.96)      6,288.9315 (84.37)       339.7919 (80.61)        38;5     157.1880 (0.01)        142           1
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

milesgranger avatar Nov 18 '21 18:11 milesgranger

Actually, python-snappy does quite a bit better than before in the framed format for the test_snappy_framed[fifty-four-mb-random-snappy] case (and maybe some others; in some cases it is hard to tell real gains from microbenchmark noise).

milesgranger avatar Nov 18 '21 18:11 milesgranger

These are the benchmarks we are seeing with and without the patch using snappy's own benchmarking tools:

Without patch:

-------------------------------------------------------------------------------------------------- benchmark: 4 tests -------------------------------------------------------------------------------------------------
Name (time in ms)                                   Min                   Max                  Mean            StdDev                Median               IQR            Outliers     OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_snappy_framed[100MB_random-cramjam]       139.5540 (1.0)        149.4682 (1.0)        144.3223 (1.0)      3.8449 (1.45)       145.1978 (1.0)      6.5880 (1.84)          2;0  6.9289 (1.0)           7           1
test_snappy_framed[100MB_random-snappy]      1,037.5058 (7.43)     1,051.2743 (7.03)     1,045.1163 (7.24)     5.1236 (1.93)     1,045.3530 (7.20)     6.5106 (1.82)          2;0  0.9568 (0.14)          5           1
test_snappy_framed[100MB_zeroes-cramjam]       141.6173 (1.01)       149.6211 (1.00)       145.6500 (1.01)     2.6524 (1.0)        145.7977 (1.00)     3.5852 (1.0)           3;0  6.8658 (0.99)          8           1
test_snappy_framed[100MB_zeroes-snappy]      1,046.7271 (7.50)     1,057.7755 (7.08)     1,051.4539 (7.29)     4.4143 (1.66)     1,051.8373 (7.24)     6.6645 (1.86)          2;0  0.9511 (0.14)          5           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

With patch:


----------------------------------------------------------------------------------------------- benchmark: 4 tests ----------------------------------------------------------------------------------------------
Name (time in ms)                                 Min                 Max                Mean             StdDev              Median                IQR            Outliers     OPS            Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_snappy_framed[100MB_random-cramjam]     132.7121 (1.0)      145.2343 (1.0)      141.2591 (1.0)       4.3443 (1.04)     143.4440 (1.00)      4.4397 (1.0)           1;1  7.0792 (1.0)           7           1
test_snappy_framed[100MB_random-snappy]      149.3139 (1.13)     177.8449 (1.22)     164.5416 (1.16)     10.4677 (2.52)     166.1130 (1.16)     16.8984 (3.81)          2;0  6.0775 (0.86)          7           1
test_snappy_framed[100MB_zeroes-cramjam]     138.2131 (1.04)     150.1087 (1.03)     143.1957 (1.01)      4.1573 (1.0)      143.0023 (1.0)       5.9779 (1.35)          2;0  6.9834 (0.99)          7           1
test_snappy_framed[100MB_zeroes-snappy]      162.4972 (1.22)     216.4342 (1.49)     173.9976 (1.23)     18.9340 (4.55)     168.5015 (1.18)      5.4640 (1.23)          1;1  5.7472 (0.81)          7           1
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

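That is a significant difference: the mean time for python-snappy on the framed 100 MB cases drops from about 1,045 ms to about 165 ms (random) and from about 1,051 ms to about 174 ms (zeroes).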
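To put a number on it, the speedup can be computed directly from the Mean column of the two tables above. This is only arithmetic on the posted figures; the dictionary layout and the `speedup` helper are made up for this sketch.

```python
# Mean times in ms for the python-snappy rows, copied from the two tables above:
# (without patch, with patch)
means_ms = {
    "100MB_random": (1045.1163, 164.5416),
    "100MB_zeroes": (1051.4539, 173.9976),
}


def speedup(before_ms, after_ms):
    """How many times faster the patched run is than the unpatched one."""
    return before_ms / after_ms


speedups = {name: speedup(before, after) for name, (before, after) in means_ms.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor:.2f}x faster")
```

Both cases come out at roughly a sixfold improvement, which brings python-snappy close to the cramjam rows in the patched table.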

We would really prefer not to maintain a separate patch indefinitely, but the performance increase is essential for our use.

paulwouters avatar Nov 26 '21 20:11 paulwouters

Have you had a chance to examine the CRC error showing up in the tests?

martindurant avatar Nov 26 '21 20:11 martindurant

CRC errors? I am not sure what this is referring to.

paulwouters avatar Nov 29 '21 01:11 paulwouters

See lines following https://github.com/andrix/python-snappy/runs/4255192362?check_suite_focus=true#step:7:312 (and also for one more of the test runs)

martindurant avatar Nov 29 '21 14:11 martindurant

See lines following https://github.com/andrix/python-snappy/runs/4255192362?check_suite_focus=true#step:7:312 (and also for one more of the test runs)

Those CRC errors were due to python2 tests being run. A separate PR was filed to make it work with python2: https://github.com/andrix/python-snappy/pull/111. There, however, I got feedback that python2 is not supported, so I'm stuck in a catch-22 here.

paulwouters avatar Dec 06 '21 16:12 paulwouters

Since this has sat a long time, and #111 failed to solve the py2 issue, let's drop py2 right now, right here, so that we can get this improvement in.

martindurant avatar Feb 18 '22 17:02 martindurant

Did you close this by mistake? This PR was fine, but it caused test failures related to your python2 test cases, which were still being run despite python2 not being officially supported. I see you closed the fix for that issue, which is fine, but I don't understand why this one was closed.

paulwouters avatar Feb 23 '24 21:02 paulwouters

In the latest version of this package (now in dev release on PyPI), we no longer use the C library of snappy at all, but defer to cramjam's version. As @milesgranger commented above, it is plausible that something similar could help there, but you would certainly need to rework some things. That is my understanding, at least; I am happy to be proved wrong if these changes are still useful.

martindurant avatar Feb 23 '24 21:02 martindurant

I think these are the relevant changes in the Rust crate cramjam uses for snappy: https://github.com/BurntSushi/rust-snappy/commit/204215ca011cd9d9ed613a53a476ef0eb6baa4ea

milesgranger avatar Feb 24 '24 06:02 milesgranger