regexp2

use `sync.Pool` for runner pooling

Open Gusted opened this issue 11 months ago • 3 comments

Currently, as noted in the comment on putRunner, no attempt is made to limit the size of the runner pool. This can leave the pool holding many runners that were created during a burst of activity but will likely never be used again. Instead of trying to do garbage collection within this code, move the pooling to sync.Pool, which lets the garbage collector release idle objects and therefore keeps the pool as small as possible.
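To make the growth problem concrete, here is a minimal sketch of an unbounded, mutex-guarded free list of the kind the putRunner comment describes. The `runner`, `freeList`, `get`, `put` and `burst` names are hypothetical and the real regexp2 internals may differ; the point is only that every runner returned after a burst stays cached for as long as the pool lives.

```go
package main

import (
	"fmt"
	"sync"
)

// runner is a hypothetical stand-in for regexp2's per-match state.
type runner struct {
	stack []int // scratch buffer that makes each cached runner costly to keep
}

// freeList is an unbounded, mutex-guarded cache: put never drops anything.
type freeList struct {
	mu      sync.Mutex
	runners []*runner
}

// get pops a cached runner, or allocates a new one if the cache is empty.
func (f *freeList) get() *runner {
	f.mu.Lock()
	defer f.mu.Unlock()
	if n := len(f.runners); n > 0 {
		r := f.runners[n-1]
		f.runners = f.runners[:n-1]
		return r
	}
	return &runner{stack: make([]int, 0, 1024)}
}

// put appends the runner back with no size limit.
func (f *freeList) put(r *runner) {
	f.mu.Lock()
	f.runners = append(f.runners, r)
	f.mu.Unlock()
}

// burst simulates n matches in flight at once, which then all finish.
func burst(f *freeList, n int) {
	held := make([]*runner, n)
	for i := range held {
		held[i] = f.get()
	}
	for _, r := range held {
		f.put(r)
	}
}

func main() {
	var f freeList
	burst(&f, 500)
	// The burst is over, yet all 500 runners remain cached.
	fmt.Println(len(f.runners)) // 500
}
```

With sync.Pool in place of the slice, runners that stay unused are eligible to be dropped across garbage collections instead of being retained indefinitely.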

The pool has per-regexp scope, which means certain runner properties can be reused for optimal performance.
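A minimal sketch of a per-regexp `sync.Pool` follows, assuming a hypothetical `Regexp` type that owns the pool and a hypothetical `runner` with a back-pointer to its pattern; this is not regexp2's actual code. Because the pool belongs to one pattern, fields that depend only on that pattern are set once in `New` and never need resetting on reuse. The `match` method is a placeholder substring search standing in for the real engine.

```go
package main

import (
	"fmt"
	"sync"
)

// runner is a hypothetical stand-in for the per-match state.
type runner struct {
	re    *Regexp // pattern-dependent: set once, reused as-is
	stack []int   // per-match scratch: cleared before returning to the pool
}

// Regexp is a hypothetical compiled pattern that owns its runner pool.
type Regexp struct {
	pattern string
	runners sync.Pool
}

func Compile(pattern string) *Regexp {
	re := &Regexp{pattern: pattern}
	re.runners.New = func() interface{} {
		// Only called on a pool miss.
		return &runner{re: re, stack: make([]int, 0, 64)}
	}
	return re
}

func (re *Regexp) getRunner() *runner {
	return re.runners.Get().(*runner)
}

func (re *Regexp) putRunner(r *runner) {
	r.stack = r.stack[:0] // drop per-match state, keep the capacity
	re.runners.Put(r)
}

// match is a placeholder: a real engine would execute compiled code here.
func (re *Regexp) match(s string) bool {
	r := re.getRunner()
	defer re.putRunner(r)
	for i := 0; i+len(re.pattern) <= len(s); i++ {
		r.stack = append(r.stack, i)
		if s[i:i+len(re.pattern)] == re.pattern {
			return true
		}
	}
	return false
}

func main() {
	re := Compile("abc")
	fmt.Println(re.match("xxabcxx"), re.match("xyz")) // true false
}
```

The trade-off is that sync.Pool may discard cached runners during garbage collection, so a runner can be rebuilt more often than with an unbounded cache; that is the price of bounding memory after a burst.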

The motivation for this change is that I'm seeing a lot of memory (~300 MiB) held by these runners until the Go program is restarted, which feels like a suboptimal use of memory. With this change, after a burst of runners has been created in a short time, they are gradually deallocated and no longer hold memory indefinitely.

Gusted, Dec 24 '24 07:12

Thanks for this, @Gusted. What's the impact of this change on the CPU benchmarks?

dlclark, Dec 31 '24 20:12

The impact is quite noticeable in specific benchmarks, mostly because the Match is no longer pooled. It can't really be pooled now, because it is returned to the application using regexp2; the application would then have to put it back into the pool itself, and requiring that for memory-efficient use of regexp2 feels like a breaking change. (In the current code, this seems like an easy way to create a data race if the application keeps using the match after starting a new run with the same regexp, but perhaps that is intentional?) The Match could probably be pooled again if I moved the pool back to a per-regexp scope.
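The following sketch illustrates the aliasing hazard behind not pooling the returned Match. The `Match`, `reusingMatcher` and `freshMatcher` types are hypothetical and are not regexp2's API. If the library recycles a result the caller still holds, the next run silently overwrites it; when the runs happen on different goroutines, the same pattern becomes a data race.

```go
package main

import (
	"fmt"
	"strings"
)

// Match is a hypothetical result type.
type Match struct {
	Index int
}

// reusingMatcher hands out the same *Match on every call, as a recycled
// result would be if the caller never handed it back explicitly.
type reusingMatcher struct {
	cached Match
}

func (m *reusingMatcher) find(s string, c byte) *Match {
	m.cached.Index = strings.IndexByte(s, c)
	return &m.cached
}

// freshMatcher allocates a new *Match per call, so results never alias.
type freshMatcher struct{}

func (freshMatcher) find(s string, c byte) *Match {
	return &Match{Index: strings.IndexByte(s, c)}
}

func main() {
	var r reusingMatcher
	first := r.find("abc", 'b')
	second := r.find("abc", 'c')
	fmt.Println(first.Index, second.Index) // 2 2: the first result was overwritten

	var f freshMatcher
	a := f.find("abc", 'b')
	b := f.find("abc", 'c')
	fmt.Println(a.Index, b.Index) // 1 2
}
```

Allocating a fresh Match per call avoids the hazard at the cost of an allocation, which is consistent with the slowdowns seen in the short-input benchmarks.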

benchstat
                                           │    before    │                after-3                │
                                           │    sec/op    │    sec/op      vs base                │
Literal-12                                   351.1n ±  1%    589.7n ± 17%   +67.97% (p=0.002 n=6)
NotLiteral-12                                3.615µ ±  1%    3.913µ ±  1%    +8.26% (p=0.002 n=6)
MatchClass-12                                828.9n ±  1%   1039.5n ±  1%   +25.41% (p=0.002 n=6)
MatchClass_InRange-12                        811.4n ±  1%   1029.5n ±  1%   +26.88% (p=0.002 n=6)
AnchoredLiteralShortNonMatch-12              101.5n ±  1%    116.1n ±  1%   +14.33% (p=0.002 n=6)
AnchoredLiteralLongNonMatch-12               32.62n ±  2%    31.16n ±  1%    -4.48% (p=0.002 n=6)
AnchoredShortMatch-12                        214.5n ±  1%    498.9n ± 13%  +132.62% (p=0.002 n=6)
AnchoredLongMatch-12                         137.6n ±  1%    357.2n ±  3%  +159.69% (p=0.002 n=6)
OnePassShortA-12                             879.7n ±  1%   1545.5n ±  4%   +75.68% (p=0.002 n=6)
NotOnePassShortA-12                          890.3n ±  1%   1549.0n ±  5%   +73.99% (p=0.002 n=6)
OnePassShortB-12                             522.2n ±  1%    858.9n ±  7%   +64.48% (p=0.002 n=6)
NotOnePassShortB-12                          534.2n ±  1%    857.2n ±  4%   +60.45% (p=0.002 n=6)
OnePassLongPrefix-12                         227.0n ±  1%    464.4n ±  7%  +104.58% (p=0.002 n=6)
OnePassLongNotPrefix-12                      215.0n ±  1%    457.8n ±  3%  +112.93% (p=0.002 n=6)
MatchEasy0_32-12                             31.88n ±  1%    41.22n ±  0%   +29.31% (p=0.002 n=6)
MatchEasy0_1K-12                             145.7n ±  1%    165.8n ±  1%   +13.80% (p=0.002 n=6)
MatchEasy0_32K-12                            4.386µ ±  1%    4.411µ ±  1%    +0.57% (p=0.015 n=6)
MatchEasy0_1M-12                             138.0µ ±  1%    138.1µ ±  1%         ~ (p=0.589 n=6)
MatchEasy0_32M-12                            8.155m ±  0%    8.296m ±  1%    +1.73% (p=0.002 n=6)
MatchEasy1_32-12                             120.4n ±  1%    137.7n ±  1%   +14.28% (p=0.002 n=6)
MatchEasy1_1K-12                             4.130µ ±  0%    4.405µ ±  1%    +6.66% (p=0.002 n=6)
MatchEasy1_32K-12                            116.4µ ±  1%    117.9µ ±  1%    +1.33% (p=0.041 n=6)
MatchEasy1_1M-12                             3.786m ±  0%    3.774m ±  1%         ~ (p=0.065 n=6)
MatchEasy1_32M-12                            121.3m ±  0%    121.5m ±  1%         ~ (p=1.000 n=6)
MatchMedium_32-12                            238.5n ±  1%    239.3n ±  1%         ~ (p=0.461 n=6)
MatchMedium_1K-12                            11.95µ ±  1%    12.11µ ±  3%         ~ (p=0.093 n=6)
MatchMedium_32K-12                           403.6µ ±  1%    397.0µ ±  1%    -1.64% (p=0.004 n=6)
MatchMedium_1M-12                            12.96m ±  1%    12.87m ±  1%         ~ (p=0.132 n=6)
MatchMedium_32M-12                           415.6m ±  1%    408.3m ±  1%    -1.76% (p=0.004 n=6)
MatchHard_32-12                              15.40µ ±  1%    15.40µ ±  1%         ~ (p=0.732 n=6)
MatchHard_1K-12                              740.8µ ±  1%    734.3µ ±  1%    -0.89% (p=0.002 n=6)
MatchHard_32K-12                             25.76m ±  0%    25.23m ±  1%    -2.03% (p=0.002 n=6)
MatchHard_1M-12                              861.4m ±  0%    853.3m ±  1%    -0.93% (p=0.004 n=6)
MatchHard_32M-12                              27.70 ±  0%     27.25 ±  0%    -1.63% (p=0.002 n=6)
MatchHard1_32-12                             1.484µ ±  1%    1.700µ ±  2%   +14.56% (p=0.002 n=6)
MatchHard1_1K-12                             56.39µ ±  1%    55.99µ ±  1%    -0.71% (p=0.004 n=6)
MatchHard1_32K-12                            1.894m ±  1%    1.878m ±  1%         ~ (p=0.065 n=6)
MatchHard1_1M-12                             60.41m ±  2%    60.35m ±  1%         ~ (p=0.589 n=6)
MatchHard1_32M-12                             1.935 ±  1%     1.935 ±  1%         ~ (p=0.699 n=6)
Leading-12                                   12.88µ ±  0%    12.60µ ±  3%         ~ (p=0.065 n=6)
ShortSearch/serial-no-timeout-12             32.53n ±  0%    43.28n ±  1%   +33.04% (p=0.002 n=6)
ShortSearch/serial-fixed-timeout-12          33.58n ±  0%    44.32n ±  2%   +31.98% (p=0.002 n=6)
ShortSearch/serial-increasing-timeout-12     34.13n ± 29%    44.72n ± 14%   +31.03% (p=0.002 n=6)
ShortSearch/parallel-no-timeout-12           4.519n ±  2%    5.500n ± 11%   +21.70% (p=0.002 n=6)
ShortSearch/parallel-fixed-timeout-12        4.752n ±  1%    6.469n ± 15%   +36.12% (p=0.002 n=6)
ShortSearch/parallel-increasing-timeout-12   4.752n ±  2%    6.260n ± 29%   +31.75% (p=0.002 n=6)
ParserPrefixLongLen-12                       12.20m ±  1%    12.02m ±  2%    -1.40% (p=0.041 n=6)
geomean                                      13.69µ          16.49µ         +20.45%

                                           │    before     │               after-3                │
                                           │      B/s      │      B/s       vs base               │
MatchEasy0_32-12                             957.2Mi ±  1%   740.3Mi ±  0%  -22.65% (p=0.002 n=6)
MatchEasy0_1K-12                             6.545Gi ±  1%   5.751Gi ±  1%  -12.14% (p=0.002 n=6)
MatchEasy0_32K-12                            6.959Gi ±  1%   6.919Gi ±  1%   -0.57% (p=0.015 n=6)
MatchEasy0_1M-12                             7.077Gi ±  1%   7.072Gi ±  1%        ~ (p=0.589 n=6)
MatchEasy0_32M-12                            3.832Gi ±  0%   3.767Gi ±  1%   -1.70% (p=0.002 n=6)
MatchEasy1_32-12                             253.3Mi ±  1%   221.7Mi ±  1%  -12.46% (p=0.002 n=6)
MatchEasy1_1K-12                             236.4Mi ±  0%   221.7Mi ±  1%   -6.24% (p=0.002 n=6)
MatchEasy1_32K-12                            268.6Mi ±  1%   265.1Mi ±  1%   -1.30% (p=0.041 n=6)
MatchEasy1_1M-12                             264.1Mi ±  0%   265.0Mi ±  1%        ~ (p=0.065 n=6)
MatchEasy1_32M-12                            263.7Mi ±  0%   263.4Mi ±  1%        ~ (p=1.000 n=6)
MatchMedium_32-12                            128.0Mi ±  1%   127.5Mi ±  1%        ~ (p=0.461 n=6)
MatchMedium_1K-12                            81.73Mi ±  1%   80.66Mi ±  3%        ~ (p=0.093 n=6)
MatchMedium_32K-12                           77.42Mi ±  1%   78.71Mi ±  1%   +1.66% (p=0.004 n=6)
MatchMedium_1M-12                            77.17Mi ±  1%   77.70Mi ±  1%        ~ (p=0.132 n=6)
MatchMedium_32M-12                           76.99Mi ±  1%   78.38Mi ±  1%   +1.80% (p=0.004 n=6)
MatchHard_32-12                              1.984Mi ±  1%   1.984Mi ±  1%        ~ (p=0.924 n=6)
MatchHard_1K-12                              1.316Mi ±  1%   1.330Mi ±  1%   +1.09% (p=0.013 n=6)
MatchHard_32K-12                             1.211Mi ±  1%   1.240Mi ±  1%   +2.36% (p=0.002 n=6)
MatchHard_1M-12                              1.163Mi ±  1%   1.173Mi ±  1%   +0.82% (p=0.045 n=6)
MatchHard_32M-12                             1.154Mi ±  0%   1.173Mi ±  1%   +1.65% (p=0.002 n=6)
MatchHard1_32-12                             20.57Mi ±  1%   17.96Mi ±  2%  -12.70% (p=0.002 n=6)
MatchHard1_1K-12                             17.32Mi ±  1%   17.44Mi ±  1%   +0.72% (p=0.006 n=6)
MatchHard1_32K-12                            16.50Mi ±  1%   16.64Mi ±  1%        ~ (p=0.071 n=6)
MatchHard1_1M-12                             16.56Mi ±  2%   16.57Mi ±  1%        ~ (p=0.563 n=6)
MatchHard1_32M-12                            16.54Mi ±  1%   16.54Mi ±  1%        ~ (p=0.619 n=6)
ShortSearch/serial-no-timeout-12             2.863Gi ±  0%   2.152Gi ±  1%  -24.84% (p=0.002 n=6)
ShortSearch/serial-fixed-timeout-12          2.773Gi ±  0%   2.101Gi ±  1%  -24.23% (p=0.002 n=6)
ShortSearch/serial-increasing-timeout-12     2.729Gi ± 22%   2.083Gi ± 13%  -23.70% (p=0.002 n=6)
ShortSearch/parallel-no-timeout-12           20.61Gi ±  2%   16.94Gi ± 10%  -17.80% (p=0.002 n=6)
ShortSearch/parallel-fixed-timeout-12        19.60Gi ±  1%   14.40Gi ± 13%  -26.52% (p=0.002 n=6)
ShortSearch/parallel-increasing-timeout-12   19.60Gi ±  2%   14.89Gi ± 23%  -24.02% (p=0.002 n=6)
geomean                                      180.8Mi         168.1Mi         -7.05%

Gusted, Jan 02 '25 04:01

The change is now much smaller and performance is much better: the overhead of the atomic operations is a few ns per call, which results in an average (geometric mean) slowdown of about 3.6%.

goos: linux
goarch: amd64
pkg: github.com/dlclark/regexp2
cpu: AMD Ryzen 5 3600X 6-Core Processor             
                                           │    before    │               after-4               │
                                           │    sec/op    │    sec/op     vs base               │
Literal-12                                   351.1n ±  1%   370.8n ±  1%   +5.63% (p=0.002 n=6)
NotLiteral-12                                3.615µ ±  1%   3.911µ ±  1%   +8.20% (p=0.002 n=6)
MatchClass-12                                828.9n ±  1%   848.6n ±  3%   +2.38% (p=0.002 n=6)
MatchClass_InRange-12                        811.4n ±  1%   828.7n ±  6%   +2.13% (p=0.009 n=6)
AnchoredLiteralShortNonMatch-12              101.5n ±  1%   117.0n ±  1%  +15.21% (p=0.002 n=6)
AnchoredLiteralLongNonMatch-12               32.62n ±  2%   32.84n ±  1%        ~ (p=0.485 n=6)
AnchoredShortMatch-12                        214.5n ±  1%   220.7n ±  1%   +2.89% (p=0.002 n=6)
AnchoredLongMatch-12                         137.6n ±  1%   137.6n ±  2%        ~ (p=0.976 n=6)
OnePassShortA-12                             879.7n ±  1%   901.8n ±  1%   +2.52% (p=0.002 n=6)
NotOnePassShortA-12                          890.3n ±  1%   909.4n ±  1%   +2.15% (p=0.002 n=6)
OnePassShortB-12                             522.2n ±  1%   544.1n ±  2%   +4.19% (p=0.002 n=6)
NotOnePassShortB-12                          534.2n ±  1%   552.1n ±  1%   +3.35% (p=0.002 n=6)
OnePassLongPrefix-12                         227.0n ±  1%   239.1n ±  1%   +5.33% (p=0.002 n=6)
OnePassLongNotPrefix-12                      215.0n ±  1%   226.8n ±  5%   +5.49% (p=0.002 n=6)
MatchEasy0_32-12                             31.88n ±  1%   32.75n ±  1%   +2.71% (p=0.002 n=6)
MatchEasy0_1K-12                             145.7n ±  1%   153.8n ±  1%   +5.53% (p=0.002 n=6)
MatchEasy0_32K-12                            4.386µ ±  1%   4.441µ ±  1%   +1.28% (p=0.002 n=6)
MatchEasy0_1M-12                             138.0µ ±  1%   137.7µ ±  1%        ~ (p=1.000 n=6)
MatchEasy0_32M-12                            8.155m ±  0%   8.447m ±  6%   +3.58% (p=0.002 n=6)
MatchEasy1_32-12                             120.4n ±  1%   129.2n ±  2%   +7.22% (p=0.002 n=6)
MatchEasy1_1K-12                             4.130µ ±  0%   4.198µ ±  2%   +1.63% (p=0.002 n=6)
MatchEasy1_32K-12                            116.4µ ±  1%   117.2µ ±  0%        ~ (p=0.065 n=6)
MatchEasy1_1M-12                             3.786m ±  0%   3.801m ±  0%   +0.40% (p=0.015 n=6)
MatchEasy1_32M-12                            121.3m ±  0%   120.7m ±  1%   -0.55% (p=0.002 n=6)
MatchMedium_32-12                            238.5n ±  1%   243.2n ±  1%   +1.97% (p=0.004 n=6)
MatchMedium_1K-12                            11.95µ ±  1%   11.99µ ±  4%        ~ (p=1.000 n=6)
MatchMedium_32K-12                           403.6µ ±  1%   406.5µ ± 16%        ~ (p=0.699 n=6)
MatchMedium_1M-12                            12.96m ±  1%   12.74m ±  5%        ~ (p=0.132 n=6)
MatchMedium_32M-12                           415.6m ±  1%   409.4m ±  2%   -1.49% (p=0.041 n=6)
MatchHard_32-12                              15.40µ ±  1%   15.66µ ±  1%   +1.71% (p=0.009 n=6)
MatchHard_1K-12                              740.8µ ±  1%   743.6µ ±  1%        ~ (p=0.180 n=6)
MatchHard_32K-12                             25.76m ±  0%   25.47m ±  2%        ~ (p=0.065 n=6)
MatchHard_1M-12                              861.4m ±  0%   846.8m ±  1%   -1.69% (p=0.002 n=6)
MatchHard_32M-12                              27.70 ±  0%    27.35 ±  1%   -1.26% (p=0.002 n=6)
MatchHard1_32-12                             1.484µ ±  1%   1.502µ ±  1%   +1.25% (p=0.009 n=6)
MatchHard1_1K-12                             56.39µ ±  1%   55.92µ ±  1%   -0.82% (p=0.026 n=6)
MatchHard1_32K-12                            1.894m ±  1%   1.884m ±  5%        ~ (p=0.394 n=6)
MatchHard1_1M-12                             60.41m ±  2%   60.52m ±  2%        ~ (p=0.699 n=6)
MatchHard1_32M-12                             1.935 ±  1%    1.925 ±  1%        ~ (p=0.310 n=6)
Leading-12                                   12.88µ ±  0%   12.46µ ±  1%   -3.22% (p=0.002 n=6)
ShortSearch/serial-no-timeout-12             32.53n ±  0%   36.20n ±  1%  +11.26% (p=0.002 n=6)
ShortSearch/serial-fixed-timeout-12          33.58n ±  0%   39.28n ±  1%  +16.97% (p=0.002 n=6)
ShortSearch/serial-increasing-timeout-12     34.13n ± 29%   39.69n ± 17%  +16.31% (p=0.041 n=6)
ShortSearch/parallel-no-timeout-12           4.519n ±  2%   5.380n ±  9%  +19.06% (p=0.002 n=6)
ShortSearch/parallel-fixed-timeout-12        4.752n ±  1%   5.692n ±  8%  +19.78% (p=0.002 n=6)
ShortSearch/parallel-increasing-timeout-12   4.752n ±  2%   5.646n ± 14%  +18.83% (p=0.002 n=6)
ParserPrefixLongLen-12                       12.20m ±  1%   11.81m ±  1%   -3.13% (p=0.002 n=6)
geomean                                      13.69µ         14.18µ         +3.57%

                                           │    before     │               after-4                │
                                           │      B/s      │      B/s       vs base               │
MatchEasy0_32-12                             957.2Mi ±  1%   931.9Mi ±  1%   -2.64% (p=0.002 n=6)
MatchEasy0_1K-12                             6.545Gi ±  1%   6.203Gi ±  1%   -5.23% (p=0.002 n=6)
MatchEasy0_32K-12                            6.959Gi ±  1%   6.871Gi ±  1%   -1.26% (p=0.002 n=6)
MatchEasy0_1M-12                             7.077Gi ±  1%   7.092Gi ±  1%        ~ (p=1.000 n=6)
MatchEasy0_32M-12                            3.832Gi ±  0%   3.700Gi ±  5%   -3.45% (p=0.002 n=6)
MatchEasy1_32-12                             253.3Mi ±  1%   236.3Mi ±  2%   -6.72% (p=0.002 n=6)
MatchEasy1_1K-12                             236.4Mi ±  0%   232.7Mi ±  2%   -1.60% (p=0.002 n=6)
MatchEasy1_32K-12                            268.6Mi ±  1%   266.6Mi ±  0%        ~ (p=0.065 n=6)
MatchEasy1_1M-12                             264.1Mi ±  0%   263.1Mi ±  0%   -0.39% (p=0.015 n=6)
MatchEasy1_32M-12                            263.7Mi ±  0%   265.2Mi ±  1%   +0.56% (p=0.002 n=6)
MatchMedium_32-12                            128.0Mi ±  1%   125.5Mi ±  1%   -1.95% (p=0.004 n=6)
MatchMedium_1K-12                            81.73Mi ±  1%   81.46Mi ±  4%        ~ (p=1.000 n=6)
MatchMedium_32K-12                           77.42Mi ±  1%   76.89Mi ± 14%        ~ (p=0.699 n=6)
MatchMedium_1M-12                            77.17Mi ±  1%   78.52Mi ±  5%        ~ (p=0.132 n=6)
MatchMedium_32M-12                           76.99Mi ±  1%   78.16Mi ±  2%   +1.52% (p=0.041 n=6)
MatchHard_32-12                              1.984Mi ±  1%   1.950Mi ±  1%   -1.68% (p=0.011 n=6)
MatchHard_1K-12                              1.316Mi ±  1%   1.316Mi ±  1%        ~ (p=0.758 n=6)
MatchHard_32K-12                             1.211Mi ±  1%   1.230Mi ±  2%   +1.57% (p=0.045 n=6)
MatchHard_1M-12                              1.163Mi ±  1%   1.178Mi ±  1%   +1.23% (p=0.002 n=6)
MatchHard_32M-12                             1.154Mi ±  0%   1.173Mi ±  1%   +1.65% (p=0.002 n=6)
MatchHard1_32-12                             20.57Mi ±  1%   20.32Mi ±  1%   -1.23% (p=0.009 n=6)
MatchHard1_1K-12                             17.32Mi ±  1%   17.46Mi ±  1%   +0.83% (p=0.022 n=6)
MatchHard1_32K-12                            16.50Mi ±  1%   16.59Mi ±  5%        ~ (p=0.331 n=6)
MatchHard1_1M-12                             16.56Mi ±  2%   16.52Mi ±  2%        ~ (p=0.667 n=6)
MatchHard1_32M-12                            16.54Mi ±  1%   16.62Mi ±  1%        ~ (p=0.242 n=6)
ShortSearch/serial-no-timeout-12             2.863Gi ±  0%   2.573Gi ±  1%  -10.13% (p=0.002 n=6)
ShortSearch/serial-fixed-timeout-12          2.773Gi ±  0%   2.371Gi ±  2%  -14.51% (p=0.002 n=6)
ShortSearch/serial-increasing-timeout-12     2.729Gi ± 22%   2.346Gi ± 15%  -14.04% (p=0.041 n=6)
ShortSearch/parallel-no-timeout-12           20.61Gi ±  2%   17.31Gi ±  9%  -16.02% (p=0.002 n=6)
ShortSearch/parallel-fixed-timeout-12        19.60Gi ±  1%   16.36Gi ±  8%  -16.50% (p=0.002 n=6)
ShortSearch/parallel-increasing-timeout-12   19.60Gi ±  2%   16.50Gi ± 12%  -15.83% (p=0.002 n=6)
geomean                                      180.8Mi         174.4Mi         -3.57%

Gusted, Jan 09 '25 06:01