publicsuffix-ruby

Optimise select() for long subdomains

Open: elliotwutingfeng opened this issue 2 years ago (1 comment)

The current implementation of select() searches for the longest matching rules by checking every suffix of the hostname, from the right end all the way to the left end.

This approach is necessary to handle edge cases like example.s3.cn-north-1.amazonaws.com.cn, where

  • s3.cn-north-1.amazonaws.com.cn and com.cn are valid rules,
  • but the intermediate suffixes cn-north-1.amazonaws.com.cn and amazonaws.com.cn are not.
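The edge case can be sketched in Ruby as a full right-to-left scan. This is a minimal illustration, not the library's code: RULES and full_scan are hypothetical names, rules are plain Hash keys, and wildcard and exception rules are ignored. It shows why a miss on an intermediate suffix cannot stop the search:

```ruby
# Hypothetical rule set for illustration only; the real publicsuffix-ruby
# list stores richer entries and supports wildcard/exception rules.
RULES = {
  "cn" => true,
  "com.cn" => true,
  "s3.cn-north-1.amazonaws.com.cn" => true
}.freeze

# Returns every suffix of +name+ that is a rule, shortest first.
# Checks every suffix, from the rightmost label to the leftmost.
def full_scan(name, rules)
  query = nil
  matches = []
  name.split(".").reverse_each do |label|
    # Grow the candidate suffix by one label to the left.
    query = query.nil? ? label : "#{label}.#{query}"
    # A miss (e.g. "amazonaws.com.cn") must not end the scan: a longer
    # suffix further left may still be a rule.
    matches << query if rules.key?(query)
  end
  matches
end

p full_scan("example.s3.cn-north-1.amazonaws.com.cn", RULES)
# => ["cn", "com.cn", "s3.cn-north-1.amazonaws.com.cn"]
```

Stopping at the first miss would return only ["cn", "com.cn"] and lose the longer S3 rule.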

However, this penalises hostnames with long subdomains such as a.very.long.subdomain.example.co.uk: every extra label costs one more rule lookup, even though no rule is that long.

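We can terminate the search early by limiting the number of suffixes checked to [parts.size, @max_rule_size].min, where parts.size is the number of parts (labels) in the hostname and @max_rule_size is the number of labels in the longest rule in @rules. A suffix longer than @max_rule_size cannot match any rule, so the result is unchanged.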
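A minimal sketch of the bounded search, under the same assumptions as a plain Hash of suffix strings with no wildcard or exception rules. BoundedSuffixIndex and its lookups counter are hypothetical names used only for illustration; the counter makes the saving visible:

```ruby
# Hypothetical index for illustration; not publicsuffix-ruby's internal layout.
class BoundedSuffixIndex
  attr_reader :lookups

  def initialize(rules)
    @rules = rules.to_h { |r| [r, true] }
    # Number of labels in the longest rule.
    @max_rule_size = rules.map { |r| r.count(".") + 1 }.max || 0
    @lookups = 0
  end

  # Returns matching rules, shortest first, checking at most
  # [parts.size, @max_rule_size].min suffixes.
  def select(name)
    parts = name.split(".").reverse
    # No rule has more than @max_rule_size labels, so longer suffixes
    # can never match and are not looked up.
    limit = [parts.size, @max_rule_size].min
    query = nil
    matches = []
    parts.first(limit).each do |label|
      query = query.nil? ? label : "#{label}.#{query}"
      @lookups += 1
      matches << query if @rules.key?(query)
    end
    matches
  end
end

index = BoundedSuffixIndex.new(["uk", "co.uk", "s3.cn-north-1.amazonaws.com.cn"])
p index.select("a.very.long.subdomain.example.co.uk") # => ["uk", "co.uk"]
p index.lookups # => 5
```

Here the longest rule has five labels, so the seven-label hostname needs five lookups instead of seven. The bound is global: a single long rule anywhere in the list raises it for every hostname.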

I also replaced the Kernel#loop call with a faster bounded while loop, since the current break condition can be converted into the loop condition.
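The loop conversion can be illustrated with two hypothetical helpers that walk the same bounded range; the negated break condition becomes the while condition. In CRuby, while is typically cheaper than Kernel#loop because it compiles to a plain branch rather than a method call that yields to a block on every iteration (a general expectation; the benchmarks that follow measure both changes together):

```ruby
# Hypothetical helpers for illustration; not the library's code.

# Kernel#loop with an explicit break: each iteration is a block call.
def scan_with_loop(parts, limit)
  seen = []
  index = 0
  loop do
    break if index >= limit
    seen << parts[index]
    index += 1
  end
  seen
end

# Same logic with the break condition negated into the loop condition.
def scan_with_while(parts, limit)
  seen = []
  index = 0
  while index < limit
    seen << parts[index]
    index += 1
  end
  seen
end

p scan_with_while(%w[uk co example subdomain], [4, 2].min) # => ["uk", "co"]
```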

Before

$ ruby test/benchmarks/bm_find_all.rb 1000000
Rehearsal -------------------------------------------------------------
NAME_SHORT                  2.348576   0.000000   2.348576 (  2.350146)
NAME_SHORT (noprivate)      2.444302   0.000000   2.444302 (  2.445995)
NAME_MEDIUM                 2.890648   0.000000   2.890648 (  2.892380)
NAME_MEDIUM (noprivate)     3.014823   0.000000   3.014823 (  3.017137)
NAME_LONG                   3.705042   0.002693   3.707735 (  3.710142)
NAME_LONG (noprivate)       3.727960   0.000000   3.727960 (  3.730321)
NAME_WILD                   3.657520   0.000000   3.657520 (  3.659759)
NAME_WILD (noprivate)       3.815247   0.000000   3.815247 (  3.817492)
NAME_EXCP                   4.420996   0.000000   4.420996 (  4.423570)
NAME_EXCP (noprivate)       4.408350   0.000000   4.408350 (  4.411540)
IAAA                        2.604410   0.000000   2.604410 (  2.605894)
IAAA (noprivate)            2.688674   0.000000   2.688674 (  2.690398)
IZZZ                        2.605931   0.000000   2.605931 (  2.607543)
IZZZ (noprivate)            2.679484   0.000000   2.679484 (  2.681334)
PAAA                        4.506107   0.000000   4.506107 (  4.509242)
PAAA (noprivate)            4.174697   0.000000   4.174697 (  4.177737)
PZZZ                        4.618712   0.000000   4.618712 (  4.622306)
PZZZ (noprivate)            4.323496   0.000000   4.323496 (  4.327372)
JP                          4.151477   0.000000   4.151477 (  4.154904)
JP (noprivate)              4.230317   0.000000   4.230317 (  4.234143)
IT                          2.645423   0.000000   2.645423 (  2.647490)
IT (noprivate)              2.731147   0.000000   2.731147 (  2.733281)
COM                         2.672895   0.000000   2.672895 (  2.675236)
COM (noprivate)             2.796167   0.000000   2.796167 (  2.798951)
--------------------------------------------------- total: 81.865094sec

                                user     system      total        real
NAME_SHORT                  2.455661   0.000000   2.455661 (  2.458051)
NAME_SHORT (noprivate)      2.465275   0.000000   2.465275 (  2.468431)
NAME_MEDIUM                 2.946424   0.000000   2.946424 (  2.949358)
NAME_MEDIUM (noprivate)     3.023296   0.000000   3.023296 (  3.025300)
NAME_LONG                   3.770850   0.000000   3.770850 (  3.773397)
NAME_LONG (noprivate)       3.828416   0.000000   3.828416 (  3.830904)
NAME_WILD                   3.749617   0.000000   3.749617 (  3.752038)
NAME_WILD (noprivate)       3.827687   0.000000   3.827687 (  3.830190)
NAME_EXCP                   4.418445   0.000000   4.418445 (  4.421315)
NAME_EXCP (noprivate)       4.531002   0.000000   4.531002 (  4.535273)
IAAA                        2.699374   0.000000   2.699374 (  2.700931)
IAAA (noprivate)            2.768779   0.000000   2.768779 (  2.771347)
IZZZ                        2.699160   0.000000   2.699160 (  2.702339)
IZZZ (noprivate)            2.766278   0.000000   2.766278 (  2.769706)
PAAA                        4.706753   0.000000   4.706753 (  4.711835)
PAAA (noprivate)            4.363877   0.000000   4.363877 (  4.367030)
PZZZ                        4.716710   0.000000   4.716710 (  4.722447)
PZZZ (noprivate)            4.109007   0.000000   4.109007 (  4.111433)
JP                          3.937950   0.000000   3.937950 (  3.941688)
JP (noprivate)              4.065472   0.000000   4.065472 (  4.070663)
IT                          2.628695   0.000000   2.628695 (  2.630612)
IT (noprivate)              2.718972   0.000000   2.718972 (  2.721554)
COM                         2.647181   0.000000   2.647181 (  2.649369)
COM (noprivate)             2.714115   0.000000   2.714115 (  2.715725)

After

$ ruby test/benchmarks/bm_find_all.rb 1000000
Rehearsal -------------------------------------------------------------
NAME_SHORT                  2.237599   0.000000   2.237599 (  2.239443)
NAME_SHORT (noprivate)      2.336548   0.000000   2.336548 (  2.338574)
NAME_MEDIUM                 2.713107   0.000000   2.713107 (  2.714795)
NAME_MEDIUM (noprivate)     2.830825   0.000000   2.830825 (  2.832685)
NAME_LONG                   3.042471   0.000000   3.042471 (  3.044456)
NAME_LONG (noprivate)       3.019529   0.003196   3.022725 (  3.024463)
NAME_WILD                   2.978485   0.000000   2.978485 (  2.980252)
NAME_WILD (noprivate)       3.088728   0.000000   3.088728 (  3.090743)
NAME_EXCP                   3.682105   0.000000   3.682105 (  3.684332)
NAME_EXCP (noprivate)       3.815742   0.000000   3.815742 (  3.818032)
IAAA                        2.458039   0.000000   2.458039 (  2.459425)
IAAA (noprivate)            2.496389   0.000000   2.496389 (  2.497893)
IZZZ                        2.404844   0.000000   2.404844 (  2.406255)
IZZZ (noprivate)            2.463744   0.000000   2.463744 (  2.465130)
PAAA                        3.515573   0.000000   3.515573 (  3.517585)
PAAA (noprivate)            3.193961   0.000000   3.193961 (  3.195845)
PZZZ                        3.587199   0.000000   3.587199 (  3.589388)
PZZZ (noprivate)            3.254129   0.000000   3.254129 (  3.256092)
JP                          3.783495   0.000000   3.783495 (  3.785693)
JP (noprivate)              3.885775   0.003331   3.889106 (  3.891664)
IT                          2.513112   0.000000   2.513112 (  2.514673)
IT (noprivate)              2.599210   0.000000   2.599210 (  2.600769)
COM                         2.539283   0.000000   2.539283 (  2.540692)
COM (noprivate)             2.485424   0.000000   2.485424 (  2.486922)
--------------------------------------------------- total: 70.931843sec

                                user     system      total        real
NAME_SHORT                  2.218905   0.000000   2.218905 (  2.220197)
NAME_SHORT (noprivate)      2.282971   0.000000   2.282971 (  2.284161)
NAME_MEDIUM                 2.707217   0.000000   2.707217 (  2.708815)
NAME_MEDIUM (noprivate)     2.781946   0.000000   2.781946 (  2.783615)
NAME_LONG                   3.018843   0.000000   3.018843 (  3.020559)
NAME_LONG (noprivate)       3.079345   0.000000   3.079345 (  3.081143)
NAME_WILD                   3.041727   0.000000   3.041727 (  3.043618)
NAME_WILD (noprivate)       3.079496   0.000000   3.079496 (  3.081228)
NAME_EXCP                   3.655873   0.000000   3.655873 (  3.658370)
NAME_EXCP (noprivate)       3.754648   0.000000   3.754648 (  3.756916)
IAAA                        2.507284   0.000000   2.507284 (  2.509283)
IAAA (noprivate)            2.540126   0.000000   2.540126 (  2.541872)
IZZZ                        2.466202   0.000000   2.466202 (  2.467584)
IZZZ (noprivate)            2.544616   0.000000   2.544616 (  2.546141)
PAAA                        3.622206   0.000000   3.622206 (  3.624447)
PAAA (noprivate)            3.272909   0.000000   3.272909 (  3.274831)
PZZZ                        3.675658   0.000000   3.675658 (  3.677843)
PZZZ (noprivate)            3.318359   0.000000   3.318359 (  3.320537)
JP                          3.882480   0.000000   3.882480 (  3.885434)
JP (noprivate)              3.971438   0.000000   3.971438 (  3.974437)
IT                          2.548282   0.000000   2.548282 (  2.549875)
IT (noprivate)              2.609304   0.000000   2.609304 (  2.610879)
COM                         2.569648   0.000000   2.569648 (  2.571186)
COM (noprivate)             2.497100   0.000000   2.497100 (  2.498543)

elliotwutingfeng, Oct 29 '23 06:10

Thanks for your contribution @elliotwutingfeng. I need some time to review the changes.

weppos, Nov 21 '23 10:11