publicsuffix-ruby
Optimise select() for long subdomains
The current implementation of select() searches for the longest matching suffix by walking the hostname from its rightmost part all the way to its leftmost part.
This approach is necessary to handle edge cases like example.s3.cn-north-1.amazonaws.com.cn, where
- s3.cn-north-1.amazonaws.com.cn and com.cn are valid.
- but the intermediates cn-north-1.amazonaws.com.cn and amazonaws.com.cn are not valid.
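To make the edge case concrete, here is a minimal sketch of a right-to-left walk over a toy rule set. `TOY_RULES` and `matching_suffixes` are hypothetical names used only for illustration; they are not part of the library, and the real rule list is far larger.

```ruby
require "set"

# Hypothetical toy rule set holding only the two valid suffixes named above.
TOY_RULES = Set["com.cn", "s3.cn-north-1.amazonaws.com.cn"].freeze

# Build each suffix of the hostname from the right and keep those that are rules.
def matching_suffixes(hostname, rules = TOY_RULES)
  candidate = nil
  matches = []
  hostname.split(".").reverse_each do |label|
    candidate = candidate ? "#{label}.#{candidate}" : label
    matches << candidate if rules.include?(candidate)
  end
  matches
end

p matching_suffixes("example.s3.cn-north-1.amazonaws.com.cn")
```

In this sketch the walk passes two non-matching intermediates after com.cn before it reaches the longer rule, so stopping at the first miss would lose the longest match.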
However, this penalises hostnames with long subdomains like a.very.long.subdomain.example.co.uk, because every part is checked even when no rule is that long.
We can terminate the search early by limiting the number of parts searched to [parts.size, @max_rule_size].min, where parts.size is the number of parts in the hostname and @max_rule_size is the number of parts in the largest rule in @rules.
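A rough sketch of the bounded search, under stated assumptions: `BOUNDED_RULES`, `BOUNDED_MAX_RULE_SIZE` and `select_bounded` are hypothetical stand-ins for @rules, @max_rule_size and select(), and the rule set is a toy one. It is not the actual implementation in this PR.

```ruby
require "set"

# Hypothetical stand-in for @rules.
BOUNDED_RULES = Set["uk", "co.uk", "com.cn", "s3.cn-north-1.amazonaws.com.cn"].freeze

# Number of parts in the largest rule (the role @max_rule_size plays).
BOUNDED_MAX_RULE_SIZE = BOUNDED_RULES.map { |rule| rule.count(".") + 1 }.max

# Returns the matching suffixes and the number of candidates that were checked.
def select_bounded(hostname, rules = BOUNDED_RULES, max_rule_size = BOUNDED_MAX_RULE_SIZE)
  parts = hostname.split(".").reverse
  limit = [parts.size, max_rule_size].min # no rule can be longer than this
  matches = []
  candidate = nil
  index = 0
  while index < limit
    candidate = candidate ? "#{parts[index]}.#{candidate}" : parts[index]
    matches << candidate if rules.include?(candidate)
    index += 1
  end
  [matches, index]
end

matches, checked = select_bounded("a.very.long.subdomain.example.co.uk")
puts "matches=#{matches.inspect} checked=#{checked}"
```

With this toy rule set the largest rule has 5 parts, so the 7-part hostname is checked 5 times instead of 7, and the amazonaws edge case is still found because its rule fits within the limit.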
This PR also replaces the Kernel#loop call with a faster bounded while loop, since the current break condition can be converted into a loop condition.
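The shape of that conversion can be sketched as follows. Both methods are hypothetical illustrations rather than the library's code; they take the hostname parts in right-to-left order and should return the same result.

```ruby
# Kernel#loop is a method that yields to a block, with an explicit break inside.
def suffixes_with_loop(parts, limit)
  result = []
  index = 0
  loop do
    break if index >= limit
    result << parts[0..index].reverse.join(".")
    index += 1
  end
  result
end

# The same iteration with the break condition moved into the while condition.
def suffixes_with_while(parts, limit)
  result = []
  index = 0
  while index < limit
    result << parts[0..index].reverse.join(".")
    index += 1
  end
  result
end

parts = "example.co.uk".split(".").reverse
p suffixes_with_loop(parts, 2)
p suffixes_with_while(parts, 2)
```

Because `while` is a language construct rather than a method call with a block, it avoids the per-iteration block invocation, which is the likely source of the speed-up.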
Before
$ ruby test/benchmarks/bm_find_all.rb 1000000
Rehearsal -------------------------------------------------------------
NAME_SHORT 2.348576 0.000000 2.348576 ( 2.350146)
NAME_SHORT (noprivate) 2.444302 0.000000 2.444302 ( 2.445995)
NAME_MEDIUM 2.890648 0.000000 2.890648 ( 2.892380)
NAME_MEDIUM (noprivate) 3.014823 0.000000 3.014823 ( 3.017137)
NAME_LONG 3.705042 0.002693 3.707735 ( 3.710142)
NAME_LONG (noprivate) 3.727960 0.000000 3.727960 ( 3.730321)
NAME_WILD 3.657520 0.000000 3.657520 ( 3.659759)
NAME_WILD (noprivate) 3.815247 0.000000 3.815247 ( 3.817492)
NAME_EXCP 4.420996 0.000000 4.420996 ( 4.423570)
NAME_EXCP (noprivate) 4.408350 0.000000 4.408350 ( 4.411540)
IAAA 2.604410 0.000000 2.604410 ( 2.605894)
IAAA (noprivate) 2.688674 0.000000 2.688674 ( 2.690398)
IZZZ 2.605931 0.000000 2.605931 ( 2.607543)
IZZZ (noprivate) 2.679484 0.000000 2.679484 ( 2.681334)
PAAA 4.506107 0.000000 4.506107 ( 4.509242)
PAAA (noprivate) 4.174697 0.000000 4.174697 ( 4.177737)
PZZZ 4.618712 0.000000 4.618712 ( 4.622306)
PZZZ (noprivate) 4.323496 0.000000 4.323496 ( 4.327372)
JP 4.151477 0.000000 4.151477 ( 4.154904)
JP (noprivate) 4.230317 0.000000 4.230317 ( 4.234143)
IT 2.645423 0.000000 2.645423 ( 2.647490)
IT (noprivate) 2.731147 0.000000 2.731147 ( 2.733281)
COM 2.672895 0.000000 2.672895 ( 2.675236)
COM (noprivate) 2.796167 0.000000 2.796167 ( 2.798951)
--------------------------------------------------- total: 81.865094sec
user system total real
NAME_SHORT 2.455661 0.000000 2.455661 ( 2.458051)
NAME_SHORT (noprivate) 2.465275 0.000000 2.465275 ( 2.468431)
NAME_MEDIUM 2.946424 0.000000 2.946424 ( 2.949358)
NAME_MEDIUM (noprivate) 3.023296 0.000000 3.023296 ( 3.025300)
NAME_LONG 3.770850 0.000000 3.770850 ( 3.773397)
NAME_LONG (noprivate) 3.828416 0.000000 3.828416 ( 3.830904)
NAME_WILD 3.749617 0.000000 3.749617 ( 3.752038)
NAME_WILD (noprivate) 3.827687 0.000000 3.827687 ( 3.830190)
NAME_EXCP 4.418445 0.000000 4.418445 ( 4.421315)
NAME_EXCP (noprivate) 4.531002 0.000000 4.531002 ( 4.535273)
IAAA 2.699374 0.000000 2.699374 ( 2.700931)
IAAA (noprivate) 2.768779 0.000000 2.768779 ( 2.771347)
IZZZ 2.699160 0.000000 2.699160 ( 2.702339)
IZZZ (noprivate) 2.766278 0.000000 2.766278 ( 2.769706)
PAAA 4.706753 0.000000 4.706753 ( 4.711835)
PAAA (noprivate) 4.363877 0.000000 4.363877 ( 4.367030)
PZZZ 4.716710 0.000000 4.716710 ( 4.722447)
PZZZ (noprivate) 4.109007 0.000000 4.109007 ( 4.111433)
JP 3.937950 0.000000 3.937950 ( 3.941688)
JP (noprivate) 4.065472 0.000000 4.065472 ( 4.070663)
IT 2.628695 0.000000 2.628695 ( 2.630612)
IT (noprivate) 2.718972 0.000000 2.718972 ( 2.721554)
COM 2.647181 0.000000 2.647181 ( 2.649369)
COM (noprivate) 2.714115 0.000000 2.714115 ( 2.715725)
After
$ ruby test/benchmarks/bm_find_all.rb 1000000
Rehearsal -------------------------------------------------------------
NAME_SHORT 2.237599 0.000000 2.237599 ( 2.239443)
NAME_SHORT (noprivate) 2.336548 0.000000 2.336548 ( 2.338574)
NAME_MEDIUM 2.713107 0.000000 2.713107 ( 2.714795)
NAME_MEDIUM (noprivate) 2.830825 0.000000 2.830825 ( 2.832685)
NAME_LONG 3.042471 0.000000 3.042471 ( 3.044456)
NAME_LONG (noprivate) 3.019529 0.003196 3.022725 ( 3.024463)
NAME_WILD 2.978485 0.000000 2.978485 ( 2.980252)
NAME_WILD (noprivate) 3.088728 0.000000 3.088728 ( 3.090743)
NAME_EXCP 3.682105 0.000000 3.682105 ( 3.684332)
NAME_EXCP (noprivate) 3.815742 0.000000 3.815742 ( 3.818032)
IAAA 2.458039 0.000000 2.458039 ( 2.459425)
IAAA (noprivate) 2.496389 0.000000 2.496389 ( 2.497893)
IZZZ 2.404844 0.000000 2.404844 ( 2.406255)
IZZZ (noprivate) 2.463744 0.000000 2.463744 ( 2.465130)
PAAA 3.515573 0.000000 3.515573 ( 3.517585)
PAAA (noprivate) 3.193961 0.000000 3.193961 ( 3.195845)
PZZZ 3.587199 0.000000 3.587199 ( 3.589388)
PZZZ (noprivate) 3.254129 0.000000 3.254129 ( 3.256092)
JP 3.783495 0.000000 3.783495 ( 3.785693)
JP (noprivate) 3.885775 0.003331 3.889106 ( 3.891664)
IT 2.513112 0.000000 2.513112 ( 2.514673)
IT (noprivate) 2.599210 0.000000 2.599210 ( 2.600769)
COM 2.539283 0.000000 2.539283 ( 2.540692)
COM (noprivate) 2.485424 0.000000 2.485424 ( 2.486922)
--------------------------------------------------- total: 70.931843sec
user system total real
NAME_SHORT 2.218905 0.000000 2.218905 ( 2.220197)
NAME_SHORT (noprivate) 2.282971 0.000000 2.282971 ( 2.284161)
NAME_MEDIUM 2.707217 0.000000 2.707217 ( 2.708815)
NAME_MEDIUM (noprivate) 2.781946 0.000000 2.781946 ( 2.783615)
NAME_LONG 3.018843 0.000000 3.018843 ( 3.020559)
NAME_LONG (noprivate) 3.079345 0.000000 3.079345 ( 3.081143)
NAME_WILD 3.041727 0.000000 3.041727 ( 3.043618)
NAME_WILD (noprivate) 3.079496 0.000000 3.079496 ( 3.081228)
NAME_EXCP 3.655873 0.000000 3.655873 ( 3.658370)
NAME_EXCP (noprivate) 3.754648 0.000000 3.754648 ( 3.756916)
IAAA 2.507284 0.000000 2.507284 ( 2.509283)
IAAA (noprivate) 2.540126 0.000000 2.540126 ( 2.541872)
IZZZ 2.466202 0.000000 2.466202 ( 2.467584)
IZZZ (noprivate) 2.544616 0.000000 2.544616 ( 2.546141)
PAAA 3.622206 0.000000 3.622206 ( 3.624447)
PAAA (noprivate) 3.272909 0.000000 3.272909 ( 3.274831)
PZZZ 3.675658 0.000000 3.675658 ( 3.677843)
PZZZ (noprivate) 3.318359 0.000000 3.318359 ( 3.320537)
JP 3.882480 0.000000 3.882480 ( 3.885434)
JP (noprivate) 3.971438 0.000000 3.971438 ( 3.974437)
IT 2.548282 0.000000 2.548282 ( 2.549875)
IT (noprivate) 2.609304 0.000000 2.609304 ( 2.610879)
COM 2.569648 0.000000 2.569648 ( 2.571186)
COM (noprivate) 2.497100 0.000000 2.497100 ( 2.498543)
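For a quick read of the numbers, the relative reduction can be computed from the figures printed above. This is only a convenience sketch; `reduction_percent` is a hypothetical helper, and the inputs are the rehearsal totals and the NAME_LONG user times from the final runs.

```ruby
# Percentage by which `after` is lower than `before`.
def reduction_percent(before, after)
  (before - after) / before * 100
end

# Rehearsal totals, before and after.
puts format("total:     %.1f%%", reduction_percent(81.865094, 70.931843))
# NAME_LONG user time from the final (non-rehearsal) runs.
puts format("NAME_LONG: %.1f%%", reduction_percent(3.770850, 3.018843))
```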
Thanks for your contribution @elliotwutingfeng. I need some time to review the changes.