nokolexbor

Fix the css selector

Open Krugloff opened this issue 1 year ago • 3 comments

The ~ selector is not working as expected.

I'm trying to extract only the blocks that appear before the .more-news element. This works in the browser but doesn't behave as expected in my code.

Environment

  • OS: MacOS Ventura 13.5
  • Ruby version: 2.7.8
  • Nokolexbor version: 0.5.4

Additional context

test_string = <<-STR
<div>
<div class="newscard position1"></div>
<div class="newscard position2"></div>
<div class="more-news"></div>
<div class="newscard position3"></div>
<div class="newscard position4"></div>
<div>
STR

require 'nokolexbor'

doc = Nokolexbor::HTML(test_string)
doc.css(".newscard:not(.more-news ~ .newscard)").count # => 4 (should be 2)
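
For reference, the intended meaning of the selector can be modelled in plain Ruby without any parser. This is only an illustrative sketch: the siblings from the test string are represented as arrays of class names (a hypothetical stand-in for the DOM, not the Nokolexbor API), and the helper name is made up for this example. It keeps each .newscard that has no earlier .more-news sibling, which is what `.newscard:not(.more-news ~ .newscard)` should select.

```ruby
# Hypothetical stand-in for the children of the outer <div> in test_string:
# each sibling is represented only by its list of class names.
SIBLINGS = [
  %w[newscard position1],
  %w[newscard position2],
  %w[more-news],
  %w[newscard position3],
  %w[newscard position4],
  []                        # the trailing unclosed <div>, parsed as an empty element
].freeze

# Models `.newscard:not(.more-news ~ .newscard)`.
# `A ~ B` matches B when some earlier sibling matches A.
def newscards_before_more_news(siblings)
  siblings.each_with_index.select do |classes, index|
    preceded = siblings[0...index].any? { |earlier| earlier.include?("more-news") }
    classes.include?("newscard") && !preceded
  end.map(&:first)
end

MATCHED = newscards_before_more_news(SIBLINGS)
puts MATCHED.length                        # prints 2
puts MATCHED.map { |classes| classes.last }.inspect
```

Under this model only the position1 and position2 blocks are kept, which is the count of 2 expected above.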


Krugloff, Oct 23 '24 19:10

@Krugloff

I think we should just update the lexbor sources in nokolexbor.

In lexbor:

<div><div class="newscard position1"></div><div class="newscard position2"></div><div class="more-news"></div><div class="newscard position3"></div><div class="newscard position4"></div><div></div></div>

Selectors: .newscard:not(.more-news ~ .newscard)

1) <div class="newscard position1">
2) <div class="newscard position2">
Count: 2

lexborisov, Oct 23 '24 20:10

@lexborisov The reason I didn't update lexbor is that the new versions were not as performant as the current one.

I just did the benchmark again.

                  Lexbor at b2c0a61   Lexbor at 9677d13   New vs Old
parse (367 KB)    957.4 i/s           1028.7 i/s          1.07x faster
parse (1100 B)    64706.6 i/s         65730.1 i/s         1.01x faster
at_css            84346.4 i/s         49911.8 i/s         1.68x slower
css               10116.3 i/s         8010.6 i/s          1.26x slower

Raw data:

Lexbor at b2c0a61

Warming up --------------------------------------
Nokolexbor parse (367 KB)
                        96.000  i/100ms
Nokogiri parse (367 KB)
                        19.000  i/100ms
Calculating -------------------------------------
Nokolexbor parse (367 KB)
                        957.363  (± 0.5%) i/s -     19.200k in  20.055619s
Nokogiri parse (367 KB)
                        212.086  (±12.3%) i/s -      4.180k in  20.088497s

Comparison:
Nokolexbor parse (367 KB):      957.4 i/s
Nokogiri parse (367 KB):      212.1 i/s - 4.51x  (± 0.00) slower

Warming up --------------------------------------
Nokolexbor parse (1100 B)
                         6.479k i/100ms
Nokogiri parse (1100 B)
                         2.994k i/100ms
Calculating -------------------------------------
Nokolexbor parse (1100 B)
                         64.707k (± 1.4%) i/s -      1.296M in  20.029948s
Nokogiri parse (1100 B)
                         28.328k (± 3.8%) i/s -    565.866k in  20.004329s

Comparison:
Nokolexbor parse (1100 B):    64706.6 i/s
Nokogiri parse (1100 B):    28327.9 i/s - 2.28x  (± 0.00) slower

Warming up --------------------------------------
   Nokolexbor at_css     8.326k i/100ms
     Nokogiri at_css    13.000  i/100ms
Calculating -------------------------------------
   Nokolexbor at_css     84.346k (± 0.8%) i/s -      1.690M in  20.039873s
     Nokogiri at_css    139.870  (± 0.0%) i/s -      2.808k in  20.076093s

Comparison:
   Nokolexbor at_css:    84346.4 i/s
     Nokogiri at_css:      139.9 i/s - 603.04x  (± 0.00) slower

Warming up --------------------------------------
      Nokolexbor css     1.019k i/100ms
        Nokogiri css    14.000  i/100ms
Calculating -------------------------------------
      Nokolexbor css     10.116k (± 1.1%) i/s -    202.781k in  20.047377s
        Nokogiri css    139.903  (± 0.0%) i/s -      2.800k in  20.014070s

Comparison:
      Nokolexbor css:    10116.3 i/s
        Nokogiri css:      139.9 i/s - 72.31x  (± 0.00) slower

Lexbor at 9677d13

Warming up --------------------------------------
Nokolexbor parse (367 KB)
                       102.000  i/100ms
Nokogiri parse (367 KB)
                        19.000  i/100ms
Calculating -------------------------------------
Nokolexbor parse (367 KB)
                          1.029k (± 1.0%) i/s -     20.604k in  20.030938s
Nokogiri parse (367 KB)
                        211.331  (±12.3%) i/s -      4.161k in  20.066281s

Comparison:
Nokolexbor parse (367 KB):     1028.7 i/s
Nokogiri parse (367 KB):      211.3 i/s - 4.87x  (± 0.00) slower

Warming up --------------------------------------
Nokolexbor parse (1100 B)
                         6.654k i/100ms
Nokogiri parse (1100 B)
                         2.969k i/100ms
Calculating -------------------------------------
Nokolexbor parse (1100 B)
                         65.730k (± 0.7%) i/s -      1.317M in  20.044978s
Nokogiri parse (1100 B)
                         28.221k (± 3.8%) i/s -    564.110k in  20.018026s

Comparison:
Nokolexbor parse (1100 B):    65730.1 i/s
Nokogiri parse (1100 B):    28220.7 i/s - 2.33x  (± 0.00) slower

Warming up --------------------------------------
   Nokolexbor at_css     4.984k i/100ms
     Nokogiri at_css    13.000  i/100ms
Calculating -------------------------------------
   Nokolexbor at_css     49.912k (± 0.5%) i/s -      1.002M in  20.071530s
     Nokogiri at_css    132.617  (± 0.8%) i/s -      2.665k in  20.095940s

Comparison:
   Nokolexbor at_css:    49911.8 i/s
     Nokogiri at_css:      132.6 i/s - 376.36x  (± 0.00) slower

Warming up --------------------------------------
      Nokolexbor css   806.000  i/100ms
        Nokogiri css    13.000  i/100ms
Calculating -------------------------------------
      Nokolexbor css      8.011k (± 1.3%) i/s -    160.394k in  20.026210s
        Nokogiri css    132.339  (± 0.8%) i/s -      2.652k in  20.039860s

Comparison:
      Nokolexbor css:     8010.6 i/s
        Nokogiri css:      132.3 i/s - 60.53x  (± 0.00) slower

The newer version shows a small improvement in parsing but a large regression in selecting (at_css and css). The regression was introduced by this commit: https://github.com/lexbor/lexbor/commit/9677d13321faa00e9ecf85ee2830d305dd21100f. I think CSS parsing is slowing down the whole selecting process.

Is there something I can do to recover the performance?
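
The "New vs Old" column in the summary can be recomputed from the Nokolexbor i/s figures. The sketch below is a minimal illustration, not part of the benchmark script: the `compare` helper is a hypothetical name, and truncating the ratio to two decimals is an assumption made here because it reproduces the figures in the table.

```ruby
# Nokolexbor iterations per second copied from the summary table:
# [Lexbor at b2c0a61 (old), Lexbor at 9677d13 (new)].
RESULTS = {
  "parse (367 KB)" => [957.4, 1028.7],
  "parse (1100 B)" => [64706.6, 65730.1],
  "at_css"         => [84346.4, 49911.8],
  "css"            => [10116.3, 8010.6]
}.freeze

# Hypothetical helper: expresses the new throughput relative to the old one.
# The ratio is truncated (not rounded) to two decimals; this is an assumption.
def compare(old_ips, new_ips)
  if new_ips >= old_ips
    format("%.2fx faster", (new_ips / old_ips).floor(2))
  else
    format("%.2fx slower", (old_ips / new_ips).floor(2))
  end
end

RESULTS.each do |name, (old_ips, new_ips)|
  puts "#{name}: #{compare(old_ips, new_ips)}"
end
```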

zyc9012, Dec 17 '24 07:12

Hi @zyc9012

Okay, I'll take a look at it. It's odd that the parser slowed down; nothing seems to have changed there.

lexborisov, Dec 17 '24 18:12