Fix the css selector
The ~ selector is not working as expected.
I'm trying to extract only the blocks that appear before the .more-news element. This works in the browser but doesn't behave as expected in my code.
Environment
- OS: macOS Ventura 13.5
- Ruby version: 2.7.8
- Nokolexbor version: 0.5.4
Additional context
test_string = <<-STR
<div>
<div class="newscard position1"></div>
<div class="newscard position2"></div>
<div class="more-news"></div>
<div class="newscard position3"></div>
<div class="newscard position4"></div>
<div>
STR
require 'nokolexbor'
doc = Nokolexbor::HTML(test_string)
doc.css(".newscard:not(.more-news ~ .newscard)").count # => 4 (should be 2)
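To spell out why 2 is the expected count, here is a minimal plain-Ruby model of the selector's semantics. It does not call Nokolexbor or lexbor. The sibling list is a hand-written stand-in for the children of the outer `<div>` above, and the helper names `follows_more_news?` and `newscards_before_more_news` are made up for illustration.

```ruby
# Class lists of the child <div>s, in document order. The trailing empty
# entry stands for the final unclosed <div>, which carries no class.
SIBLINGS = [
  %w[newscard position1],
  %w[newscard position2],
  %w[more-news],
  %w[newscard position3],
  %w[newscard position4],
  []
].freeze

# Models `.more-news ~ .newscard`: a .newscard with a .more-news sibling
# somewhere before it under the same parent.
def follows_more_news?(siblings, index)
  siblings[index].include?('newscard') &&
    siblings[0...index].any? { |classes| classes.include?('more-news') }
end

# Models `.newscard:not(.more-news ~ .newscard)`: every .newscard that the
# inner selector does not match.
def newscards_before_more_news(siblings)
  siblings.each_with_index
          .select { |classes, i| classes.include?('newscard') && !follows_more_news?(siblings, i) }
          .map(&:first)
end

matched = newscards_before_more_news(SIBLINGS)
puts matched.map { |classes| classes.join(' ') }
puts "Count: #{matched.size}"
```

Under this model only position1 and position2 survive the `:not(...)` filter, which is the count the browser reports.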
@Krugloff
I think we should just update the lexbor sources in nokolexbor.
In lexbor:
<div><div class="newscard position1"></div><div class="newscard position2"></div><div class="more-news"></div><div class="newscard position3"></div><div class="newscard position4"></div><div></div></div>
Selectors: .newscard:not(.more-news ~ .newscard)
1) <div class="newscard position1">
2) <div class="newscard position2">
Count: 2
@lexborisov The reason I didn't update lexbor is that the new versions were not as performant as the current one.
I just ran the benchmark again.
| Benchmark | Lexbor at b2c0a61 | Lexbor at 9677d13 | New vs Old |
|---|---|---|---|
| parse (367 KB) | 957.4 i/s | 1028.7 i/s | 1.07x faster |
| parse (1100 B) | 64706.6 i/s | 65730.1 i/s | 1.01x faster |
| at_css | 84346.4 i/s | 49911.8 i/s | 1.68x slower |
| css | 10116.3 i/s | 8010.6 i/s | 1.26x slower |
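The "New vs Old" column can be recomputed from the Nokolexbor i/s figures in the raw data. The sketch below is only a sanity check. The helper name `new_vs_old` is hypothetical, and truncating to two decimals is an assumption chosen because it reproduces the figures in the table (rounding would give 1.02x and 1.69x for the second and third rows).

```ruby
# Iterations per second, copied from the Nokolexbor "Comparison" lines.
RESULTS = {
  'parse (367 KB)' => { old: 957.4, new: 1028.7 },
  'parse (1100 B)' => { old: 64_706.6, new: 65_730.1 },
  'at_css' => { old: 84_346.4, new: 49_911.8 },
  'css' => { old: 10_116.3, new: 8_010.6 }
}.freeze

# Hypothetical helper: expresses the new throughput relative to the old one,
# truncated (not rounded) to two decimals.
def new_vs_old(old_ips, new_ips)
  if new_ips >= old_ips
    format('%.2fx faster', (new_ips / old_ips).floor(2))
  else
    format('%.2fx slower', (old_ips / new_ips).floor(2))
  end
end

RESULTS.each do |name, ips|
  puts "#{name}: #{new_vs_old(ips[:old], ips[:new])}"
end
```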
Raw data:
Lexbor at b2c0a61
Warming up --------------------------------------
Nokolexbor parse (367 KB)
96.000 i/100ms
Nokogiri parse (367 KB)
19.000 i/100ms
Calculating -------------------------------------
Nokolexbor parse (367 KB)
957.363 (± 0.5%) i/s - 19.200k in 20.055619s
Nokogiri parse (367 KB)
212.086 (±12.3%) i/s - 4.180k in 20.088497s
Comparison:
Nokolexbor parse (367 KB): 957.4 i/s
Nokogiri parse (367 KB): 212.1 i/s - 4.51x (± 0.00) slower
Warming up --------------------------------------
Nokolexbor parse (1100 B)
6.479k i/100ms
Nokogiri parse (1100 B)
2.994k i/100ms
Calculating -------------------------------------
Nokolexbor parse (1100 B)
64.707k (± 1.4%) i/s - 1.296M in 20.029948s
Nokogiri parse (1100 B)
28.328k (± 3.8%) i/s - 565.866k in 20.004329s
Comparison:
Nokolexbor parse (1100 B): 64706.6 i/s
Nokogiri parse (1100 B): 28327.9 i/s - 2.28x (± 0.00) slower
Warming up --------------------------------------
Nokolexbor at_css 8.326k i/100ms
Nokogiri at_css 13.000 i/100ms
Calculating -------------------------------------
Nokolexbor at_css 84.346k (± 0.8%) i/s - 1.690M in 20.039873s
Nokogiri at_css 139.870 (± 0.0%) i/s - 2.808k in 20.076093s
Comparison:
Nokolexbor at_css: 84346.4 i/s
Nokogiri at_css: 139.9 i/s - 603.04x (± 0.00) slower
Warming up --------------------------------------
Nokolexbor css 1.019k i/100ms
Nokogiri css 14.000 i/100ms
Calculating -------------------------------------
Nokolexbor css 10.116k (± 1.1%) i/s - 202.781k in 20.047377s
Nokogiri css 139.903 (± 0.0%) i/s - 2.800k in 20.014070s
Comparison:
Nokolexbor css: 10116.3 i/s
Nokogiri css: 139.9 i/s - 72.31x (± 0.00) slower
Lexbor at 9677d13
Warming up --------------------------------------
Nokolexbor parse (367 KB)
102.000 i/100ms
Nokogiri parse (367 KB)
19.000 i/100ms
Calculating -------------------------------------
Nokolexbor parse (367 KB)
1.029k (± 1.0%) i/s - 20.604k in 20.030938s
Nokogiri parse (367 KB)
211.331 (±12.3%) i/s - 4.161k in 20.066281s
Comparison:
Nokolexbor parse (367 KB): 1028.7 i/s
Nokogiri parse (367 KB): 211.3 i/s - 4.87x (± 0.00) slower
Warming up --------------------------------------
Nokolexbor parse (1100 B)
6.654k i/100ms
Nokogiri parse (1100 B)
2.969k i/100ms
Calculating -------------------------------------
Nokolexbor parse (1100 B)
65.730k (± 0.7%) i/s - 1.317M in 20.044978s
Nokogiri parse (1100 B)
28.221k (± 3.8%) i/s - 564.110k in 20.018026s
Comparison:
Nokolexbor parse (1100 B): 65730.1 i/s
Nokogiri parse (1100 B): 28220.7 i/s - 2.33x (± 0.00) slower
Warming up --------------------------------------
Nokolexbor at_css 4.984k i/100ms
Nokogiri at_css 13.000 i/100ms
Calculating -------------------------------------
Nokolexbor at_css 49.912k (± 0.5%) i/s - 1.002M in 20.071530s
Nokogiri at_css 132.617 (± 0.8%) i/s - 2.665k in 20.095940s
Comparison:
Nokolexbor at_css: 49911.8 i/s
Nokogiri at_css: 132.6 i/s - 376.36x (± 0.00) slower
Warming up --------------------------------------
Nokolexbor css 806.000 i/100ms
Nokogiri css 13.000 i/100ms
Calculating -------------------------------------
Nokolexbor css 8.011k (± 1.3%) i/s - 160.394k in 20.026210s
Nokogiri css 132.339 (± 0.8%) i/s - 2.652k in 20.039860s
Comparison:
Nokolexbor css: 8010.6 i/s
Nokogiri css: 132.3 i/s - 60.53x (± 0.00) slower
The newer version shows a small improvement in parsing but a big regression in selecting (at_css and css). The regression was introduced by this commit: https://github.com/lexbor/lexbor/commit/9677d13321faa00e9ecf85ee2830d305dd21100f. I think CSS parsing is slowing down the whole selecting process.
Is there something I can do to recover the performance?
Hi @zyc9012
Okay, I'll take a look at it. It's weird that the parser slowed down; nothing seems to have changed there.