Multiple selectors for direct descendants catch indirect descendants as well
Using org.jsoup:jsoup:1.14.3, it seems that a grouped selector such as .select("> .direct > .foo, > .direct > .bar") also matches .direct > .bar elements nested deeper in the tree, not only those reached through a direct child of the context element.
As a workaround, .selectFirst("> .direct")!!.select("> .foo, > .bar") seems to work fine.
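Another possible workaround, sketched here under assumptions, is to avoid the comma-separated group entirely and run each selector on its own, since a single selector with a leading '>' behaves as expected in the tests below. The helper name selectEach and the shortened class names a and b are made up for illustration and are not part of jsoup.

```kotlin
import org.jsoup.Jsoup
import org.jsoup.nodes.Element

// Hypothetical helper (not part of jsoup): evaluates each selector separately
// and merges the matches, so no comma-separated group is ever parsed.
fun selectEach(root: Element, vararg selectors: String): List<Element> =
    selectors.flatMap { root.select(it) }.distinct()

fun main() {
    // Reduced version of the nested-entry markup from the report.
    val html = """
        <div class="entry">
          <div class="entry__header"><span class="a">Y</span><span class="b">Y</span></div>
          <div class="entry__body">
            <div class="entry">
              <div class="entry__header"><span class="a">N</span><span class="b">N</span></div>
            </div>
          </div>
        </div>
    """.trimIndent()
    val entry = Jsoup.parse(html).selectFirst(".entry")!!
    val items = selectEach(entry, "> .entry__header > .a", "> .entry__header > .b")
    println(items.joinToString("") { it.text() })
}
```

Note that the merged list is ordered by selector rather than by document order, which may matter if the caller relies on ordering.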
package bug

import org.intellij.lang.annotations.*
import org.jsoup.*
import org.junit.*
import org.junit.Assert.*

class JsoupLearningTests {
    @Test
    fun direct_descendant_bug_1() { // Fails.
        @Language("HTML")
        val html = """
            <!DOCTYPE html>
            <html lang="en">
            <head>
            <meta charset="utf-8"/>
            </head>
            <body>
            <div class="entry">
                <div class="entry__header">
                    <div class="interesting-container">
                        <span class="interesting-item">Y</span>
                        <span class="also-interesting-item">Y</span>
                    </div>
                </div>
                <div class="entry__body">
                    <p> ... </p>
                    <p> ... </p>
                    <div class="sub-entry entry">
                        <div class="entry__header">
                            <div class="interesting-container">
                                <span class="interesting-item">N</span>
                                <span class="also-interesting-item">N</span>
                            </div>
                        </div>
                        <div class="entry__body">
                            <p> ... </p>
                            <p> ... </p>
                        </div>
                    </div>
                </div>
            </div>
            </body>
            </html>
        """
        val document = Jsoup.parse(html)
        val entry = document.selectFirst(".entry")!!
        val interestingItems = entry.select("> .entry__header > .interesting-container > .interesting-item, > .entry__header > .interesting-container > .also-interesting-item")
        val actual = interestingItems.joinToString("") { it.text() }
        assertEquals("YY", actual)
    }

    @Test
    fun direct_descendant_bug_2() { // Passes.
        @Language("HTML")
        val html = """
            <!DOCTYPE html>
            <html lang="en">
            <head>
            <meta charset="utf-8"/>
            </head>
            <body>
            <div class="entry">
                <div class="entry__header">
                    <div class="interesting-container">
                        <span class="interesting-item">Y</span>
                        <span class="interesting-item">Y</span>
                    </div>
                </div>
                <div class="entry__body">
                    <p> ... </p>
                    <p> ... </p>
                    <div class="sub-entry entry">
                        <div class="entry__header">
                            <div class="interesting-container">
                                <span class="interesting-item">N</span>
                                <span class="interesting-item">N</span>
                            </div>
                        </div>
                        <div class="entry__body">
                            <p> ... </p>
                            <p> ... </p>
                        </div>
                    </div>
                </div>
            </div>
            </body>
            </html>
        """
        val document = Jsoup.parse(html)
        val entry = document.selectFirst(".entry")!!
        val interestingItems = entry.select("> .entry__header > .interesting-container > .interesting-item")
        val actual = interestingItems.joinToString("") { it.text() }
        assertEquals("YY", actual)
    }

    @Test
    fun direct_descendant_bug_3() { // Passes.
        @Language("HTML")
        val html = """
            <!DOCTYPE html>
            <html lang="en">
            <head>
            <meta charset="utf-8"/>
            </head>
            <body>
            <div class="entry">
                <div class="entry__header">
                    <div class="interesting-container">
                        <span class="interesting-item">Y</span>
                        <span class="also-interesting-item">Y</span>
                    </div>
                </div>
                <div class="entry__body">
                    <p> ... </p>
                    <p> ... </p>
                    <div class="sub-entry entry">
                        <div class="entry__header">
                            <div class="interesting-container">
                                <span class="also-interesting-item">N</span>
                            </div>
                        </div>
                        <div class="entry__body">
                            <p> ... </p>
                            <p> ... </p>
                        </div>
                    </div>
                </div>
            </div>
            </body>
            </html>
        """
        val document = Jsoup.parse(html)
        val entry = document.selectFirst(".entry")!!
        val interestingItems = entry.select("> .entry__header > .interesting-container > .also-interesting-item, > .entry__header > .interesting-container > .interesting-item")
        val actual = interestingItems.joinToString("") { it.text() }
        assertEquals("YY", actual)
    }

    @Test
    fun direct_descendant_bug_4() { // Fails.
        @Language("HTML")
        val html = """
            <!DOCTYPE html>
            <html lang="en">
            <head>
            <meta charset="utf-8"/>
            </head>
            <body>
            <div class="entry">
                <div class="entry__header">
                    <div class="interesting-container">
                        <span class="interesting-item">Y</span>
                        <span class="also-interesting-item">Y</span>
                    </div>
                </div>
                <div class="entry__body">
                    <p> ... </p>
                    <p> ... </p>
                    <div class="sub-entry entry">
                        <div class="entry__header">
                            <div class="interesting-container">
                                <span class="also-interesting-item">N</span>
                            </div>
                        </div>
                        <div class="entry__body">
                            <p> ... </p>
                            <p> ... </p>
                        </div>
                    </div>
                </div>
            </div>
            </body>
            </html>
        """
        val document = Jsoup.parse(html)
        val entry = document.selectFirst(".entry")!!
        val interestingItems = entry.select("> .entry__header > .interesting-container > .interesting-item, > .entry__header > .interesting-container > .also-interesting-item")
        val actual = interestingItems.joinToString("") { it.text() }
        assertEquals("YY", actual)
    }
}
Not sure if it's a bug or a feature: in comparison, JavaScript's .querySelectorAll("> .direct") throws an error about an invalid selector.
My group and I will be working on a fix for this issue this semester.
Hi, I think I may have found the problem.
When a query contains multiple subqueries, the method consumeSubQuery ignores the leading '>' of the next subquery. As a result, the query is evaluated as .select("> .direct > .foo") combined with .select(".direct > .bar"), instead of the intended .select("> .direct > .foo") combined with .select("> .direct > .bar").
Hence, my fix is to check whether what follows is another subquery and, if so, to add the '>' back to the query.
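To make the intended behavior concrete, here is a minimal standalone sketch, not jsoup's actual parser code, of splitting a selector group so that every subquery keeps its own leading combinator. The names splitSelectorGroup and startsWithCombinator are hypothetical and exist only for this illustration.

```kotlin
// Toy illustration only; jsoup's real query parser is structured differently.

// Splits a selector group on top-level commas, leaving commas inside
// (...) or [...] untouched, and keeps each subquery's leading combinator.
fun splitSelectorGroup(query: String): List<String> {
    val parts = mutableListOf<String>()
    val current = StringBuilder()
    var depth = 0 // nesting depth inside parentheses or brackets
    for (ch in query) {
        when {
            ch == '(' || ch == '[' -> { depth++; current.append(ch) }
            ch == ')' || ch == ']' -> { depth--; current.append(ch) }
            ch == ',' && depth == 0 -> {
                parts += current.toString().trim()
                current.setLength(0)
            }
            else -> current.append(ch)
        }
    }
    parts += current.toString().trim()
    return parts.filter { it.isNotEmpty() }
}

// True when the subquery begins with a structural combinator.
fun startsWithCombinator(subQuery: String): Boolean =
    subQuery.isNotEmpty() && subQuery[0] in ">+~"

fun main() {
    val parts = splitSelectorGroup("> .direct > .foo, > .direct > .bar")
    parts.forEach { println("'$it' starts with combinator: ${startsWithCombinator(it)}") }
}
```

In this sketch both subqueries report a leading combinator, which is the property the described fix restores: the '>' of the second subquery must not be dropped while the first one is consumed.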
Thanks, fixed!
Regarding "Not sure if it's a bug or a feature": in jsoup, if the query starts with a combinator, we combine it against the root element. The root element is either the Document or the context element on which select was called.