
`SelectorParseException` when calling `Element#cssSelector()`

Open • remi-sf opened this issue 2 years ago • 1 comment

Hi,

My team has hit this crash when calling Element#cssSelector() indiscriminately on elements.

The exception and stack trace are:

org.jsoup.select.Selector$SelectorParseException: Could not parse query 'ul.sp-c-sport-flyout__inner.gs-u-mb\': unexpected token at '\'

	at org.jsoup.select.QueryParser.findElements(QueryParser.java:226)
	at org.jsoup.select.QueryParser.parse(QueryParser.java:74)
	at org.jsoup.select.QueryParser.parse(QueryParser.java:45)
	at org.jsoup.select.QueryParser.combinator(QueryParser.java:90)
	at org.jsoup.select.QueryParser.parse(QueryParser.java:60)
	at org.jsoup.select.QueryParser.parse(QueryParser.java:45)
	at org.jsoup.select.Selector.select(Selector.java:98)
	at org.jsoup.nodes.Element.select(Element.java:418)
	at org.jsoup.nodes.Element.cssSelector(Element.java:858)

To reproduce this, run the following test case:

void test() throws IOException
{
    final String html = "<ul class=\"sp-c-sport-flyout__inner gs-u-mb+ gs-u-display-none@m qa-flyout-primary\"><li class=\"sp-c-sport-flyout__item \" role=\"presentation\"><a class=\"sp-c-sport-flyout__link qa-flyout-primary-item sp-nav-click-stat\" role=\"menuitem\" data-stat-name=\"primary-nav-v2-mobile\" data-stat-title=\"Home\" data-stat-link=\"/sport\" href=\"/sport\">Home</a></li></ul>";
    final Document document = Jsoup.parse(html);
    document.getElementsByTag("ul").get(0).cssSelector();
}

The class gs-u-mb+ is causing the crash; removing it from the HTML avoids it. I suppose the + character is invalid in a CSS class name? In that case, this might not really be a bug, and we'll just have to handle the runtime exception in our application.
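For anyone taking the "handle the runtime exception" route, here is a minimal sketch of what that could look like. It is deliberately independent of jsoup so it compiles on its own: SafeSelector and tryCssSelector are hypothetical names, not part of the library, and the Supplier stands in for a call such as element::cssSelector. It also assumes that any unchecked exception should be treated as "no selector available", which may be too broad for some applications.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical helper for illustration only; not part of jsoup. */
final class SafeSelector {
    private SafeSelector() {}

    /**
     * Runs the supplied selector computation and returns its result,
     * or Optional.empty() if the computation throws an unchecked exception.
     */
    static Optional<String> tryCssSelector(Supplier<String> selectorSource) {
        try {
            return Optional.ofNullable(selectorSource.get());
        } catch (RuntimeException e) {
            // Assumption: a selector that cannot be built is treated as absent.
            // Catching a narrower exception type may be preferable in real code.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Stand-in for element::cssSelector on an element that works.
        System.out.println(tryCssSelector(() -> "html > body > ul"));

        // Stand-in for the failing call described in this report.
        System.out.println(tryCssSelector(() -> {
            throw new IllegalStateException("Could not parse query");
        }));
    }
}
```

In the application, the call site would then look something like SafeSelector.tryCssSelector(element::cssSelector), with the empty case handled by whatever fallback makes sense (skipping the element, logging it, and so on).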

The HTML comes from the web page in the attached archive: Transfer news live & West Ham in Europa Conference League final - Live - BBC Sport.html.zip

(Reproduced in jsoup 1.15.4)

remi-sf, Jun 08 '23 09:06

I agree. I hit a similar issue when parsing "td:first-child" in a testing environment, but in production everything is fine!

erfansn, Jan 27 '24 06:01