PHP-CSS-Parser icon indicating copy to clipboard operation
PHP-CSS-Parser copied to clipboard

Validate name-start code points for identifier

Open pierlon opened this issue 5 years ago • 7 comments

When given the following CSS:

body {
    transition: all .3s ease-in-out;
    -webkit-transition: all .3s ease-in-out;
    -moz-transition: all .3s ease-in-out;
    -0-transition: all .3s ease-in-out;
}

The parser does not remove or throw an exception for the malformed rule -0-transition, as it is a syntax error:

image

This occurs because the check to validate if the rule is an identifier does not validate if the first 3 code points are valid.


This PR adds a check to validate the first 3 code points in an identifier. If a malformed identifier is given, an UnexpectedTokenException exception will be thrown before the identifier is consumed.

pierlon avatar Jan 22 '20 06:01 pierlon

@sabberworm How's this look?

westonruter avatar Jan 30 '20 17:01 westonruter

This doesn't seem to recognize some escape sequences. For example:

body { background: height: ca\lc(100% - 1px); }

The proposed changes cause the background rule here to get dropped.

raxbg avatar Jan 31 '20 05:01 raxbg

The CSS I meant to post in my previous message was actually:

body { height: ca\lc(100% - 1px); }

And the dropped rule is the height.

raxbg avatar Jan 31 '20 07:01 raxbg

I would assume that should be the intended behavior. A UnexpectedTokenException would be thrown with the error message: Identifier expected. Got “- 1px”. This is inline with the W3C CSS validator:

image

pierlon avatar Jan 31 '20 08:01 pierlon

It looks like this will introduce breaking changes (at least in lenient mode). I am not fully on board with the idea of running this kind of validation while parsing, at least not without introducing something like a new parsing mode.

The syntax of the example calc should be valid according to this https://developer.mozilla.org/en-US/docs/Web/CSS/calc and is also properly recognized by the browsers.

The proposed validation can also be done post parse. Are there cases where doing it while parsing is truly necessary? It would definitely be nice to have this validation as part of the parser, so maybe it can be implemented as a method of Document? In fact a validation method in Document can be quite useful and more flexible.

raxbg avatar Jan 31 '20 09:01 raxbg

I agree with @raxbg. We throw errors when we can‘t make sense of the input (and we try to recover in lenient mode). But throwing errors when we can understand the structure and it just happens to be invalid is a different thing entirely.

Yes, a validation after parsing would be cool though. It could also catch tons of other issues. There‘s lots of CSS that‘s syntactically valid but semantically incorrect. I would not put it on the Document class though. I‘ve put too many extraneous stuff on the main classes already. It should be a separate Validator class (or even a distinct repo/composer package).

sabberworm avatar Jan 31 '20 09:01 sabberworm

It looks like this will introduce breaking changes (at least in lenient mode).

Yes that would the case. Adding a check for lenient mode would resolve that and allow the parser to recover:

--- lib/Sabberworm/CSS/Parsing/ParserState.php	(revision 8fbd0fe82aa08ad2650def1b44f2f77154211e30)
+++ lib/Sabberworm/CSS/Parsing/ParserState.php	(date 1580502992173)
@@ -72,7 +72,7 @@
             $sResult = $this->parseCharacter(true);
         }
 
-		if ($sResult === null) {
+		if (!$this->oParserSettings->bLenientParsing && $sResult === null) {
 			throw new UnexpectedTokenException($sResult, $this->peek(5), 'identifier', $this->iLineNo);
 		}
 		$sCharacter = null;

Validation of the CSS after parsing would be ideal, but that would be for another PR.

pierlon avatar Jan 31 '20 21:01 pierlon