PHP-CSS-Parser
PHP-CSS-Parser copied to clipboard
Validate name-start code points for identifier
When given the following CSS:
body {
transition: all .3s ease-in-out;
-webkit-transition: all .3s ease-in-out;
-moz-transition: all .3s ease-in-out;
-0-transition: all .3s ease-in-out;
}
The parser does not remove or throw an exception for the malformed rule -0-transition
, as it is a syntax error:
This occurs because the check to validate if the rule is an identifier does not validate if the first 3 code points are valid.
This PR adds a check to validate the first 3 code points in an identifier. If a malformed identifier is given, an UnexpectedTokenException
exception will be thrown before the identifier is consumed.
@sabberworm How's this look?
This doesn't seem to recognize some escape sequences. For example:
body { background: height: ca\lc(100% - 1px); }
The proposed changes cause the background rule here to get dropped.
The CSS I meant to post in my previous message was actually:
body { height: ca\lc(100% - 1px); }
And the dropped rule is the height
.
I would assume that should be the intended behavior. A UnexpectedTokenException
would be thrown with the error message: Identifier expected. Got “- 1px”
. This is inline with the W3C CSS validator:
It looks like this will introduce breaking changes (at least in lenient mode). I am not fully on board with the idea of running this kind of validation while parsing, at least not without introducing something like a new parsing mode.
The syntax of the example calc
should be valid according to this https://developer.mozilla.org/en-US/docs/Web/CSS/calc and is also properly recognized by the browsers.
The proposed validation can also be done post parse. Are there cases where doing it while parsing is truly necessary? It would definitely be nice to have this validation as part of the parser, so maybe it can be implemented as a method of Document
? In fact a validation method in Document
can be quite useful and more flexible.
I agree with @raxbg. We throw errors when we can‘t make sense of the input (and we try to recover in lenient mode). But throwing errors when we can understand the structure and it just happens to be invalid is a different thing entirely.
Yes, a validation after parsing would be cool though. It could also catch tons of other issues. There‘s lots of CSS that‘s syntactically valid but semantically incorrect. I would not put it on the Document class though. I‘ve put too many extraneous stuff on the main classes already. It should be a separate Validator
class (or even a distinct repo/composer package).
It looks like this will introduce breaking changes (at least in lenient mode).
Yes that would the case. Adding a check for lenient mode would resolve that and allow the parser to recover:
--- lib/Sabberworm/CSS/Parsing/ParserState.php (revision 8fbd0fe82aa08ad2650def1b44f2f77154211e30)
+++ lib/Sabberworm/CSS/Parsing/ParserState.php (date 1580502992173)
@@ -72,7 +72,7 @@
$sResult = $this->parseCharacter(true);
}
- if ($sResult === null) {
+ if (!$this->oParserSettings->bLenientParsing && $sResult === null) {
throw new UnexpectedTokenException($sResult, $this->peek(5), 'identifier', $this->iLineNo);
}
$sCharacter = null;
Validation of the CSS after parsing would be ideal, but that would be for another PR.