tinycss2

Correctly serialize the <an+b> type

Open gherkz opened this issue 4 years ago • 2 comments

I'm not 100% sure what the extra empty comments are for in the serializer (something about BAD_PAIRS it seems, whatever they are): https://github.com/Kozea/tinycss2/blob/0268a6076040a7068c41c38ff513e07d754fc83d/tinycss2/serializer.py#L111

These seem to get added in the following sample:

tinycss2.serialize(tinycss2.parse_stylesheet('div:nth-child(2n+1){}'))

Which results in

div:nth-child(2n/**/+1){}

This is clearly not the desired output. It would be nice if this feature was disabled by default. It could probably even be removed altogether - I can't imagine anyone is relying on these empty comments for anything.

gherkz, Jun 01 '21 14:06

Hi!

This extra empty comment is actually required by the specification, when we find two tokens that shouldn’t be next to each other (that’s what bad pairs are). So, we won’t remove these extra comments added by the serializer :wink:.
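To make the idea of bad pairs concrete, here is a minimal sketch in plain Python. It is not tinycss2's code: the `BAD_PAIRS` set below holds only a few entries taken from the CSS Syntax table, and `serialize_tokens` is a hypothetical helper that works on simple `(type, text)` pairs.

```python
# Reduced bad-pair table: (previous token type, next token type).
# Assumption: only a handful of entries from the CSS Syntax table are listed;
# the real table (and tinycss2's own) is larger.
BAD_PAIRS = {
    ("ident", "ident"),
    ("ident", "number"),
    ("dimension", "ident"),
    ("dimension", "number"),
    ("dimension", "dimension"),
    ("number", "ident"),
    ("number", "number"),
}


def serialize_tokens(tokens):
    """Join (type, text) pairs, separating bad pairs with an empty comment."""
    parts = []
    previous_type = None
    for token_type, text in tokens:
        if (previous_type, token_type) in BAD_PAIRS:
            parts.append("/**/")
        parts.append(text)
        previous_type = token_type
    return "".join(parts)


# "2n" is a dimension token and "+1" is a number token, so a comment is added.
print(serialize_tokens([("dimension", "2n"), ("number", "+1")]))
# A delimiter between two idents is not a bad pair, so nothing is added.
print(serialize_tokens([("ident", "a"), ("delim", ">"), ("ident", "b")]))
```

The check looks only at token types, never at the text of the tokens, which is why `2n` followed by `+1` gets a comment even though the sign would already keep the two tokens apart.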

But.

I don’t know why 2n+1 is a bad pair. Maybe the "problem" is here, I have to check that more deeply.

liZe, Jun 01 '21 19:06

> I don’t know why 2n+1 is a bad pair. Maybe the "problem" is here, I have to check that more deeply.

That’s because the an+b syntax used to be described with a dedicated tokenizer. But an <an+b> type is now available, and tinycss2 should use its dedicated serialization.

liZe, Jun 13 '21 06:06

After checking this issue more deeply, I think that there’s actually no problem here.

The only requirement for serialization is that it must "round-trip" with parsing, that is, parsing the stylesheet must produce the same data structures as parsing, serializing, and parsing again, except for consecutive <whitespace-token>s, which may be collapsed into a single token.
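The round-trip requirement can be illustrated with a toy tokenizer. This is a rough sketch under strong assumptions: `TOKEN_RE` and `tokenize` are hypothetical, cover only comments, whitespace, numbers, dimensions, identifiers and single-character delimiters, and are neither the CSS Syntax tokenizer nor tinycss2's.

```python
import re

# Toy tokenizer for a tiny subset of CSS (illustration only).
TOKEN_RE = re.compile(
    r"""
    (?P<comment>/\*.*?\*/)                    # comments are dropped
  | (?P<ws>\s+)                               # whitespace run
  | (?P<number>[+-]?\d+(?:\.\d+)?)            # number with optional sign
    (?P<unit>-?[A-Za-z_][A-Za-z0-9_-]*)?      # a unit turns it into a dimension
  | (?P<ident>-?[A-Za-z_][A-Za-z0-9_-]*)      # identifier
  | (?P<delim>.)                              # anything else
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(css):
    """Return a list of (type, text) pairs, ignoring comments."""
    tokens = []
    for match in TOKEN_RE.finditer(css):
        if match.group("comment") is not None:
            continue
        if match.group("ws") is not None:
            tokens.append(("whitespace", " "))
        elif match.group("number") is not None:
            unit = match.group("unit")
            if unit is None:
                tokens.append(("number", match.group("number")))
            else:
                tokens.append(("dimension", match.group("number") + unit))
        elif match.group("ident") is not None:
            tokens.append(("ident", match.group("ident")))
        else:
            tokens.append(("delim", match.group("delim")))
    return tokens


# The empty comment does not change the token stream: the output round-trips.
print(tokenize("2n+1") == tokenize("2n/**/+1"))
# Without a sign, the comment is what keeps the two tokens apart.
print(tokenize("2n/**/1"))
print(tokenize("2n1"))
```

In this sketch, `2n/**/+1` yields the same tokens as `2n+1`, so the comment is harmless. A dimension followed by an unsigned number shows why the pair is protected in general: `2n` and `1` written side by side would merge into the single dimension `2n1`.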

The syntax serializer’s goal is to provide a serialization that will give (almost) the same result when parsed again.

This specification does not define how to serialize CSS in general, leaving that task to the [CSSOM] and individual feature specifications. In particular, the serialization of comments and whitespace is not defined.

The serialization rules provided by the CSS Syntax Module are aware only of the syntax, not of the CSS grammar (which may differ depending on the project using tinycss2). The "real" serializer has to be aware of the grammar, and is not covered by the CSS Syntax Module (and thus not by tinycss2).

So:

  • the tinycss2 serializer takes care only of the syntax, not the grammar
  • the tinycss2 serializer’s goal is only to provide a serialization that can "round-trip", not to serialize CSS in general
  • the "real" CSS serializer is defined in CSSOM, which knows the grammar and can use specific serialization rules for specific cases
  • the an+b type is defined in CSS Syntax because it’s a rather complicated list of possible tokens that can be used by different grammars (such as CSS Selectors)
  • CSS Syntax provides a serialization of an+b for grammars that use this type (e.g. for CSSOM’s serialization of selectors, which uses it when it meets :nth-child()).
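For reference, the an+b serialization steps from CSS Syntax can be sketched as follows. `serialize_anb` is a hypothetical helper, not part of tinycss2; it assumes the selector has already been parsed into two integers `a` and `b` by something that knows the grammar.

```python
def serialize_anb(a: int, b: int) -> str:
    """Serialize an an+b value from its integer coefficients a and b."""
    # If A is zero, only B is serialized.
    if a == 0:
        return str(b)
    # A of 1 or -1 is written without the digit.
    if a == 1:
        result = "n"
    elif a == -1:
        result = "-n"
    else:
        result = f"{a}n"
    # A positive B needs an explicit "+"; a negative B carries its own sign.
    if b > 0:
        result += f"+{b}"
    elif b < 0:
        result += str(b)
    return result


print(serialize_anb(2, 1))
print(serialize_anb(-1, 3))
print(serialize_anb(0, 5))
```

A grammar-aware serializer could call a helper like this when it meets :nth-child(), for instance with the `(a, b)` pair returned by `tinycss2.nth.parse_nth`, instead of re-emitting the raw tokens.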

Getting the "right" serialization requires building a CSSOM implementation, which is outside tinycss2’s scope (though such an implementation could use tinycss2). We’ll keep this comment: it is required by the CSS Syntax serialization rules and doesn’t break anything when the output is parsed again.

liZe, Feb 28 '24 15:02