
Group CSS features (and `@webref/css` v7 alpha release)

Open nzakas opened this issue 7 months ago • 14 comments

One of the valuable parts of the mdn-data package is how it separates CSS features into different categories:

  • At-rules
  • functions
  • properties
  • selectors
  • syntaxes
  • types
  • units

In the current webref package, it's just a collection of objects that we then need to dig into to figure out what types are contained within. It would be helpful if the categories could be exposed at the top level of the package and list every entry for that category regardless of spec.

nzakas, Apr 08 '25

I note the current @webref/css package already separates at the root level between:

  • at-rules
  • properties
  • selectors
  • and "values", which is a mixed bag of things.
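To illustrate the requested top-level grouping, here is a minimal sketch that flattens per-spec extracts into category arrays. It assumes each extract exposes atrules, properties, selectors and values arrays, and that each entry in values has a type field such as "type" or "function". groupByCategory() is a hypothetical helper, not part of @webref/css.

```javascript
// Hypothetical sketch: merge per-spec CSS extracts into top-level category arrays.
// Assumed extract shape: { atrules?, properties?, selectors?, values? }, each an array.
function groupByCategory(extracts) {
  const groups = { atrules: [], properties: [], selectors: [], functions: [], types: [], other: [] };
  for (const extract of extracts) {
    groups.atrules.push(...(extract.atrules ?? []));
    groups.properties.push(...(extract.properties ?? []));
    groups.selectors.push(...(extract.selectors ?? []));
    // "values" is the mixed bag: split it on each entry's (assumed) "type" field.
    for (const value of extract.values ?? []) {
      if (value.type === "function") groups.functions.push(value);
      else if (value.type === "type") groups.types.push(value);
      else groups.other.push(value);
    }
  }
  return groups;
}

// Usage with two tiny made-up extracts.
const grouped = groupByCategory([
  { properties: [{ name: "color" }], values: [{ name: "<color>", type: "type" }] },
  { atrules: [{ name: "@page" }], values: [{ name: "abs()", type: "function" }] },
]);
console.log(grouped.functions.map((f) => f.name)); // [ 'abs()' ]
```

Anything that is neither a function nor a type lands in other, which is where the ambiguity described below would surface.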

The mixed bag of things exists because CSS specs do not really distinguish between other types when they define concepts. There is a notion of function but the specs do not necessarily use that consistently. That ambiguity seems to appear in mdn-data too. For example, the abs() function appears both as a "function" and as a "syntax" in mdn-data.

CSS specs do use a type definition type too, which could perhaps be used to populate a related category. There seem to be many more type definitions in specs than what mdn-data currently lists as types. For example, line-color-list, linear-color-stop, ident-token are all type definitions from a spec perspective. If they are not in the list on purpose, is there a way to distinguish between types?

CSS specs define units as value definitions that are "for" something, i.e., scoped to an underlying type. It may be relatively easy to assemble the list of units automatically from a short list of underlying types, for example by looking at all values defined for <angle>, <length> and a few others.
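A minimal sketch of that idea, assuming value definitions come as { name, for } records where for lists the term(s) the value is defined for. The collectUnits() helper and the UNIT_TYPES list are hypothetical; the actual list of underlying types would need to be confirmed against the specs.

```javascript
// Hypothetical sketch: derive a unit list from value definitions scoped ("for") to numeric types.
// Assumed shape of each definition: { name: string, for: string[] }.
const UNIT_TYPES = ["<angle>", "<length>", "<time>", "<frequency>", "<resolution>"];

function collectUnits(valueDefinitions, unitTypes = UNIT_TYPES) {
  const units = new Map(); // unit name -> set of underlying types it is defined for
  for (const def of valueDefinitions) {
    for (const scope of def.for ?? []) {
      if (!unitTypes.includes(scope)) continue;
      if (!units.has(def.name)) units.set(def.name, new Set());
      units.get(def.name).add(scope);
    }
  }
  return units;
}

const units = collectUnits([
  { name: "deg", for: ["<angle>"] },
  { name: "px", for: ["<length>"] },
  { name: "auto", for: ["width"] }, // not a unit: scoped to a property
]);
console.log([...units.keys()]); // [ 'deg', 'px' ]
```

Values scoped to properties (such as auto for width) are ignored, which is the point of restricting the lookup to a short list of types.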

Essentially, the question is: can CSS features be categorized automatically? If not, what amount of manual data would need to be maintained?

tidoust, Apr 09 '25

Thanks for the response. A follow-up question: assuming everyone wants webref packages to be as useful as possible, is there a reason the specs themselves can't be updated to encode this information where appropriate?

nzakas, Apr 09 '25

No reason in theory and, on top of trying to reduce the amount of work needed to maintain Webref, we also restrict the amount of data that needs to be manually injected in Webref to a bare minimum as a way to push fixes and improvements back to the underlying specs.

In practice there are ~120 CSS specs at various levels of maturity and activity, with dozens of editors and >3800 open issues. We already maintain a few patches in Webref for things that need fixing in CSS specs to get consistent data (these patches link back to issues raised against the specs). If most CSS specs need to be updated to provide additional semantics, that's likely going to require elbow grease both to convince CSS WG participants that the effort is worth prioritizing and to help with the actual updates. That's also why I'm trying to assess whether missing categories can already be determined automatically from available information.

tidoust, Apr 10 '25

Ah gotcha, thanks for explaining. :+1:

nzakas, Apr 10 '25

I explored the differences between MDN data and Webref a bit. The underlying code is in tidoust/mdn-webref, along with the results:

  1. The webref.json file, which could represent what we may want to end up with in Webref to ease consumption of data.
  2. The report, which highlights differences between the two projects.

As far as I can tell, missing data in Webref is mostly stuff that is non-standard or that has been obsoleted, but that is still present in MDN data (and sometimes documented on MDN). I do not know to what extent that data is a must-have in Webref. Conversely, more data is missing in MDN data, perhaps because the underlying features are more recent and not yet documented.

There may be a few cases where data needs to be slightly improved in specs so that it can start appearing in Webref. One example is <general-enclosed> which is currently defined in a <pre> tag without any class, skipped by the crawler as too generic. That seems easily fixable.

I still do not understand what syntaxes are meant to encompass. I managed to cover most of them by assembling functions and types, but that also creates hundreds of syntaxes that are not accounted for in MDN data. Are syntaxes used in practice? How?

(On top of the features themselves, I note that the grouping information in MDN data does not exist in Webref. That grouping seems more specific to MDN though. Same thing for links to MDN pages).

tidoust, Apr 28 '25

Syntaxes are used in CSSTree to enable validation: https://github.com/csstree/csstree/blob/9558ba790daeda2b24935838bf89990699ece66e/lib/data.js#L7

Basically, the parser creates an AST and the lexer validates the AST against these syntax definitions.

nzakas, Apr 28 '25

Thanks @nzakas. I had not realized that entries in the "types" category in MDN data do not have a syntax key and that the "syntaxes" category collects that information. I'm not sure why functions are listed under the "syntaxes" category too, as that seems to duplicate the information already present in the functions.json file. All in all, I think the "syntaxes" category can be assembled by merging the "functions" and "types" categories, provided entries there do have a syntax key of course.
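Under that reading, assembling "syntaxes" is a merge. A minimal sketch, assuming both inputs are objects keyed by name as in mdn-data; buildSyntaxes() is hypothetical, and when a name exists in both inputs the types entry wins here, which is an arbitrary choice.

```javascript
// Hypothetical sketch: assemble an mdn-data-like "syntaxes" map from "functions" and "types".
// Both inputs are assumed to be objects keyed by name, e.g. { "abs()": { syntax: "..." } }.
function buildSyntaxes(functions, types) {
  const syntaxes = {};
  for (const source of [functions, types]) {
    for (const [name, entry] of Object.entries(source)) {
      // Only entries that actually carry a syntax are useful for validation.
      if (typeof entry.syntax === "string") {
        syntaxes[name] = { syntax: entry.syntax };
      }
    }
  }
  return syntaxes;
}

const syntaxes = buildSyntaxes(
  { "abs()": { syntax: "abs( <calc-sum> )" } },
  { "color": { syntax: "<color-base> | currentColor | <system-color>" }, "system-color": {} },
);
console.log(Object.keys(syntaxes)); // [ 'abs()', 'color' ]
```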

That initial exploration suggests that the categorization itself can be done automatically, with straightforward reasons that explain why some data is missing in Webref. That's a good first result!

I'll now look into actual syntax values to understand where and why Webref differs from MDN data. I somewhat expect to find more substantive differences: if I understand things correctly, MDN data syntaxes are manually curated to match reality in main browsers, while Webref data is meant to be a view of what the latest spec drafts currently define, regardless of what browsers support. When specs lag behind implementations, they need fixing, and knowing about the problem creates a good feedback loop. When specs are more recent than implementations, it may be challenging to select the right syntax automatically. Anyway, let's find out ;)

tidoust, May 01 '25

Thanks for the update and all of your work on this. :pray:

nzakas, May 01 '25

A new major version of @webref/css was (pre-)released today: v7.0.0-alpha.

This new version features the consolidated CSS file css.json, returned by the listAll() function, that lists CSS features grouped by feature type: atrules, functions, properties, selectors and types. The package no longer contains CSS extracts per spec.

The version was flagged as alpha because we're looking for feedback from consumers on whether the contents and approach more or less align with what everyone needs, and we may adjust the package accordingly. We will continue to release updates to version 6 of the package in the meantime.

If the approach does not work for you, what potential improvements would make it easier to consume the data?

Consolidation cannot do miracles:

  • As before, syntaxes are reported as found in CSS specs. That means only features developed as part of a standardization effort are included, and a syntax is only set when the spec explicitly defines it. Updating specs to make syntaxes more explicit where possible is doable; that should typically take the form of pull requests in the CSS specs repository.
  • When a feature is defined in more than one level, the definition from the more advanced level is used. That is, consolidation "lives on the edge". The syntax may not match what ships in browsers yet as a result.

The tidoust/mdn-webref repository reports on differences between the consolidated file and data in MDN data. It also looks at most CSS patches in CSSTree. As far as I can tell, most differences are explained by a combination of the reasons mentioned above: syntax not defined in specs, the "live on the edge" approach, and no deprecated or proprietary features in Webref. MDN data is not as exhaustive as data in Webref, possibly because the underlying features are not yet documented in MDN.

tidoust, Jun 10 '25

Thanks! I'll set aside some time to play with this and see how it goes.

cc @rviscomi

nzakas, Jun 10 '25

Early bit of feedback: while the correct data is present, @webref/css returns each group as an array of objects, such as:

{
  "functions": [
    {
      "name": "-webkit-image-set()",
      "prose": "Implementations must accept -webkit-image-set() as a parse-time alias of image-set(). (It’s a valid value, with identical arguments to image-set(), and is turned into image-set() during parsing.)",
      "href": "https://drafts.csswg.org/css-images-4/#funcdef--webkit-image-set",
      "type": "function"
    }
  ]
}

However, mdn-data returns each group as object literals, such as:

{
  "functions": {
    "abs()": {
      "syntax": "abs( <calc-sum> )",
      "groups": [
        "CSS Values and Units"
      ],
      "status": "standard",
      "mdn_url": "https://developer.mozilla.org/docs/Web/CSS/abs"
    }
  }
}

This difference means that @webref/css can't be used as a drop-in replacement for mdn-data.

For what we're doing, the object literal representation is much easier to work with because we do a lot of random lookups.

nzakas, Jun 10 '25

Webref tends to use arrays more than indexed objects. Would adding an index() function that returns the indexed version be enough or do you want to leverage the JSON file directly?
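For illustration, the indexed version amounts to keying each entry by its name. A minimal sketch with a hypothetical toLookup() helper (not the actual implementation of any index() function); it keeps the first entry for a given name and reports duplicates instead of silently overwriting them.

```javascript
// Hypothetical sketch: index a Webref-style array by name for random lookups.
// When a name appears more than once (e.g. scoped definitions), the first entry is kept
// and the name is reported, so that the collision is not silently lost.
function toLookup(entries) {
  const lookup = {};
  const duplicates = [];
  for (const entry of entries) {
    if (Object.hasOwn(lookup, entry.name)) duplicates.push(entry.name);
    else lookup[entry.name] = entry;
  }
  return { lookup, duplicates };
}

const { lookup: functionsByName, duplicates } = toLookup([
  { name: "-webkit-image-set()", href: "https://drafts.csswg.org/css-images-4/#funcdef--webkit-image-set" },
  { name: "abs()", syntax: "abs( <calc-sum> )" },
]);
console.log(functionsByName["abs()"].syntax); // abs( <calc-sum> )
console.log(duplicates); // []
```

The duplicates list is exactly where the scoping question below comes in.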

One specific problem is that mdn-data assumes all features are defined unscoped. That's not entirely true in practice. Some functions and types are defined scoped to one or more other constructs. The scoping constructs may be a property, a selector, a function, or a type.

For example, there is no unscoped fit-content() function definition; there is a fit-content() function scoped to grid-template-rows and grid-template-columns, and another one scoped to height, width, and their min and max variants. Their syntax is not exactly the same for now (that could be a bug in either spec, not sure).

That's an exceptional case. We're talking about 89 scoped entries out of ~1700. I'll need to look into details; there may be a way to eliminate scoping, or at least to end up in a situation where a name always has the same syntax definition across scopes. One practical use of scoping is that a function may only be usable in certain contexts. For example, an IDE might want to check the scope before it proposes autocomplete options.

Back to the data, when a name is defined for multiple scopes, the array contains an entry per scope (and each entry has a for key with the name of the scoping feature). In the indexed object, we could either:

  1. Use the name as lookup key and let consumers check the for key afterwards. To list all scopes that the entry is defined for, that for key would be turned into an array.
  2. Use something like [for]/[name] as lookup key, but consumers would need to adjust their lookup logic.

The first option seems easier for consumers, but does not work for cases where more than one syntax exists, as in the fit-content() example. I don't know yet how many other cases exist. The second option is so-so from a lookup perspective, as there's no good way to tell when scoping needs to be considered.
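To make the trade-off concrete, here is a minimal sketch that applies the first option when all definitions of a name share the same syntax, and falls back to the second option otherwise. The indexScoped() helper, the "<name> for <scope>" key format and the example() entries are hypothetical.

```javascript
// Hypothetical sketch combining both options: one key per name (option 1) when all scoped
// definitions share a syntax, "<name> for <scope>" keys (option 2) when they do not.
// Assumed entry shape: { name: string, for?: string, syntax?: string }.
function indexScoped(entries) {
  const byName = new Map();
  for (const entry of entries) {
    if (!byName.has(entry.name)) byName.set(entry.name, []);
    byName.get(entry.name).push(entry);
  }

  const index = {};
  for (const [name, defs] of byName) {
    if (new Set(defs.map((d) => d.syntax)).size === 1) {
      // Option 1: a single entry whose "for" key lists every scope.
      const merged = { ...defs[0] };
      delete merged.for;
      const scopes = defs.map((d) => d.for).filter(Boolean);
      if (scopes.length > 0) merged.for = scopes;
      index[name] = merged;
    } else {
      // Option 2: syntaxes differ, so each scope gets its own compound key.
      for (const def of defs) {
        index[def.for ? `${name} for ${def.for}` : name] = def;
      }
    }
  }
  return index;
}

const scopedIndex = indexScoped([
  { name: "fit-content()", for: "grid-template-columns", syntax: "fit-content( <length-percentage> )" },
  { name: "fit-content()", for: "height", syntax: "fit-content( <length-percentage [0,∞]> )" },
  { name: "example()", for: "prop-a", syntax: "example( <integer> )" }, // made-up entries
  { name: "example()", for: "prop-b", syntax: "example( <integer> )" },
]);
console.log(Object.keys(scopedIndex));
```

Consumers can then look up names directly in the common case, and only the conflicting names need scope-aware lookups.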

tidoust, Jun 11 '25

An update, looking into the data. Out of the 89 features that are scoped to other features:

  1. 57 are defined with a single scope. No problem with them.
  2. 24 are defined once but for multiple scopes. No problem either if we turn for into an array to list the different scopes.
  3. 8 are defined for different scopes in different specs.

For 3., entries are:

The last 3 are due to a bug in css-transforms-2, which defines the functions as unscoped whereas css-transforms-1 defines them with a scope. That is easy to fix in the spec and/or in the consolidation logic. More or less the same thing applies to <content-list>: the entry in css-content-3 could be caught during curation. The first 4 functions are more problematic: at least for some of them (repeat(), type()), function syntaxes seem to differ depending on the context on purpose.
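The three buckets listed earlier can be computed mechanically. A minimal sketch, assuming one { name, for } record per definition found in a spec, with for as an array of scopes; classifyScoped() and the sample entries are hypothetical.

```javascript
// Hypothetical sketch: sort scoped definitions into the three buckets.
// Assumed input: one record per definition found in a spec, { name: string, for: string[] }.
function classifyScoped(definitions) {
  const countByName = new Map();
  for (const def of definitions) {
    countByName.set(def.name, (countByName.get(def.name) ?? 0) + 1);
  }
  const buckets = { singleScope: [], multipleScopes: [], conflicting: [] };
  for (const def of definitions) {
    if (countByName.get(def.name) > 1) {
      // Defined more than once (e.g. in different specs): reported once per name.
      if (!buckets.conflicting.includes(def.name)) buckets.conflicting.push(def.name);
    } else if (def.for.length > 1) {
      buckets.multipleScopes.push(def.name); // one definition, several scopes
    } else {
      buckets.singleScope.push(def.name); // one definition, one scope
    }
  }
  return buckets;
}

const buckets = classifyScoped([
  { name: "a()", for: ["x"] },
  { name: "b()", for: ["x", "y"] },
  { name: "c()", for: ["x"] },
  { name: "c()", for: ["y"] },
]);
console.log(buckets);
```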

tidoust, Jun 11 '25

Ah that's interesting, I didn't realize that functions could have defined scopes (or multiple scopes).

For our use case, we generally just care that something exists as we check "is this a function?".

Having another function to call to get the data formatted in the same manner as mdn-data would be helpful, as it would mean people could have a drop-in replacement for that package even if the format lacks the granular detail you're providing otherwise. CSSTree itself expects things in mdn-data format, so providing a compatible format out of the box would also allow easier experimentation there.

nzakas, Jun 11 '25

New alpha version 7.0.2-alpha features an index() function that returns an indexed object. The structure was further aligned with mdn/data. Remaining changes are described in a Migrating from mdn/data section in the README.

As mentioned above, there remain a few entries with a "clunky" lookup name in the indexed object due to the existence of multiple scopes with different syntaxes:

  • fit-content() for grid-template-columns
  • fit-content() for height
  • rect() for <basic-shape>
  • rect() for clip
  • repeat() for <auto-repeat-line-color>
  • repeat() for <auto-repeat>
  • scale() for transform
  • scaleX() for transform
  • scaleY() for transform
  • type() for @function
  • type() for attr()
  • type() for image-set()
  • content-list for content

tidoust, Jun 20 '25

Please assume that "I think" or "in my opinion" is placed in front of the sentences below.

I have a different interpretation of functions and contexts. Let's take this hypothetical spec text as an example:

Name: property
Value: fit-content(foo) | <fit-content()> | fit-content

Definitions for the property property:

  • fit-content(argument): ...
  • <fit-content()>: ...
  • fit-content: ...

<fit-content()> = fit-content(bar | baz)

The for attribute on the <dfn> element of <fit-content()> is irrelevant (at the grammar level) because functional types have a unique argument grammar, like other productions.

Inline functions are functional values. They can take different argument grammars.

The function value for the type attribute of the <dfn> element of fit-content(argument) only exists to avoid a clash with fit-content (keyword value).


What is the reason for keeping a type field for functions and types?

At-rules, properties, descriptors, selectors, do not have one. You can recognize the type of @rule, <type>, :pseudo, fn(), from its name. Otherwise, it is a property or descriptor. I would just drop it, and keep type names wrapped in < and >.
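The naming conventions in that suggestion can be checked with a few string tests. A minimal sketch with a hypothetical categoryFromName() helper; it assumes type names keep their < and > wrapping, and treats functional types such as <rgb()> as types.

```javascript
// Hypothetical sketch of the naming conventions: "@rule", "<type>", ":pseudo" / "::pseudo",
// "fn()"; anything else is treated as a property or descriptor.
function categoryFromName(name) {
  if (name.startsWith("@")) return "at-rule";
  if (name.startsWith("<") && name.endsWith(">")) return "type"; // includes functional types like <rgb()>
  if (name.startsWith(":")) return "selector"; // checked before "()" so :nth-child() stays a selector
  if (name.endsWith("()")) return "function";
  return "property or descriptor";
}

for (const name of ["@media", "<color>", "<rgb()>", "::before", ":nth-child()", "abs()", "color"]) {
  console.log(name, "->", categoryFromName(name));
}
```

Note that this only works if type names keep their angle brackets; once they are dropped, "color" the type and color the property become indistinguishable by name alone.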


You should not "extend" the syntax of at-rules. Many at-rules take descriptors that are not defined in definition tables so you are missing them. And it would be very unlikely that a CSS parser would ever use it in its implementation.

Nit: its value is incorrect because an author can specify multiple declarations of the same descriptor (the + multiplier is missing).


I think the syntax of legacy property aliases is copied from the syntax of the target property. But this does not apply to mapped properties (which have no syntax).

Note that I skip both of them anyway when curating w3c/webref data. I prefer not to duplicate data and instead resolve these property value definitions while validating a declaration name in the context.

cdoublev, Jul 07 '25

Thanks for the input, @cdoublev!

I have a different interpretation of functions and contexts. Let's take this hypothetical spec text as an example:

Name: property
Value: fit-content(foo) | <fit-content()> | fit-content

Definitions for the property property:

  • fit-content(argument): ...
  • <fit-content()>: ...
  • fit-content: ...

<fit-content()> = fit-content(bar | baz)

The for attribute on the <dfn> element of <fit-content()> is irrelevant (at the grammar level) because functional types have a unique argument grammar, like other productions.

Inline functions are functional values. They can take different argument grammars.

The function value for the type attribute of the <dfn> element of fit-content(argument) only exists to avoid a clash with fit-content (keyword value).

In practice, there are two sets of "function" definitions for fit-content():

  • fit-content( <length-percentage> ) for grid-template-columns and grid-template-rows
  • fit-content( <length-percentage [0,∞]> ) for height et al.

... and perhaps more interestingly for rect():

  • rect( [ <length-percentage> | auto ]{4} [ round <'border-radius'> ]? ) for <basic-shape>
  • rect( <top>, <right>, <bottom>, <left> ) for clip

I'm not sure if you're saying:

  • These entries should not appear in the list of functions because they are not functional types. But some tools still want to know that e.g., <basic-shape> can take a rect() function with a specific syntax.
  • These entries should be merged into one. That matches the approach in mdn/data but then I do not know what syntax to give the merged entry.
  • Something else.

What is the reason for keeping a type field for functions and types?

No specific reason; it just comes from the initial data. I'll drop it, since entries are already grouped into categories according to that field.

At-rules, properties, descriptors, selectors, do not have one. You can recognize the type of @rule, <type>, :pseudo, fn(), from its name. Otherwise, it is a property or descriptor. I would just drop it, and keep type names wrapped in < and >.

I dropped the wrapping <> for types to better align with mdn/data and ease transition to Webref for mdn/data consumers.

You should not "extend" the syntax of at-rules. Many at-rules take descriptors that are not defined in definition tables so you are missing them. And it would be very unlikely that a CSS parser would ever use it in its implementation.

Any example of descriptors that are missing? I do not know how tools that leverage mdn/data use the data. Some of the at-rules there have a more expanded syntax than the one defined in specs. I'm fine leaving the expansion up to consuming tools in any case.

Nit: its value is incorrect because an author can specify multiple declarations of the same descriptor (the + multiplier is missing).

Good point :)

I think the syntax of legacy property aliases is copied from the syntax of the target property. But this does not apply to mapped properties (which have no syntax).

I believe I only copy the syntax for properties that are explicitly defined as legacy aliases, and not for mapped properties. The -webkit-box-* properties do not have any syntax in the consolidated file in particular. If that's not the case, that may be a bug ;)

tidoust, Jul 07 '25

I'm not sure if you're saying:

Let's put clip aside because it is legacy. I would define its value with rect( [ <length> | auto ]{4} | [ <length> | auto ]#{4} ).

I intentionally did not give you a specific type organization because I do not know which one would meet everyone's needs and whether it can/should deviate from the type organization in Bikeshed, which I do not really understand to be honest...

  • <<inferred-function()>> and ''inferred-function()'' link to <dfn>inferred-function()</dfn>
  • <<explicit-type()>> does not link to <dfn type><<explicit-type()>>

So it clearly wants <foo()> to be a function. But its documentation also says:

  • Is it surrounded by <>? Then it’s a type.
  • Does it end with ()? Then it’s a function.

Anyway... I do not see the point of the "function" type. <rgb()> and <dashed-function> are just types producing a function value. Productions are types. foo() = foo(...) is a production rule and should be equivalent to <foo()> = foo(...), just as @rule = @rule { ... } should be equivalent to <@rule> = @rule { ... }.

That said, you can differentiate functional types from types if you like.

When a CSS parser processes <foo()> in the RHS of a production rule, it will look for a corresponding entry in functions, regardless of any context defined for this entry. It will never look for fit-content(), since its syntax is inlined in the context syntax.
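A rough sketch of that lookup: extract the productions referenced in a syntax string and resolve them by name only, ignoring any for scope. The regular expression and the resolveReferences() helper are hypothetical simplifications (for instance, they do not handle <@rule>-style references), and the lookups are assumed to be keyed without angle brackets.

```javascript
// Hypothetical sketch: find the productions referenced in a syntax string and resolve them
// by name only, ignoring any scope ("for") attached to the target definitions.
// Assumed lookups: functions keyed like "rgb()", types keyed like "color" (no angle brackets).
const REFERENCE = /<('?[\w-]+(?:\(\))?'?)(?:\s[^>]*)?>/g;

function resolveReferences(syntax, { functions, types }) {
  const resolved = [];
  for (const [, ref] of syntax.matchAll(REFERENCE)) {
    if (ref.startsWith("'")) {
      resolved.push({ ref, kind: "property" }); // <'prop'> refers to a property's value syntax
    } else if (ref.endsWith("()")) {
      resolved.push({ ref, kind: "function", found: Object.hasOwn(functions, ref) });
    } else {
      resolved.push({ ref, kind: "type", found: Object.hasOwn(types, ref) });
    }
  }
  return resolved;
}

const refs = resolveReferences("<color-base> | currentColor | <light-dark()> | <integer [0,∞]>", {
  functions: { "light-dark()": {} },
  types: { "color-base": {} },
});
console.log(refs);
```

The range annotation in <integer [0,∞]> is skipped, so the reference resolves to plain integer.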


Any example of descriptors that are missing?

Custom properties in @function. Arbitrary <ident> descriptors in the rules nested in @font-feature-values. And all descriptors accepted in @page and margin rules bearing the same name and syntax as a property.

cdoublev, Jul 08 '25

When a CSS parser processes <foo()> in the RHS of a production rule, it will look for a corresponding entry in functions, regardless of any context defined for this entry.

And as long as CSS specs continue to use scoping, maybe it should? It seems that nothing currently breaks if you ignore scopes. That is good. I still need to handle the scoping somehow in the data, in particular to capture the 8 cases where constructs are defined with different syntaxes for different scopes.

Any example of descriptors that are missing?

Custom properties in @function. Arbitrary <ident> descriptors in the rules nested in @font-feature-values. And all descriptors accepted in @page and margin rules bearing the same name and syntax as a property.

Thanks for the examples! I haven't tried to expand syntaxes of nested rules, so @font-feature-values does not seem wrong per se for now. In any case, the first and third examples clearly show the limits of the approach: specs typically describe what can go inside a <declaration-rule-list> in prose, so we cannot assume that the list of formally defined descriptors is exhaustive.

(For what it's worth, definitions of margin at-rules should probably be scoped to @page in the spec so that they get attached to it during extraction, but that wouldn't fix missing page properties in any case).

@nzakas, I had expanded the syntax of at-rules because mdn/data sometimes expanded it as well but that seems prone to errors. I'm tempted to get back to the formal non-expanded syntax. Do you know if the expanded form is useful for a specific purpose?

tidoust, Jul 16 '25

Sorry, I'm not quite sure what counts as "expanded". Can you clarify?

nzakas, Jul 21 '25

Sure! For example, the syntax of the @counter-style at-rule is defined in the spec as:

@counter-style <counter-style-name> { <declaration-list> }

The spec defines a number of descriptors for the at-rule. I'm using that to expand <declaration-list>:

@counter-style <counter-style-name> {
  [ system: [ cyclic | numeric | alphabetic | symbolic | additive | [fixed <integer>?] | [ extends <counter-style-name> ] ]; ] ||
  [ negative: [ <symbol> <symbol>? ]; ] ||
  [ prefix: [ <symbol> ]; ] ||
  [ suffix: [ <symbol> ]; ] ||
  [ range: [ [ [ <integer> | infinite ]{2} ]# | auto ]; ] ||
  [ pad: [ <integer [0,∞]> && <symbol> ]; ] ||
  [ fallback: [ <counter-style-name> ]; ] ||
  [ symbols: [ <symbol>+ ]; ] ||
  [ additive-symbols: [ [ <integer [0,∞]> && <symbol> ]# ]; ] ||
  [ speak-as: [ auto | bullets | numbers | words | spell-out | <counter-style-name> ]; ]
}

But @cdoublev notes above that this syntax is already slightly incorrect, that specs tend to define additional descriptors that cannot easily be captured and that, in any case, this expanded information is unlikely to be directly useful for consumers.

If there's no immediate use for the expanded syntax, instead of introducing a syntax that may prove hard to get right, I'd be happy to roll back to the base syntax definition and let consumers figure out what goes inside the <declaration-list> themselves from the list of descriptors associated with the at-rule.
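Consumers taking that route only need the descriptors attached to each at-rule. A minimal sketch, assuming each at-rule entry has a descriptors array of { name, syntax } objects; descriptorSyntaxes() is hypothetical. Since the list may be incomplete, a missing name does not prove that a declaration is invalid.

```javascript
// Hypothetical sketch: look up descriptor syntaxes for an at-rule entry that carries a
// descriptors array of { name, syntax } objects.
function descriptorSyntaxes(atrule) {
  return new Map((atrule.descriptors ?? []).map((desc) => [desc.name, desc.syntax]));
}

const colorProfileDescriptors = descriptorSyntaxes({
  name: "@color-profile",
  descriptors: [
    { name: "components", syntax: "<ident>#" },
    { name: "src", syntax: "<url>" },
  ],
});
console.log(colorProfileDescriptors.get("src")); // <url>
console.log(colorProfileDescriptors.has("color")); // false
```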

(Note: mdn/data also seems to use an expanded syntax for at-rules... and seems to miss descriptors so I'm not sure how these syntaxes were computed)

tidoust, Jul 21 '25

Gotcha, thanks for explaining. CSSTree does use the extended syntax to build out its lexer.

As you note, the mdn-data is frequently incomplete, so CSSTree needs to additionally apply a patch on top of that data: https://github.com/csstree/csstree/blob/master/data/patch.json

The patch is meant to fill gaps, and without the descriptors entry in mdn-data, CSSTree would need to manually add every descriptor for every at-rule. So in this case, even incomplete descriptors data is preferable to none.

nzakas, Jul 21 '25

The patch is meant to fill gaps, and without the descriptors entry in mdn-data, CSSTree would need to manually add every descriptor for every at-rule. So in this case, even incomplete descriptors data is preferable to none.

Data will contain the (possibly incomplete) list of descriptors associated with each at-rule and their syntax. For example, for @color-profile, you'd get:

{
  "name": "@color-profile",
  "href": "https://drafts.csswg.org/css-color-5/#at-ruledef-profile",
  "syntax": "[...]",
  "descriptors": [
    {
      "name": "components",
      "href": "https://drafts.csswg.org/css-color-5/#descdef-color-profile-components",
      "syntax": "<ident>#"
    },
    {
      "name": "rendering-intent",
      "href": "https://drafts.csswg.org/css-color-5/#descdef-color-profile-rendering-intent",
      "syntax": "relative-colorimetric | absolute-colorimetric | perceptual | saturation"
    },
    {
      "name": "src",
      "href": "https://drafts.csswg.org/css-color-5/#descdef-color-profile-src",
      "syntax": "<url>"
    }
  ]
}

The question is about the syntax of the overall at-rule, which appears as [...] above.

The spec defines it as:

@color-profile [<dashed-ident> | device-cmyk] { <declaration-list> }

Going through the syntax of descriptors, that syntax can be expanded to something like:

@color-profile [<dashed-ident> | device-cmyk] {
  [ src: [ <url> ]; ] ||
  [ rendering-intent: [ relative-colorimetric | absolute-colorimetric | perceptual | saturation ]; ] ||
  [ components: [ <ident># ]; ]
}

Code that I use to produce the expanded syntax is relatively straightforward. It currently looks like:

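// Only expand at-rules that have descriptors and a generic declaration block.
if (feature.descriptors?.length > 0 &&
    feature.syntax?.match(/{ <declaration-(rule-)?list> }/)) {
  // One "||" alternative per descriptor; the "+" lets a descriptor appear more than once.
  // Entries named "@..." (presumably nested at-rules) contribute their own syntax as-is.
  const syntax = feature.descriptors
    .map(desc => desc.name.startsWith('@') ?
      `[ ${desc.syntax} ]+` :
      `[ ${desc.name}: [ ${desc.syntax} ]; ]+`)
    .join(' ||\n  ');
  // Replace the generic block with the expanded list of descriptors.
  feature.syntax = feature.syntax.replace(
    /{ <declaration-(rule-)?list> }/,
    '{\n  ' + syntax + '\n}');
}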

One potential problem with expanding the syntax of the at-rule in the data directly is that CSSTree and others may continue to need patching, for example if the projects want to support "legacy" or proprietary descriptors that specs don't define. I don't think we'll add these here. The patch data you mentioned also shows another example with @position-try: we may try to add its list of descriptors here but they cannot be extracted automatically from the spec, as they are only implicitly defined as "inset properties, margin properties, sizing properties", etc.

Patching syntax may be harder without knowing where <declaration-list> was.
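One way around that is to keep the base syntax in the data and expand it on the consumer side, after merging any patched descriptors. A minimal sketch adapted from the expansion code above; expandWithPatch() and the -x-legacy descriptor are hypothetical.

```javascript
// Hypothetical sketch: keep the spec's base syntax in the data, and let a consumer merge its own
// extra descriptors (e.g. legacy ones) before expanding the declaration block.
const DECLARATION_BLOCK = /\{ <declaration-(?:rule-)?list> \}/;

function expandWithPatch(atrule, extraDescriptors = []) {
  const descriptors = [...(atrule.descriptors ?? []), ...extraDescriptors];
  if (descriptors.length === 0 || !DECLARATION_BLOCK.test(atrule.syntax ?? "")) {
    return atrule.syntax; // nothing to expand
  }
  const body = descriptors
    .map((desc) => desc.name.startsWith("@")
      ? `[ ${desc.syntax} ]+` // nested at-rule: use its own syntax
      : `[ ${desc.name}: [ ${desc.syntax} ]; ]+`)
    .join(" ||\n  ");
  // A replacer function avoids "$" sequences in syntaxes being treated as special patterns.
  return atrule.syntax.replace(DECLARATION_BLOCK, () => `{\n  ${body}\n}`);
}

const expanded = expandWithPatch(
  {
    syntax: "@color-profile [<dashed-ident> | device-cmyk] { <declaration-list> }",
    descriptors: [{ name: "src", syntax: "<url>" }],
  },
  [{ name: "-x-legacy", syntax: "<ident>" }], // made-up patch entry
);
console.log(expanded);
```

Because the data keeps { <declaration-list> } intact, the consumer always knows where patched descriptors belong.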

tidoust, Jul 23 '25

I missed the main point about this, which is that <declaration-*-list> productions are associated with a very explicit procedure to consume the corresponding input contents. These productions and this procedure exist precisely because encoding the corresponding grammar in a syntax would be non-trivial, and because parsing against this syntax would most likely be inefficient in rules accepting many properties/descriptors.

So expanding them is basically an implementation decision specific to CSSTree.

cdoublev, Jul 23 '25

The question is about the syntax of the overall at-rule, which appears as [...] above.

Ah sorry, gotcha. CSSTree only ever uses the prelude piece of the syntax. The declaration list part is managed inside of the parser and the declarators are later used only during validation. Internally, we end up needing to flag at-rules to indicate which expect declaration lists and which expect rules in their body.
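For that use, the prelude and the kind of body can be derived from the non-expanded syntax. A minimal sketch, assuming the block, when present, is a single production at the end of the syntax (such as { <declaration-list> }); splitAtRuleSyntax() and the mapping of production names to body kinds are assumptions, not CSSTree code.

```javascript
// Hypothetical sketch: split an at-rule syntax into its prelude and the kind of body it expects.
// Assumes the block, if any, is a single production at the very end, e.g. "{ <declaration-list> }".
function splitAtRuleSyntax(syntax) {
  const match = syntax.match(/\{\s*<([a-z-]+)>\s*\}\s*$/);
  if (!match) return { prelude: syntax.trim(), body: null }; // no block, e.g. a statement at-rule
  const prelude = syntax.slice(0, match.index).trim();
  const production = match[1];
  let body = "other";
  if (production === "declaration-list" || production === "declaration-rule-list") {
    body = "declarations"; // <declaration-rule-list> may also contain nested at-rules
  } else if (production === "rule-list" || production === "stylesheet") {
    body = "rules";
  }
  return { prelude, body };
}

console.log(splitAtRuleSyntax("@counter-style <counter-style-name> { <declaration-list> }"));
console.log(splitAtRuleSyntax("@media <media-query-list> { <rule-list> }"));
```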

nzakas, Jul 23 '25

Please let me know if this should be a separate issue.

There is a difference in the <color> syntax between mdn/data and webref/css. The <color> type definition in webref/css does not include <deprecated-color>.

Is this intentional?

Spec: https://drafts.csswg.org/css-color-4/#typedef-deprecated-color

mdn/data: https://github.com/mdn/data/blob/main/css/syntaxes.json#L153

"syntax": "<color-base> | currentColor | <system-color> | <light-dark()> | <deprecated-system-color>"

NOTE: <deprecated-color> is defined as <deprecated-system-color>

webref/css: https://github.com/w3c/webref/blob/curated/ed/css/css-color.json#L55

"value": "<color-base> | currentColor | <system-color>",

It lacks <deprecated-color>, even though webref/css does have a <deprecated-color> entry: https://github.com/w3c/webref/blob/curated/ed/css/css-color.json#L1584

asamuzaK, Jul 25 '25

Is this intentional?

It is "intentional" but it does not mean that it cannot be improved ;)

As much as possible, the CSS extracts in Webref are created on an automated basis. We maintain a few patches, but our experience is that maintaining these patches takes time, so we try to restrict their usage to temporary hiccups while specs get fixed. One consequence of this approach is that the extracts only contain data that are easy to extract from specs.

Capturing all terms that have a CSS type definition is easy. The CSS extract of css-color-4 contains an entry for <color>, <system-color> and <deprecated-color> as a result. Capturing the syntax of CSS terms is also easy when specs define that syntax explicitly in production rules. For example, the syntax of <color> in the extract is as defined in the spec:

<color> = <color-base> | currentColor | <system-color>

Capturing the syntax of CSS terms that are described in prose is much harder. The <system-color> and <deprecated-color> types do not have a formal syntax definition in the spec. The extract does not contain a syntax for them as a result.

The spec attaches <deprecated-color> to system colors. While it's fine to add <deprecated-system-color> to <color>, I would argue that it should rather be associated with <system-color>. In other words, ideally, I would expect:

<color> = <color-base> | currentColor | <system-color>
<system-color> = AccentColor | AccentColorText | [...] | <deprecated-color>
<deprecated-color> = ActiveBorder | ActiveCaption | [...]

If there aren't too many of these situations, and if they touch types that should be fairly stable, patching the data is fine. The spec itself could perhaps be updated to define these production rules; I do not know to what extent spec editors are amenable to using more formal syntax here.

I note that the values that <system-color> and <deprecated-color> can take are easy to extract (the spec correctly flags them as values for the underlying type). That's why you see them under the values array of the corresponding type in the CSS extract. We could perhaps create syntaxes automatically for types that only take a series of keywords. I'll need to investigate: this could also create incomplete syntaxes for other types, e.g., if a spec only defines a partial list of values for some reason. In general, whenever we try to "create a syntax", we realize afterwards that we got something wrong...
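A minimal sketch of that idea, assuming type entries carry a values array of { name, type } objects where keywords have type "value"; keywordSyntax() is hypothetical and, as noted, can only be as complete as the values the spec lists.

```javascript
// Hypothetical sketch: build a syntax for a type whose only definitions are keyword values.
// Returns null when the type has no values or when some value is not a plain keyword.
function keywordSyntax(typeEntry) {
  const values = typeEntry.values ?? [];
  if (values.length === 0 || values.some((v) => v.type !== "value")) return null;
  return values.map((v) => v.name).join(" | ");
}

const deprecatedColor = {
  name: "deprecated-color",
  values: [
    { name: "ActiveBorder", type: "value" },
    { name: "ActiveCaption", type: "value" },
  ],
};
console.log(keywordSyntax(deprecatedColor)); // ActiveBorder | ActiveCaption
```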

Back to the consolidated CSS that will replace the individual CSS extracts in the npm package: by definition, it can only contain one syntax definition for <color>, and there are two: the one in css-color-4 and the one in css-color-5. As above, we'd like to restrict the amount of manual intervention as much as possible, so the rule we adopted is that the consolidated CSS always contains the syntax from the latest level. As such, the syntax for <color> in the consolidated CSS extract will be the one in css-color-5:

<color> = <color-base> | currentColor | <system-color> |
      <contrast-color()> | <device-cmyk()> | <light-dark()>

(That shows another reason to attach <deprecated-color> to <system-color> as done in the spec, it can then survive re-definitions of <color>)
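The "latest level wins" rule can be sketched as follows, assuming each definition records the shortname of the spec it comes from (e.g. css-color-4) and that the level is the trailing number. pickLatest() and levelOf() are hypothetical, and treating unversioned shortnames as level 0 is an assumption.

```javascript
// Hypothetical sketch of "latest level wins" for definitions of the same term.
// Assumes each definition has a "spec" shortname such as "css-color-4".
function levelOf(shortname) {
  const match = shortname.match(/-(\d+)$/);
  return match ? Number(match[1]) : 0; // unversioned shortnames treated as level 0
}

// Throws on an empty array (reduce without an initial value), which signals a caller bug.
function pickLatest(definitions) {
  return definitions.reduce((best, def) => (levelOf(def.spec) > levelOf(best.spec) ? def : best));
}

const latestColor = pickLatest([
  { spec: "css-color-4", syntax: "<color-base> | currentColor | <system-color>" },
  {
    spec: "css-color-5",
    syntax: "<color-base> | currentColor | <system-color> | <contrast-color()> | <device-cmyk()> | <light-dark()>",
  },
]);
console.log(latestColor.spec); // css-color-5
```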

tidoust, Jul 26 '25

I would expect:

<color> = <color-base> | currentColor | <system-color>
<system-color> = AccentColor | AccentColorText | [...] | <deprecated-color>
<deprecated-color> = ActiveBorder | ActiveCaption | [...]

Could you please implement this?

asamuzaK, Aug 15 '25

Could you please implement this?

Tracked in #1647. Patching Webref data should be viewed as a last resort. Someone needs to look into updating the specs and/or improving the extraction logic first.

tidoust, Aug 20 '25

First non-alpha release of @webref/css version 7 is now available (done in #1663). First version is actually 7.0.11.

I'm keeping this issue open to collect potential problems that dependent projects may face while transitioning to the new version. We'll continue to update version 6 in the meantime.

tidoust, Sep 09 '25