webref Extract CDDL for WebDriver BiDi

Currently we have multiple specs defining WebDriver BiDi endpoints; https://w3c.github.io/webdriver-bidi/ itself does, but also https://w3c.github.io/permissions/#automation-webdriver-bidi and https://webbluetoothcg.github.io/web-bluetooth/#automated-testing.

We have some code to extract CDDL from the WebDriver BiDi code, see https://github.com/w3c/webdriver-bidi/blob/3b1939c2380b9ba12cc123cef03c1e3af15fa01d/scripts/cddl/utils.js#L13, and generates three CDDL files ultimately: local.cddl, remote.cddl, all.cddl.

We probably want to pull out all of local-cddl and remote-cddl separately; all is just the union of both of these.

See also: https://github.com/w3c/webdriver-bidi/issues/791

Sep 26 '24 19:09 gsnedders

I'm not sure how to approach the "multiple extracts per spec" need (local and remote for WebDriver Bidi). The crawler generates one extract per spec. At a minimum, we'd need some sort of generic convention to understand what bucket to use for a given definition.

I note that the Open Screen Protocol also defines CDDL. In the case of the Open Screen Protocol, CDDL is actually maintained in a .cddl file that gets converted to an HTML file through a script upon build and imported in the spec (interestingly, the resulting spec does not contain any semantic info that says "this is CDDL").

If we start extracting CDDL from specs, it would be useful to also think about some validation mechanism.

Sep 26 '24 21:09 tidoust

(I also guess I maybe should've filed this against reffy rather than webref)

Sep 26 '24 21:09 gsnedders

Another spec that defines CDDL is WebAuthn. There, the CDDL is not flagged in any particulary way (just a <pre>). I don't see other specs in browser-specs that reference RFC8610.

Sep 26 '24 21:09 tidoust

I wonder if we need to add an extra class for WebDriver BiDi CDDL?

Sep 26 '24 22:09 gsnedders

That does not change the problem at hand, but just to draw a more complete picture, in an ideal world, authoring tools would also support something like a <pre class=cddl> section that would:

Turn on syntax highlighting (CDDL syntax highlighting is already supported in Bikeshed provided one adds highlight="cddl", but does not seem supported by highlight.js that ReSpec uses)
Create definitions and auto-link terms, as done in Web IDL blocks, noting that there is no specific definition type that may be used for CDDL terms right now.
Allow for the automatic creation of an index at the end of the specification.

For Bikeshed, that's tracked in https://github.com/speced/bikeshed/issues/2072. The issue notes the need to split the index into two (and the need for Webref to extract the CDDL).

Oct 04 '24 06:10 tidoust

Getting back to this...

@gsnedders What is the use case for the all.cddl file?

It makes sense to me for the crawler to produce local.cddl and remote.cddl extracts. I'm wondering about the rationale for producing the all.cddl extract. Is it useful for implementations, testing purpose, or something else?

Dec 03 '24 10:12 tidoust

@tidoust That's a good question; I listed it because it already existed, seemingly from https://github.com/w3c/webdriver-bidi/pull/465.

@OrKoN @jgraham @whimboo do you know?

Dec 03 '24 16:12 gsnedders

we do use all.cddl to generate TypeScript types and validation code in chromium-bidi. I believe all.cddl helps to avoid duplication of definitions which are marked as both local and remote.

Dec 03 '24 16:12 OrKoN

OK, thanks. If that file is useful, I'll try to re-create it in Webref, then.

While you're all watching, feel free to have a look at https://github.com/speced/bikeshed/pull/2977, a proposal to add CDDL support to Bikeshed.

Support would add autolinking capabilities within CDDL blocks and generate a CDDL index at the end of the spec. Currently, this would generate a CDDL index per module (building on a convention similar to the one used in WebDriver BiDi). In the case of WebDriver BiDi, the CDDL index would thus list two modules, one with the contents of local.cddl and one with the contents of remote.cddl. If you feel that the spec itself should also contain a CDDL index with all definitions, let me know! I'm thinking it could be generated by the crawler otherwise.

Alternatively, it could also contain 3 indexes:

One with all CDDL definitions that are common to all modules
One with all local end definitions
One with all remote end definitions

I'm not sure how to generalize this approach to a spec that defines more than two CDDL modules though (combinatorial explosion), but perhaps it just doesn't make sense for a spec to define more than 2 CDDL modules at once.

Dec 03 '24 17:12 tidoust

I've a question regarding CI checks when certain specifications are changing their CDDL definitions. Is there a way to actually cross-check if we can still generate the files or if failures occur? It's basically out of our hand what other specifications are doing, and we not necessarily want to see our final CDDL export broken.

Dec 04 '24 16:12 whimboo

There are two data levels in Webref:

the main branch contains raw data extracted from specs. That comes with no guarantee whatsoever. If a spec that defines CDDL contains invalid CDDL, that invalid CDDL will be in the main branch.
the curated branch contains a curated version of the raw data. That branch is only updated when guarantees are met. When guarantees are broken, we patch the data manually to resume updates. Now, we'll need to agree on guarantees for CDDL. I don't really know what "generate the files" means. I suppose that we'll want to run a CDDL validator (cddl-rs?) against the extracts and guarantee that all CDDL extracts pass validation.

Dec 04 '24 19:12 tidoust

Oh, and there's a third level I completely forgot to mention:

Updates are manually reviewed on a weekly basis (on average) to release npm packages: @webref/css, @webref/idl, @webref/events. The manual review allows us to catch additional consistency issues that cannot easily be checked automatically. We could envision a @webref/cddl package if that seems useful.

Dec 05 '24 07:12 tidoust

* the `curated` branch contains a curated version of the raw data. That branch is only updated when guarantees are met. When guarantees are broken, we patch the data manually to resume updates. Now, we'll need to agree on guarantees for CDDL. I don't really know what "generate the files" means. I suppose that we'll want to run a CDDL validator ([cddl-rs](https://github.com/anweiss/cddl)?) against the extracts and guarantee that all CDDL extracts pass validation.

Yes, that’s exactly what I was referring to. We already perform such validation in our GitHub Actions for WebDriver BiDi to ensure that a PR doesn’t break the exported CDDL. However, I’m still curious — if someone were to make a PR in, for example, the Permissions API that breaks the CDDL due to a conflict with the CDDL definitions from the WebAuthn spec, is there a way for WebRef to generate the final CDDL and flag a failure if it encounters an issue? Maybe each of the specs that export a CDDL for BiDi would need an external trigger for WebRef?

* Updates are manually reviewed on a weekly basis (on average) to release npm packages: [`@webref/css`](https://www.npmjs.com/package/@webref/css), [`@webref/idl`](https://www.npmjs.com/package/@webref/idl), [`@webref/events`](https://www.npmjs.com/package/@webref/events). The manual review allows us to catch additional consistency issues that cannot easily be checked automatically. We could envision a `@webref/cddl` package if that seems useful.

I assume that would be fine, as it’s highly unlikely that a new API could be implemented and shipped in any browser within a week. Having a guaranteed error-free CDDL that WebDriver BiDi clients can use for code generation would be very helpful. I assume such a cddl package could also support including different CDDL definitions, allowing other specs not tied to WebDriver to use it as well.

Dec 05 '24 08:12 whimboo

The current plan to only add the local and remote indexes sounds good to me. One concern will be that there are currently 38 types that are both local and remote so those 38 types would be duplicated in both lists? It might be more useful to generate a single index based on all.cddl but have some way to annotate for each type if it is local, remote or both.

Dec 06 '24 15:12 OrKoN

However, I’m still curious — if someone were to make a PR in, for example, the Permissions API that breaks the CDDL due to a conflict with the CDDL definitions from the WebAuthn spec, is there a way for WebRef to generate the final CDDL and flag a failure if it encounters an issue? Maybe each of the specs that export a CDDL for BiDi would need an external trigger for WebRef?

I wasn't necessarily planning to add logic to Webref to combine and check CDDL exports across specs initially.

As opposed to IDL where all specs share one single namespace, CDDL defined in different specs typically lie in different namespaces. To add checks in Webref, we would need to know which CDDL extracts to combine. I'd prefer that to be defined in the specs themselves (we try to minimize the number of things that we have to maintain manually in Webref, data patching and manual reviews already take time).

There is ongoing work on import statements for CDDL. But the import statements would need to be in the WebDriver BiDi spec itself, which seems to be going in the wrong direction for an extension mechanism. Ideally, we'd need a spec convention that the Permissions API could use to say "the CDDL defined here extends the one defined in WebDriver BiDi" and, more importantly, that we could detect and leverage when we crawl the spec.

Dec 06 '24 16:12 tidoust

The current plan to only add the local and remote indexes sounds good to me. Once concern will be that there are currently 38 types that are both local and remote so those 38 types would be duplicated in both lists?

Yes.

It might be more useful to generate a single index based on all.cddl but have some way to annotate for each type if it is local, remote or both.

Maybe :) I guess I don't really know how the indexes in the spec are going to be used. It seemed more useful to me at first glance to separate the CDDL modules completely, instead of having a set of interleaved CDDL rules that apply to either or both of them with repeated comments such as # Defined for: remote end at the start of rule definitions. But then the CDDL in WebDriver BiDi is quite long, so it may be better to avoid duplication. Could you suggest an annotation mechanism?

Dec 06 '24 16:12 tidoust

Could you suggest an annotation mechanism?

Not really, therefore, I think the current proposal is good.

Dec 06 '24 18:12 OrKoN

@OrKoN Back to the all.cddl file. I understand that's not needed to generate TypeScript types, but something that still bugs me is that the file in itself is not a completely usable CDDL document.

As specified in RFC8610, a CDDL document defines one type ("the one defined by its first rule"). For all.cddl, first rule is Command, so the document essentially defines the CDDL of the remote end. The Message rule does appear in the document, but it is not attached to the first rule and, as such, remains hidden.

As a result, if I give all.cddl to a CDDL validator, it can only be used to validate Command CBOR data items. Of course, it's likely that I'll be able to give the validator the name of the rule to start from and validate Message CBOR data items as well, but having to specify the initial rule is not standard behavior.

If we publish a CDDL document that contains both remote end and local end definitions, I would expect it to be directly useful to validate CBOR data items that match either set of definitions. In short, I would expect all.cddl to start with something like:

Root = Command / Message

The crawler could perhaps generate that rule itself from the first rule names it encounters. The crawler would just need to mint a name for the first rule (Root or something more guaranteed not to clash with another rule name).

Alternatively, it could be the spec's reponsibility to define the root rule. This could perhaps be done in the spec as:

<pre class="cddl remote-cddl">
Root /= Command
Command = ...
</pre>

<pre class="cddl local-cddl">
Root /= Message
Message = ...
</pre>

Does this make sense? Any preferred approach?

Dec 09 '24 10:12 tidoust

However, I’m still curious — if someone were to make a PR in, for example, the Permissions API that breaks the CDDL due to a conflict with the CDDL definitions from the WebAuthn spec, is there a way for WebRef to generate the final CDDL and flag a failure if it encounters an issue? Maybe each of the specs that export a CDDL for BiDi would need an external trigger for WebRef?

I don't think we need validation similar to what we have for WebIDL initially, just being able to find all the CDDL is itself already a win.

/test/idl/consistency.js checks consistency for all IDL stuff, but as mentioned this is slightly harder in the CDDL case (there's specs with CDDL which aren't WebDriver BiDi messages, whereas I believe every spec with IDL is IDL for browser interfaces).

Certainly longer term requiring what's in the curated branch to pass similar tests wouldn't be unreasonable, though it will likely require patches similar to what we have for IDL today to deal with cases where specs add contradictory definitions — because that's a problem we have with IDL too.

Dec 09 '24 18:12 gsnedders

@tidoust with our tooling we validate per command/event so we do not use the entire CDDL document for validation. I think if we want to support this use case the specs should define the root (the first type in the category). In case of all.cddl though defining Root = Command / Message does not seem to be useful for validation since you probably wants to validate incoming separately from outgoing messages.

Dec 11 '24 15:12 OrKoN

That seems fine, @OrKoN. From a naming perspective, the current plan was for the crawler to create webdriver-bidi-local.cddl, webdriver-bidi-remote.cddl, and webdriver-bidi.cddl for the entire version. If that version is not going to be useful for validation, I think I would prefer to use webdriver-bidi-all.cddl as a name, so that it's somewhat clearer that this is not the default extract.

Another consequence is more related to spec authoring. Right now, the plan is to consider that <pre class="cddl"> tag that does not specify any module is defined for all modules. This was meant to ease authoring of CDDL common to all modules (and also because, in Bikeshed, I don't really know how to make such CDDL appear in the generated CDDL index). But if we want to allow specs to define something such as Root = Command / Message, such CDDL should probably only go to the entire CDDL extract. I'll change that behavior in Reffy's PR and in the proposed Bikeshed PR.

Dec 11 '24 16:12 tidoust

@gsnedders said:

I don't think we need validation similar to what we have for WebIDL initially, just being able to find all the CDDL is itself already a win.

+1 to the multi-step plan :) I'll finalize pending PRs, but we should be able to start adding CDDL extracts to Webref soon (with no validation) .

Dec 11 '24 16:12 tidoust

Webref now has CDDL extracts of main specs that define CDDL: AT Driver, Permissions, Web Bluetooth, WebDriver BiDi. To be joined later on by a few other specs once the specs produce suitable <pre class="cddl"> blocks (also when Bikeshed gets support for CDDL).

The CDDL extracts are per module, with an -all extract that contains all CDDL definitions (without duplicates, or at least without duplicates unless the duplicates are in the spec itself).

The CDDL extracts come with no guarantee for now. Most are actually slightly invalid today because the specs currently contain <pre class="cddl"> blocks that reference CDDL constructs instead of defining them (see https://github.com/w3c/webdriver-bidi/pull/831 for WebDriver BiDi).

I'm closing this issue as addressed. I created #1417 to track CDDL validation in Webref so that we may make guarantees in the future. We may also consider creating an @webref/cddl npm package once guarantees are in place if that seems useful.

Dec 17 '24 20:12 tidoust