Improve Unicode script (#881)

Open socram8888 opened this issue 3 months ago • 6 comments

Overhauled the script to extract all available revisions for each of the standards, so it is possible to link to a specific one.

The main URL for each Unicode standard now also points to the latest version live on their website.

socram8888, Sep 23 '25 18:09

The update drops rawDate at the root level. Now, I realize that SpecRef is somewhat inconsistent there: the date is always set for W3C entries (to the date of the latest version), sometimes for entries with versions in biblio.json, and never for WHATWG entries. Given that Unicode specs are not updated on a continuous basis, I would report the latest version's date at the root level as well, so that specs that care about dates can display the date when they reference the spec.

tidoust, Sep 24 '25 10:09

I can add that, no problem, but I'm not sure it's a good idea in general for versioned entries when no particular version is being referenced.

If they want to explicitly state the last version they checked for compatibility alongside its date, they can now reference a particular version.

For a non-specific version, however, the date would cause the documents referring to it to also change the date any time they're recompiled, even if the writer has not actually checked the newer version to be fully compatible with the documentation.

For example, UTS46-33 made some processing changes that were not covered by the WHATWG URL spec at the time, which then needed changes of its own (https://github.com/whatwg/url/issues/836). With the date there, any recompilation of the WHATWG URL document between the release of UTS46-33 and the amending of the WHATWG URL standard would also update the date, incorrectly implying that the UTS46-33 changes had already been taken into account.

IMO if they want to specify a non-specific version with a check date, that should be manually stated by the writer, as the compilation time will be later than the time they've checked it, and the refDate at the root level could be different.

socram8888, Sep 24 '25 10:09

I think we should leave the date for the reason @tidoust mentioned here:

I guess the argument goes both ways. That is, without any mention of date, you also imply that the latest version you're going to get when you retrieve the URL was the one taken into account. That's what you get when you choose to reference "the latest version of a spec". With a date, you could at least theoretically speaking spot the fact that the document you're referencing has changed when you re-build your spec.

tobie, Sep 25 '25 21:09

Would it make sense to do that outside the extraction script, though? Sort of max([c.refdate for c in versions]) so it's consistent for other sources, as currently that is not the case as @tidoust said for W3G standards.

socram8888, Sep 26 '25 08:09

Possibly. I'd argue for doing this in a separate PR, though.

tobie, Sep 26 '25 08:09

I guess that could be done in https://github.com/tobie/specref/blob/main/lib/bibref.js#L263-L270, e.g. if parent.rawDate is null, check whether the latest version's rawDate is set and, if so, copy it.
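
A rough sketch of that idea, not the actual code in lib/bibref.js. The helper name inheritLatestRawDate and the example entry are hypothetical, and it assumes each version entry carries a rawDate string in the same ISO-style form (e.g. YYYY-MM-DD), so that plain string comparison matches chronological order:

```javascript
// Hypothetical helper; the real logic in lib/bibref.js may be shaped differently.
// Assumption: `parent.versions` maps version keys to entries that may carry
// a `rawDate` string in ISO-style form (YYYY-MM-DD).
function inheritLatestRawDate(parent) {
  // Keep an explicit root-level date untouched, and skip unversioned entries.
  if (parent.rawDate != null || !parent.versions) {
    return parent;
  }
  const dates = Object.values(parent.versions)
    .map((version) => version.rawDate)
    .filter((date) => typeof date === "string");
  if (dates.length === 0) {
    return parent;
  }
  // Same-format ISO dates sort chronologically as plain strings.
  const latest = dates.reduce((a, b) => (a > b ? a : b));
  // Return a copy so the original entry is not mutated.
  return { ...parent, rawDate: latest };
}

// Made-up entry and dates, for illustration only.
const example = {
  title: "Example Standard",
  versions: {
    "EXAMPLE-1": { rawDate: "2024-01-15" },
    "EXAMPLE-2": { rawDate: "2025-03-02" },
  },
};

console.log(inheritLatestRawDate(example).rawDate); // prints "2025-03-02"
```

Doing this at lookup time rather than in each extraction script would apply the same fallback to every source, while entries that already set a root-level rawDate keep it.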

socram8888, Sep 26 '25 09:09