Source reformatting: switch to no line-breaking
Meta: Preparation for source reformatting
- Add data-noreformat attributes
- Fix double spaces and incorrect wrapping around tags
- Add a Check Formatting workflow
- Update CONTRIBUTING.md
Meta: Source reformatting: switch to no line-breaking
This applies reformahtml 0.1.9. See https://github.com/zcorpan/reformahtml
Fixes #1059.
/acknowledgements.html ( diff ) /browsers.html ( diff ) /browsing-the-web.html ( diff ) /canvas.html ( diff ) /common-dom-interfaces.html ( diff ) /common-microsyntaxes.html ( diff ) /comms.html ( diff ) /custom-elements.html ( diff ) /dnd.html ( diff ) /document-lifecycle.html ( diff ) /document-sequences.html ( diff ) /dom.html ( diff ) /dynamic-markup-insertion.html ( diff ) /edits.html ( diff ) /embedded-content-other.html ( diff ) /embedded-content.html ( diff ) /form-control-infrastructure.html ( diff ) /form-elements.html ( diff ) /forms.html ( diff ) /grouping-content.html ( diff ) /iana.html ( diff ) /iframe-embed-object.html ( diff ) /image-maps.html ( diff ) /imagebitmap-and-animations.html ( diff ) /images.html ( diff ) /index.html ( diff ) /indices.html ( diff ) /infrastructure.html ( diff ) /input.html ( diff ) /interaction.html ( diff ) /interactive-elements.html ( diff ) /introduction.html ( diff ) /links.html ( diff ) /media.html ( diff ) /microdata.html ( diff ) /named-characters.html ( diff ) /nav-history-apis.html ( diff ) /obsolete.html ( diff ) /parsing.html ( diff ) /popover.html ( diff ) /references.html ( diff ) /rendering.html ( diff ) /scripting.html ( diff ) /sections.html ( diff ) /semantics-other.html ( diff ) /semantics.html ( diff ) /server-sent-events.html ( diff ) /speculative-loading.html ( diff ) /structured-data.html ( diff ) /syntax.html ( diff ) /system-state.html ( diff ) /tables.html ( diff ) /text-level-semantics.html ( diff ) /timers-and-user-prompts.html ( diff ) /urls-and-fetching.html ( diff ) /web-messaging.html ( diff ) /webappapis.html ( diff ) /webstorage.html ( diff ) /workers.html ( diff ) /worklets.html ( diff ) /xhtml.html ( diff )
The reformatting tool supports opting out for specific elements with a data-noreformat attribute. I haven't added such an attribute yet, but some things should probably have it, e.g. the Dependencies section.
Very cool!
The other major section to not reformat is the acknowledgements section.
Is there any way to do some sort of programmatic check that only whitespace has changed, to rule out any content-deleting bugs in the reformatter? Maybe just the htmldiff tool would work?
In the paragraph that begins "Here is an example of color space conversion", the status quo:
... <code data-x=\n "dom-....
is converted to:
... <code data-x= "dom-...
but that space after the = probably shouldn't be there.
(This is the only case where the status quo has a line-break after an attribute's =.)
Arguably that's a bug in the status quo since you're supposed to only wrap on where whitespace would "naturally" occur.
@jmdyck fixed the = issue, thanks!
@domenic
Very cool!
Thanks!
The other major section to not reformat is the acknowledgements section.
I've marked elements I found with custom indentation with data-noreformat now in a separate commit.
Is there any way to do some sort of programmatic check that only whitespace has changed, to rule out any content-deleting bugs in the reformatter? Maybe just the htmldiff tool would work?
file=source
a=$(git show HEAD:"$file" | tr -d '[:space:]' | git hash-object --stdin)
b=$(tr -d '[:space:]' < "$file" | git hash-object --stdin)
if [ "$a" = "$b" ]; then
echo "Only whitespace differences"
else
echo "Real changes"
fi
Shows "Only whitespaces differences" after running the script.
I think I'm happy with the output now. LMK if there are more elements that should have data-noreformat or if there are bugs in reformahtml.py.
Arguably that's a bug in the status quo since you're supposed to only wrap on where whitespace would "naturally" occur.
Would you prefer a separate PR to fix such bugs?
As far as I can tell we only have one instance. We could fix it separately or incorporate it in a random PR. I don't think it matters much?
No need to fix = since the issue goes away with the reformatting.
As far as I can tell we only have one instance.
There's only one instance of 'newline after equals', yes, but there's more than one instance of 'newline where whitespace wouldn't "naturally" occur'.
In this PR, there are many cases where, if the first non-blank on a line is an element, it isn't joined up with the previous line, but presumably should be. (If you're looking for examples, searching for /^ *<\(var\|code\|ref\|span\)/ in vim will find a lot of them.)
@jmdyck, thanks, fixed now.
@zcorpan it seems this branch still has conflicts?
- A lot of the
<ref>tags are preceded by 2-10 spaces, where it should presumably be only 1. - Sometimes a
<ref>isn't joined to the previous line (e.g. lines 3170, 6087), but probably should be. - Sometimes a
</div>tag is joined to the previous line (e.g. line 1289), but should probably stay on its own line. - The treatment of comments seems inconsistent: sometimes joined up, sometimes not. Is there a rule?
A lot of the
<ref>tags are preceded by 2-10 spaces, where it should presumably be only 1.Sometimes a
<ref>isn't joined to the previous line (e.g. lines 3170, 6087), but probably should be.Sometimes a
</div>tag is joined to the previous line (e.g. line 1289), but should probably stay on its own line.
Thanks, these should be fixed now.
- The treatment of comments seems inconsistent: sometimes joined up, sometimes not. Is there a rule?
Yes, comments that are preceded with a newline and followed by a newline are considered structural and left alone (including internal linebreaks). Other comments are reflowed inline.
This means some comments still have internal linebreaks that could arguably have internal linebreaks removed, but I think we can leave them as-is.
I'm happy with the result now. I've looked through the source and added some more data-noreformat attributes. I've also added a workflow step to run the script so future changes use correct wrapping. I've also updated the documentation in CONTRIBUTING.md.
I noticed multiple spaces before comments.
Edit: fixed.
I'm puzzled about the semantics of data-noreformat.
E.g., in source at line 2449 (under "terms defined in URL"), there's a <ul> marked data-noreformat, so I expected that the whole of that element (down to line 2509) would be identical in the reformatted file. But there are 3 spots within the element where a linebreak was eliminated.
(Mind you, I think that's the only example where data-noreformat seems to have been disrespected.)
There are some spots where the reformatter has incorrectly dropped a space before a start-tag, e.g.
- line 1324:
sets of<span data-x="plugin">plugins - line 6292:
The<dfn id="lazy-load-root-margin">lazy load scroll margin - line 8670:
the document is a<span>cookie-averse
There are some spots where the reformatter leaves an end-tag on a line by itself, where it should probably append it to the previous line, e.g.:
-
</p>on lines 7163, 27872, 37167, 42547, 50845. (In some cases, maybe the original should be markeddata-noreformat.) -
</dd>on lines 30446, 30451, 98337, 107255+following. -
</p></dd>on 52561, 73068. -
</p></li>on 84068.
Yes, comments that are preceded with a newline and followed by a newline are considered structural and left alone (including internal linebreaks). Other comments are reflowed inline.
Thanks.
In the source, there are 4 cases where two comments are immediately adjacent. (Scan for --><!--). In each case, neither comment qualifies as structural (because the first isn't followed by a newline and the second isn't preceded by a newline), and so the reformatter reflows them. However, in the 2nd+3rd cases (in "The DataTransferItem Interface"), it would probably be better if they were left as is. So in those two cases, I suggest inserting a newline between the two comments in your 'preparation' commit.
Sorry to jump in late, but I'm still not sure what the value add here is. Is it just lower friction for contributors to not have to worry about line breaks? What is the rule we're trying to enforce after this PR? Is it:
- you can break wherever you want, and have arbitrarily long paragraphs; or
- no line breaks within a paragraph are allowed
I think I still prefer the 100 character line break rule, and using a tool like specfmt to keep this specification's source more consistent with other WHATWG specs, whereas it seems this change drives more inconsistency across our specs.
The rules are not consistent though. Most other WHATWG specifications don't permit breaking on every possible space. Moving towards wrapping in your editor seems acceptable to me and if this works out we can port it to DOM, Fetch, etc.
Generally, no line breaks within a paragraph are allowed. Exceptions are "custom" linebreaks for readability and easier editing, e.g. HTML uses one line per name in the Acknowledgements paragraph. These must have a data-noreformat attribute.
The benefit of this approach is that contributors don't have to use a tool like specfmt, but instead just write the paragraph and then commit. The current line-wrapping rule for HTML also makes searching for terms annoying, which is fixed by this change.
@jmdyck thank you for reviewing!
There are some spots where the reformatter has incorrectly dropped a space before a start-tag, e.g.
This was a bug in the script, now fixed.
There are some spots where the reformatter leaves an end-tag on a line by itself, where it should probably append it to the previous line, e.g.:
These were due to inconsistent formatting in source, now fixed (in the second commit, to be squashed into the first).
In the source, there are 4 cases where two comments are immediately adjacent. (Scan for
--><!--). In each case, neither comment qualifies as structural (because the first isn't followed by a newline and the second isn't preceded by a newline), and so the reformatter reflows them. However, in the 2nd+3rd cases (in "The DataTransferItem Interface"), it would probably be better if they were left as is. So in those two cases, I suggest inserting a newline between the two comments in your 'preparation' commit.
Fixed.
I'm still puzzled about the semantics of data-noreformat (see https://github.com/whatwg/html/pull/11640#issuecomment-3367726885).
At line 65870 of source, there's:
<p class="note">The requirement that <span data-x="data block">data blocks</span>
<!--non-normative-->must be denoted using a <span>valid MIME type string</span>
In the reformatted result (line 48421), these lines don't get joined up, but they probably should be.
Generally, the reformatter squashes runs of spaces down to a single space, but there are a few places where it doesn't:
- line 10057:
whitespace</span>. <!--<code>Text - line 44229:
<!-- or "cell"? --> - line 62897:
the <dfn
@jmdyck thanks, I think those are also fixed now.
Okay, I think this is ready to go.
The rules are not consistent though. Most other WHATWG specifications don't permit breaking on every possible space. Moving towards wrapping in your editor seems acceptable to me and if this works out we can port it to DOM, Fetch, etc.
What are some good, recommended editor settings to visually wrap existing long lines to ~100 characters, and preserve the indentation level? Specifically for vim, I can't find a good way to visually wrap long lines tighter than the vim window itself, which leaves the spec pretty unreadable to me. Does anyone know the trick in vim?
Also, is there a way to preserve indentation among other things in GitHub's review panel? Or is the plan to just live with wonky indents like:
I worry about this degrading readability during reviews.
Also, I worry this would make PRs harder to review, since you can't anchor comments in the middle of a visually wrapped line on GitHub. It will be hard to reference specific text in the middle of a paragraph, since your comment will just be anchored at the first visual line of the whole block, which is really weird. Suggestions won't be harder, since you an make modifications in the middle of a block of text (albeit the block will be larger now), but I think the comment experience would be pretty annoying when trying to target very specific text.