html icon indicating copy to clipboard operation
html copied to clipboard

Source reformatting: switch to no line-breaking

Open zcorpan opened this issue 6 months ago • 37 comments

Meta: Preparation for source reformatting

  • Add data-noreformat attributes
  • Fix double spaces and incorrect wrapping around tags
  • Add a Check Formatting workflow
  • Update CONTRIBUTING.md

Meta: Source reformatting: switch to no line-breaking

This applies reformahtml 0.1.9. See https://github.com/zcorpan/reformahtml

Fixes #1059.


/acknowledgements.html ( diff ) /browsers.html ( diff ) /browsing-the-web.html ( diff ) /canvas.html ( diff ) /common-dom-interfaces.html ( diff ) /common-microsyntaxes.html ( diff ) /comms.html ( diff ) /custom-elements.html ( diff ) /dnd.html ( diff ) /document-lifecycle.html ( diff ) /document-sequences.html ( diff ) /dom.html ( diff ) /dynamic-markup-insertion.html ( diff ) /edits.html ( diff ) /embedded-content-other.html ( diff ) /embedded-content.html ( diff ) /form-control-infrastructure.html ( diff ) /form-elements.html ( diff ) /forms.html ( diff ) /grouping-content.html ( diff ) /iana.html ( diff ) /iframe-embed-object.html ( diff ) /image-maps.html ( diff ) /imagebitmap-and-animations.html ( diff ) /images.html ( diff ) /index.html ( diff ) /indices.html ( diff ) /infrastructure.html ( diff ) /input.html ( diff ) /interaction.html ( diff ) /interactive-elements.html ( diff ) /introduction.html ( diff ) /links.html ( diff ) /media.html ( diff ) /microdata.html ( diff ) /named-characters.html ( diff ) /nav-history-apis.html ( diff ) /obsolete.html ( diff ) /parsing.html ( diff ) /popover.html ( diff ) /references.html ( diff ) /rendering.html ( diff ) /scripting.html ( diff ) /sections.html ( diff ) /semantics-other.html ( diff ) /semantics.html ( diff ) /server-sent-events.html ( diff ) /speculative-loading.html ( diff ) /structured-data.html ( diff ) /syntax.html ( diff ) /system-state.html ( diff ) /tables.html ( diff ) /text-level-semantics.html ( diff ) /timers-and-user-prompts.html ( diff ) /urls-and-fetching.html ( diff ) /web-messaging.html ( diff ) /webappapis.html ( diff ) /webstorage.html ( diff ) /workers.html ( diff ) /worklets.html ( diff ) /xhtml.html ( diff )

zcorpan avatar Sep 08 '25 20:09 zcorpan

The reformatting tool supports opting out for specific elements with a data-noreformat attribute. I haven't added such an attribute yet, but some things should probably have it, e.g. the Dependencies section.

zcorpan avatar Sep 08 '25 20:09 zcorpan

Very cool!

The other major section to not reformat is the acknowledgements section.

Is there any way to do some sort of programmatic check that only whitespace has changed, to rule out any content-deleting bugs in the reformatter? Maybe just the htmldiff tool would work?

domenic avatar Sep 09 '25 00:09 domenic

In the paragraph that begins "Here is an example of color space conversion", the status quo: ... <code data-x=\n "dom-.... is converted to: ... <code data-x= "dom-... but that space after the = probably shouldn't be there.

(This is the only case where the status quo has a line-break after an attribute's =.)

jmdyck avatar Sep 09 '25 03:09 jmdyck

Arguably that's a bug in the status quo since you're supposed to only wrap on where whitespace would "naturally" occur.

annevk avatar Sep 09 '25 06:09 annevk

@jmdyck fixed the = issue, thanks!

@domenic

Very cool!

Thanks!

The other major section to not reformat is the acknowledgements section.

I've marked elements I found with custom indentation with data-noreformat now in a separate commit.

Is there any way to do some sort of programmatic check that only whitespace has changed, to rule out any content-deleting bugs in the reformatter? Maybe just the htmldiff tool would work?

file=source                
a=$(git show HEAD:"$file" | tr -d '[:space:]' | git hash-object --stdin)
b=$(tr -d '[:space:]' < "$file" | git hash-object --stdin)

if [ "$a" = "$b" ]; then
  echo "Only whitespace differences"
else
  echo "Real changes"
fi

Shows "Only whitespaces differences" after running the script.

zcorpan avatar Sep 09 '25 14:09 zcorpan

I think I'm happy with the output now. LMK if there are more elements that should have data-noreformat or if there are bugs in reformahtml.py.

zcorpan avatar Sep 09 '25 15:09 zcorpan

Arguably that's a bug in the status quo since you're supposed to only wrap on where whitespace would "naturally" occur.

Would you prefer a separate PR to fix such bugs?

jmdyck avatar Sep 09 '25 16:09 jmdyck

As far as I can tell we only have one instance. We could fix it separately or incorporate it in a random PR. I don't think it matters much?

annevk avatar Sep 10 '25 10:09 annevk

No need to fix = since the issue goes away with the reformatting.

zcorpan avatar Sep 10 '25 13:09 zcorpan

As far as I can tell we only have one instance.

There's only one instance of 'newline after equals', yes, but there's more than one instance of 'newline where whitespace wouldn't "naturally" occur'.

jmdyck avatar Sep 10 '25 13:09 jmdyck

In this PR, there are many cases where, if the first non-blank on a line is an element, it isn't joined up with the previous line, but presumably should be. (If you're looking for examples, searching for /^ *<\(var\|code\|ref\|span\)/ in vim will find a lot of them.)

jmdyck avatar Sep 10 '25 17:09 jmdyck

@jmdyck, thanks, fixed now.

zcorpan avatar Sep 19 '25 14:09 zcorpan

@zcorpan it seems this branch still has conflicts?

annevk avatar Sep 30 '25 06:09 annevk

  • A lot of the <ref> tags are preceded by 2-10 spaces, where it should presumably be only 1.
  • Sometimes a <ref> isn't joined to the previous line (e.g. lines 3170, 6087), but probably should be.
  • Sometimes a </div> tag is joined to the previous line (e.g. line 1289), but should probably stay on its own line.
  • The treatment of comments seems inconsistent: sometimes joined up, sometimes not. Is there a rule?

jmdyck avatar Sep 30 '25 18:09 jmdyck

  • A lot of the <ref> tags are preceded by 2-10 spaces, where it should presumably be only 1.

  • Sometimes a <ref> isn't joined to the previous line (e.g. lines 3170, 6087), but probably should be.

  • Sometimes a </div> tag is joined to the previous line (e.g. line 1289), but should probably stay on its own line.

Thanks, these should be fixed now.

  • The treatment of comments seems inconsistent: sometimes joined up, sometimes not. Is there a rule?

Yes, comments that are preceded with a newline and followed by a newline are considered structural and left alone (including internal linebreaks). Other comments are reflowed inline.

This means some comments still have internal linebreaks that could arguably have internal linebreaks removed, but I think we can leave them as-is.

zcorpan avatar Oct 02 '25 16:10 zcorpan

I'm happy with the result now. I've looked through the source and added some more data-noreformat attributes. I've also added a workflow step to run the script so future changes use correct wrapping. I've also updated the documentation in CONTRIBUTING.md.

zcorpan avatar Oct 03 '25 11:10 zcorpan

I noticed multiple spaces before comments.

Edit: fixed.

zcorpan avatar Oct 03 '25 12:10 zcorpan

I'm puzzled about the semantics of data-noreformat.

E.g., in source at line 2449 (under "terms defined in URL"), there's a <ul> marked data-noreformat, so I expected that the whole of that element (down to line 2509) would be identical in the reformatted file. But there are 3 spots within the element where a linebreak was eliminated.

(Mind you, I think that's the only example where data-noreformat seems to have been disrespected.)

jmdyck avatar Oct 04 '25 01:10 jmdyck

There are some spots where the reformatter has incorrectly dropped a space before a start-tag, e.g.

  • line 1324: sets of<span data-x="plugin">plugins
  • line 6292: The<dfn id="lazy-load-root-margin">lazy load scroll margin
  • line 8670: the document is a<span>cookie-averse

jmdyck avatar Oct 04 '25 04:10 jmdyck

There are some spots where the reformatter leaves an end-tag on a line by itself, where it should probably append it to the previous line, e.g.:

  • </p> on lines 7163, 27872, 37167, 42547, 50845. (In some cases, maybe the original should be marked data-noreformat.)

  • </dd> on lines 30446, 30451, 98337, 107255+following.

  • </p></dd> on 52561, 73068.

  • </p></li> on 84068.

jmdyck avatar Oct 04 '25 12:10 jmdyck

Yes, comments that are preceded with a newline and followed by a newline are considered structural and left alone (including internal linebreaks). Other comments are reflowed inline.

Thanks.

In the source, there are 4 cases where two comments are immediately adjacent. (Scan for --><!--). In each case, neither comment qualifies as structural (because the first isn't followed by a newline and the second isn't preceded by a newline), and so the reformatter reflows them. However, in the 2nd+3rd cases (in "The DataTransferItem Interface"), it would probably be better if they were left as is. So in those two cases, I suggest inserting a newline between the two comments in your 'preparation' commit.

jmdyck avatar Oct 04 '25 12:10 jmdyck

Sorry to jump in late, but I'm still not sure what the value add here is. Is it just lower friction for contributors to not have to worry about line breaks? What is the rule we're trying to enforce after this PR? Is it:

  • you can break wherever you want, and have arbitrarily long paragraphs; or
  • no line breaks within a paragraph are allowed

I think I still prefer the 100 character line break rule, and using a tool like specfmt to keep this specification's source more consistent with other WHATWG specs, whereas it seems this change drives more inconsistency across our specs.

domfarolino avatar Oct 06 '25 18:10 domfarolino

The rules are not consistent though. Most other WHATWG specifications don't permit breaking on every possible space. Moving towards wrapping in your editor seems acceptable to me and if this works out we can port it to DOM, Fetch, etc.

annevk avatar Oct 07 '25 06:10 annevk

Generally, no line breaks within a paragraph are allowed. Exceptions are "custom" linebreaks for readability and easier editing, e.g. HTML uses one line per name in the Acknowledgements paragraph. These must have a data-noreformat attribute.

The benefit of this approach is that contributors don't have to use a tool like specfmt, but instead just write the paragraph and then commit. The current line-wrapping rule for HTML also makes searching for terms annoying, which is fixed by this change.

zcorpan avatar Oct 07 '25 11:10 zcorpan

@jmdyck thank you for reviewing!

There are some spots where the reformatter has incorrectly dropped a space before a start-tag, e.g.

This was a bug in the script, now fixed.

There are some spots where the reformatter leaves an end-tag on a line by itself, where it should probably append it to the previous line, e.g.:

These were due to inconsistent formatting in source, now fixed (in the second commit, to be squashed into the first).

In the source, there are 4 cases where two comments are immediately adjacent. (Scan for --><!--). In each case, neither comment qualifies as structural (because the first isn't followed by a newline and the second isn't preceded by a newline), and so the reformatter reflows them. However, in the 2nd+3rd cases (in "The DataTransferItem Interface"), it would probably be better if they were left as is. So in those two cases, I suggest inserting a newline between the two comments in your 'preparation' commit.

Fixed.

zcorpan avatar Oct 08 '25 08:10 zcorpan

I'm still puzzled about the semantics of data-noreformat (see https://github.com/whatwg/html/pull/11640#issuecomment-3367726885).


At line 65870 of source, there's:

<p class="note">The requirement that <span data-x="data block">data blocks</span>
<!--non-normative-->must be denoted using a <span>valid MIME type string</span>

In the reformatted result (line 48421), these lines don't get joined up, but they probably should be.


Generally, the reformatter squashes runs of spaces down to a single space, but there are a few places where it doesn't:

  • line 10057: whitespace</span>. <!--<code>Text
  • line 44229: <!-- or "cell"? -->
  • line 62897: the <dfn

jmdyck avatar Oct 10 '25 03:10 jmdyck

@jmdyck thanks, I think those are also fixed now.

zcorpan avatar Oct 14 '25 13:10 zcorpan

Okay, I think this is ready to go.

jmdyck avatar Oct 14 '25 16:10 jmdyck

The rules are not consistent though. Most other WHATWG specifications don't permit breaking on every possible space. Moving towards wrapping in your editor seems acceptable to me and if this works out we can port it to DOM, Fetch, etc.

What are some good, recommended editor settings to visually wrap existing long lines to ~100 characters, and preserve the indentation level? Specifically for vim, I can't find a good way to visually wrap long lines tighter than the vim window itself, which leaves the spec pretty unreadable to me. Does anyone know the trick in vim?

domfarolino avatar Oct 14 '25 22:10 domfarolino

Also, is there a way to preserve indentation among other things in GitHub's review panel? Or is the plan to just live with wonky indents like: Screenshot 2025-10-14 at 6 09 17 PM

I worry about this degrading readability during reviews.

Also, I worry this would make PRs harder to review, since you can't anchor comments in the middle of a visually wrapped line on GitHub. It will be hard to reference specific text in the middle of a paragraph, since your comment will just be anchored at the first visual line of the whole block, which is really weird. Suggestions won't be harder, since you an make modifications in the middle of a block of text (albeit the block will be larger now), but I think the comment experience would be pretty annoying when trying to target very specific text.

domfarolino avatar Oct 14 '25 22:10 domfarolino