Should vertical tab character be included in ASCII whitespace?

Open josepharhar opened this issue 8 months ago • 11 comments

What is the issue with the Infra Standard?

This definition of ASCII whitespace does not include U+000B VT: https://infra.spec.whatwg.org/#ascii-whitespace
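For reference, a minimal sketch of that definition as a predicate. Infra lists ASCII whitespace as U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, and U+0020 SPACE; the names `ASCII_WHITESPACE` and `isAsciiWhitespace` are made up for this illustration and are not taken from any spec or browser codebase.

```typescript
// The five code points Infra lists as ASCII whitespace:
// U+0009 TAB, U+000A LF, U+000C FF, U+000D CR, U+0020 SPACE.
// U+000B VT is not among them.
const ASCII_WHITESPACE = "\t\n\f\r ";

// Hypothetical helper: true only for a single ASCII whitespace code unit.
function isAsciiWhitespace(ch: string): boolean {
  return ch.length === 1 && ASCII_WHITESPACE.indexOf(ch) !== -1;
}

// TAB, LF, VT, FF, CR, SPACE in code point order.
const verdicts = ["\t", "\n", "\v", "\f", "\r", " "].map(isAsciiWhitespace);
// verdicts is [true, true, false, true, true, true]: only VT is rejected.
```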

I looked on Wikipedia, which links to what looks like the Unicode Standard, which calls VT whitespace.

I noticed this while implementing https://github.com/whatwg/dom/pull/1079 because VT is also considered whitespace in an old helper method in Chromium.

josepharhar avatar May 14 '25 23:05 josepharhar

I think most parts of the web platform do not consider VT whitespace. I don't recall where this was discussed though; maybe @annevk does.

We should definitely document this either way, similar to how we document U+000C FF. (https://github.com/whatwg/infra/pull/649)

I'm curious which parts of Chromium consider VT whitespace and which parts do not. Do you know? Based on the code search, it looks like some parts of Chromium do consider it, including parts which per spec should not.

I wonder if we have web platform tests for this... eventually we should, for all the places "ASCII whitespace" is used on the platform. (I guess for FF as well.)

But, I also don't want to make anyone shave this yak in a way that blocks https://github.com/whatwg/dom/pull/1079 :)

domenic avatar May 15 '25 01:05 domenic

> I think most parts of the web platform do not consider VT whitespace.

Unless for some reason we’re explicitly not considering JavaScript implementations, it seems worth noting here that the “White Space Code Points” definition in the ES spec includes U+000B VT.
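A small sketch of the difference in practice: `String.prototype.trim()` and the regex class `\s` follow the ES definition and so treat VT as whitespace, while stripping per Infra's ASCII whitespace leaves it alone. `stripAsciiWhitespace` is a hypothetical helper written for this comparison, not an API from any spec or engine.

```typescript
const input = "\vfoo\v";

// ECMAScript: trim() removes WhiteSpace and LineTerminator code points,
// and U+000B VT is in the WhiteSpace set.
const trimmedByJs = input.trim();

// Hypothetical helper: strip leading and trailing ASCII whitespace only
// (TAB, LF, FF, CR, SPACE), which excludes VT.
function stripAsciiWhitespace(s: string): string {
  return s.replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "");
}
const strippedAscii = stripAsciiWhitespace(input);

// The regex class \s also follows the ES definition.
const regexMatchesVt = /\s/.test("\v");
```

Here `trimmedByJs` is `"foo"`, `strippedAscii` is still `"\vfoo\v"`, and `regexMatchesVt` is `true`.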

> Based on the code search, it looks like some parts of Chromium do consider it, including parts which per spec should not.

I know that’s definitely the case for WebKit as well.

As for the general pattern, I think that unless some feature has a spec which explicitly requires ASCII whitespace, the implementations use a looser definition of whitespace that includes VT.
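To make the observable difference concrete, here is a hedged sketch of how the two definitions disagree when splitting a space-separated value. Both function names are hypothetical, and the "loose" variant uses the ES `\s` class as a stand-in for a broader definition; neither is actual browser code.

```typescript
// Hypothetical: split on ASCII whitespace as Infra defines it
// (TAB, LF, FF, CR, SPACE), dropping empty tokens.
function splitOnAsciiWhitespace(s: string): string[] {
  return s.split(/[\t\n\f\r ]+/).filter((token) => token.length > 0);
}

// Hypothetical: split on a looser definition. The \s class is used
// here as a stand-in; it includes VT among other code points.
function splitOnLooseWhitespace(s: string): string[] {
  return s.split(/\s+/).filter((token) => token.length > 0);
}

const value = "a\vb c";
const strictTokens = splitOnAsciiWhitespace(value); // VT stays inside a token
const looseTokens = splitOnLooseWhitespace(value); // VT acts as a separator
```

With the strict splitter the value yields two tokens, `"a\vb"` and `"c"`; with the loose one it yields three, `"a"`, `"b"`, and `"c"`. That is the kind of mismatch a web platform test would need to exercise.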

But as for the parts-which-per-spec-should-not cases, I’ve run across plenty of those in implementations. I think in some (or many) of those cases, it may be because there aren’t actually any WPT tests for checking it.

https://github.com/WebKit/WebKit/pull/24217 is one sorta related example. In that case, it’s new code. But the reason it hasn’t landed is that there are no existing WPT tests for it, and I never got around to making time to write the tests myself.

But I also vaguely recall that some (or many) parts of CSS code use a broader definition of whitespace, rather than the CSS post-preprocessing whitespace definition (which is functionally equivalent to ASCII whitespace) that they should be using per spec. But I could be misremembering.

sideshowbarker avatar May 15 '25 02:05 sideshowbarker

I fixed a bunch of this in WebKit some years ago and added better helpers. There are still some things to be fixed, but I'm not in favor of changing this definition at this point. If anything, it's already too wide.

JavaScript is not a good place to borrow from, as for some inexplicable reason they use Unicode's definition of White Space, which changes over time. That's absolutely not what we'd want.

annevk avatar May 15 '25 07:05 annevk

U+000B was removed from the list of "space characters" in HTML in https://github.com/whatwg/html/commit/63e2aeb0b399b4740460388264a1b523ac6ac752

I didn't find a relevant email or bug or IRC discussion from June 2008, though.

zcorpan avatar May 15 '25 07:05 zcorpan

I suspect this can't be changed at least for the HTML parser without causing XSS issues.

https://software.hixie.ch/utilities/js/live-dom-viewer/saved/13786

zcorpan avatar May 15 '25 07:05 zcorpan

The biggest things still outstanding from the WebKit audit I did are https://github.com/w3c/csswg-drafts/issues/8757 and improving CSP test coverage as Mike mentioned above. (WebKit still does CSP incorrectly, but it's an easy fix.)

annevk avatar May 15 '25 07:05 annevk

Thanks! It sounds like VT should not be included in whitespace. I'll incorporate this into the WPTs and implementation I write for https://github.com/whatwg/dom/pull/1079

josepharhar avatar May 15 '25 14:05 josepharhar

I added the tracker to make sure we capture decisions/information like this for posterity (rather than depending on memory). Note that this is covered by I18N's best practices doc here...

aphillips avatar May 16 '25 20:05 aphillips

I18N discussed this in our 2025-05-22 call and I was actioned with adding our thoughts to this issue. Basically, we agree that, while VT is a "sort of whitespace" character, it would be disruptive and a Bad Idea to introduce it as such in HTML's syntax. We may proceed to add HTML to our table of whitespace flavors in our best practices document to help cement the idea that nothing is wrong here.

aphillips avatar Jul 11 '25 19:07 aphillips

Sounds good, closing this as not planned accordingly.

annevk avatar Jul 12 '25 06:07 annevk

Reopening since we should add a note to Infra similar to #649.

domenic avatar Jul 12 '25 06:07 domenic