proposal-regexp-features icon indicating copy to clipboard operation
proposal-regexp-features copied to clipboard

Feature request: `\h` to match all "horizontal whitespace" characters

Open gfscott opened this issue 4 years ago • 1 comments

In several RegExp engines, \h works as a convenience method for specifying "horizontal whitespace". For the majority of cases you can capture these type of characters with only the space and tab characters ([\t ]) but that omits edge cases related to less commonly used, non-newline whitespace characters like en space (U+2002), em space (U+2003), and thin space (U+2009).

Essentially, \h gives you a subset of \s that omits the newline characters. Having access to this flag in ECMAScript would help me write regular expressions with greater confidence that I'll capture strings even if they contain rarely used whitespace characters:

Screen Shot 2022-04-02 at 4 10 18 PM

\h Language support

Not exhaustive, but this table is based on the engines available in regex101:

Language/Engine Supported? Behavior Example
PCRE Success Link
PCRE 2 Success Link
Java Success Link
Golang 🚫 Invalid token error Link
.NET 🚫 Invalid token error Link
Python 🚫 Matches literal h character Link
ECMAScript 🚫 Matches literal h character Link

gfscott avatar Apr 02 '22 20:04 gfscott

\h means "hexadecimal digit" ([0-9A-Fa-f], \p{AHex}) in some other regex flavors like Oniguruma and Onigmo.

slevithan avatar Jan 24 '25 03:01 slevithan