JSVerbalExpressions
Functional Rewrite (2.0.0)
See also #196 and #197.
For a sneak peek, check out the tests in the `test` directory, especially those in `test/examples`.
Completed features
- `anyCharacterFrom`: `[…]`
- `anyCharacterBut`: `[^…]`
- `group`: `(…)`
  - `group.capturing`: `(…)`
  - `group.nonCapturing`: `(?:…)`
  - `group.named` | `group.capturing.named`: `(?<foo>…)`
- `backReference`: `\1` or `\k<foo>`
- `or`: `(?:…|…|…)`
- `maybe` | `optionally`: `…?`
  - `maybe.greedy`: `…?`
  - `maybe.lazy`: `…??`
- `multiple` | `zeroOrMore`: `…*`
  - `multiple.greedy`: `…*`
  - `multiple.lazy`: `…*?`
- `oneOrMore`: `…+`
  - `oneOrMore.greedy`: `…+`
  - `oneOrMore.lazy`: `…+?`
- `repeat`
  - `repeat(x)`: `…{x}`
  - `repeat(min, Infinity)`: `…{min,}`
  - `repeat(min, max)`: `…{min,max}`
  - corresponding greedy alias and lazy variations similar to `oneOrMore`
- `lookahead`: `(?=…)`
  - `lookahead.negative`: `(?!…)`
  - `lookahead.positive`: `(?=…)`
- `lookbehind`: `(?<=…)`
  - `lookbehind.negative`: `(?<!…)`
  - `lookbehind.positive`: `(?<=…)`
- `anyCharacter`: `.`
- `digit`: `\d`
- `nonDigit`: `\D`
- `whitespaceCharacter`: `\s`
- `nonWhitespaceCharacter`: `\S`
- `wordCharacter`: `\w`
- `nonWordCharacter`: `\W`
- `something`: `.+`
- `anything`: `.*`
- `startOfLine`: `^`
- `endOfLine`: `$`
- `wordBoundary`: `\b`
- `nonWordBoundary`: `\B`
- `concat`
- Flags
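As a quick illustration of what a few of the regex fragments listed above do, here is a minimal sketch in plain JavaScript. It deliberately uses only native `RegExp` literals rather than any function from this PR, and the sample strings are made up for the example.

```javascript
// Plain RegExp only; no JSVerbalExpressions functions are used here.

// Greedy vs. lazy quantifiers (…+ vs. …+?)
const greedy = "<a><b>".match(/<.+>/)[0]; // takes the longest possible match
const lazy = "<a><b>".match(/<.+?>/)[0]; // takes the shortest possible match

// Named capturing group (?<foo>…) with a named back-reference \k<foo>
const doubled = /(?<word>\w+) \k<word>/.exec("hello hello world");

// Positive lookahead (?=…): digits only when followed by "kg"
const beforeKg = "5kg 7lb".match(/\d+(?=kg)/)[0];

// Negative lookbehind (?<!…): digits not preceded by "$" or another digit
const noDollar = "$10 20".match(/(?<![$\d])\d+/)[0];

console.log({ greedy, lazy, word: doubled.groups.word, beforeKg, noDollar });
```

The lookbehind and named-group syntax requires an ES2018-capable engine.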
Incomplete/planned features
- Unicode code point escapes
- Unicode property escapes
- ~~String replacement helper constants and functions~~ Dropped. See https://github.com/VerbalExpressions/JSVerbalExpressions/pull/198/commits/aeb11c9ff461f5f84ea765b0444e341238035ff5.
- … you tell me :)
Tests
They're all up to date on my end, and we have 100% coverage, although let me know if you notice something that's not being properly tested. At the moment there are over 15 test suites and over 150 tests.
Docs
I haven't written any docs at the moment, although if you want to get a feel for how things will be, check out the tests in the `test` directory, especially `test/examples`. I'm thinking of a Gatsby site with the actual docs written in MDX, although I'm open to other ideas.
This PR should also resolve
- #7
- #30
- #164
- #167
- #186
- #192
- #194
- #202
I've been thinking about re-writing JSVerbalExpressions to use function composition rather than the builder-like pattern it has now.
The README.md currently gives this simple example of using VerbalExpressions:

    const tester = VerEx()
      .startOfLine()
      .then('http')
      .maybe('s')
      .then('://')
      .maybe('www.')
      .anythingBut(' ')
      .endOfLine();

This can be described as a builder-like extension for the native `RegExp` object; you can chain the expression and add more stuff to "build" a complete regular expression. This is a very clear approach for building simple, "one-dimensional" regular expressions. The problem with the current implementation surfaces when we start doing more complicated things: capture groups, lookaheads/lookbehinds, and the "or" pipe quickly make the expression grow beyond what is maintainable and readable.
For example, I think something like this is impossible to implement with VerbalExpressions at the moment:
/^((?:https?:\/\/)?|(?:ftp:\/\/)|(?:smtp:\/\/))([^ /]+)$/
To make it simpler, I'm proposing a 2.0 rewrite of VerbalExpressions that would take a functional approach, something like:

    VerEx(
      startOfLine,
      "http",
      maybe("s"),
      "://",
      maybe("www."),
      anythingBut(" "),
      endOfLine
    )

Motivation for this approach would be:
- We can split regular expressions into multiple variables
- Naming "sub-expressions" allows clearer names and different abstraction levels within a regular expression
- Each small part is testable with unit tests
- Makes grouping explicit (enforce closing an opened capture group)
So the simplest example could be something like this:

    const regex = VerEx(
      startOfLine,
      "http",
      maybe("s"),
      "://",
      maybe("www."),
      anythingBut(" "),
      endOfLine
    );

And the complex example could be written e.g. like this:

    VerEx(
      startOfLine,
      group(
        or(
          concat("http", maybe("s"), "://", maybe("www.")),
          "ftp://",
          "smtp://"
        )
      ),
      group(anythingBut(" /"))
    );

While this looks a bit more complex, we can more easily split it up and name things:

    const protocol = or(concat("http", maybe("s"), "://"), "ftp://", "smtp://");
    const removeWww = maybe("www.");
    const domain = anythingBut(" /");
    const regex = VerEx(startOfLine, group(protocol), removeWww, group(domain));

This way we could test all of those "sub-expressions" (variables) in isolation.
— @jehna in #196
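To make the compositional idea concrete, here is a minimal, self-contained sketch of how such functions could be built on top of plain strings. It is not the implementation in this PR: the `fragment` and `escapeLiteral` helpers are hypothetical, and `anythingBut` is assumed to mean "zero or more characters not in the given set". Unlike the quoted example, it also appends `endOfLine` so the match is anchored at both ends.

```javascript
// Hypothetical sketch; none of this is the library's actual implementation.
// An expression is either a string (matched literally) or a { source } fragment.

// Escape regex metacharacters so that plain strings match literally.
const escapeLiteral = (s) => s.replace(/[.*+?^${}()|[\]\\\/-]/g, "\\$&");

// Wrap raw regex source text so it is not escaped again.
const fragment = (source) => ({ source });

// Turn any expression into regex source text.
const toSource = (e) => (typeof e === "string" ? escapeLiteral(e) : e.source);

const concat = (...es) => fragment(es.map(toSource).join(""));
const group = (...es) => fragment(`(${concat(...es).source})`);
const maybe = (...es) => fragment(`(?:${concat(...es).source})?`);
const or = (...es) => fragment(`(?:${es.map(toSource).join("|")})`);
const anythingBut = (chars) => fragment(`[^${escapeLiteral(chars)}]*`);
const startOfLine = fragment("^");
const endOfLine = fragment("$");

// Build the final RegExp from the composed pieces.
const VerEx = (...es) => new RegExp(concat(...es).source);

// Named sub-expressions, mirroring the quoted example.
const protocol = or(concat("http", maybe("s"), "://"), "ftp://", "smtp://");
const removeWww = maybe("www.");
const domain = anythingBut(" /");
const regex = VerEx(startOfLine, group(protocol), removeWww, group(domain), endOfLine);

const match = regex.exec("https://www.example.com");
console.log(match[1], match[2]);
```

Because every piece is an ordinary value, a sub-expression such as `protocol` can be checked on its own by wrapping its `source` in a throwaway `RegExp`.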
I'm trying to think of a way to handle flags with the `VerEx` method. I came up with:
function VerEx(flags: Flags = defaultFlags, ...args: Expression[]): RegExp
Unfortunately this doesn't work when the first argument is of type `Expression` rather than `Flags`.
We could check the type of the first parameter from within the function, but I'm not a fan of that idea since it can't be statically inferred AFAIK and it … just feels like a hack.
Even…
function VerEx(...args: Expression[], flags: Flags = defaultFlags): RegExp
… does not work, obviously.
For reference, `VerEx`'s current header is:
function VerEx(...args: Expression[]): RegExp
Also, `Flags` is an `interface`.
Does anyone have any ideas?
Edit: This has been resolved in a solution that is, in my opinion, satisfactory.
I have an idea. We could support both…
function VerEx(args: Expression[], flags: Flags = defaultFlags): RegExp
… and…
function VerEx(...args: Expression[]): RegExp
If the first parameter were an array, we would use the former, and if not, we would use the latter. I'm also planning to replace `concat(...expressions)` with support for using arrays as expressions, so maybe this would go with that.
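The array-versus-rest dispatch described above could look roughly like the following. This is only a sketch under simplifying assumptions: expressions are plain strings of regex source, and flags are a native flag string such as `"gi"` rather than the `Flags` interface mentioned earlier, whose fields are not shown here.

```javascript
// Hypothetical sketch of supporting both call shapes with one function.
// Assumptions: expressions are regex source strings; flags is a native
// flag string (e.g. "i"), standing in for the Flags interface.
function VerEx(first, ...rest) {
  if (Array.isArray(first)) {
    // Form 1: VerEx([expr, expr, …], flags)
    const [flags = ""] = rest;
    return new RegExp(first.join(""), flags);
  }
  // Form 2: VerEx(expr, expr, …) with default (empty) flags
  return new RegExp([first, ...rest].join(""));
}

const withFlags = VerEx(["^", "abc", "$"], "i");
const withoutFlags = VerEx("^", "abc", "$");

console.log(withFlags.test("ABC"), withoutFlags.test("ABC"));
```

The `Array.isArray` check is a runtime test, so in TypeScript the two call shapes would presumably still need to be declared as separate overload signatures for callers to get static checking.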
Still interested in hearing better ideas.
This is obviously not ready for merging [1], but I would appreciate some comments on the work done thus far.
[1]: I think once we have docs, incomplete features, and bundling all sorted out, we should publish a pre-release (`2.0.0-1` or something).
@jehna I'd appreciate it if you would review this PR.
Like I said, we're still not ready for production, of course. In terms of features and tests though I think we're pretty solid.
I understand it's a large PR, so take your time!