JSVerbalExpressions
Functional rewrite
I've been thinking about re-writing JSVerbalExpressions to use function composition rather than the builder-like pattern it has now.
The README.md currently gives this simple example of using VerbalExpressions:
const tester = VerEx()
.startOfLine()
.then('http')
.maybe('s')
.then('://')
.maybe('www.')
.anythingBut(' ')
.endOfLine();
This can be described as a builder-like extension for the native RegExp object; you can chain the expression and add more stuff to "build" a complete regular expression.
This is a very clear approach for building simple, "one-dimensional" regular expressions. The problems with the current implementation start to surface once we do more complicated things: with capture groups, lookaheads/lookbehinds, the "or" pipe, etc., the expression quickly becomes hard to maintain and read.
For example, I think something like this is impossible to implement with VerbalExpressions at the moment:
/^((?:https?:\/\/)?|(?:ftp:\/\/)|(?:smtp:\/\/))([^ /]+)$/
To make it simpler, I'm proposing a 2.0 rewrite of VerbalExpressions that would take a functional approach, something like:
VerEx(
startOfLine,
"http",
maybe("s"),
"://",
maybe("www."),
anythingBut(" "),
endOfLine
)
Motivation for this approach would be:
- We can split regular expressions into multiple variables
- Naming "sub-expressions" lets us give each part a meaningful name and work at different abstraction levels within a regular expression
- Each small part is testable with unit tests
- Makes grouping explicit (enforce closing an opened capture group)
So the simplest example could be something like this:
const regex = VerEx(
startOfLine,
"http",
maybe("s"),
"://",
maybe("www."),
anythingBut(" "),
endOfLine
);
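To make the idea concrete, here is a minimal sketch of how such combinators could be built on top of the native RegExp. Everything in it is an assumption made for illustration, not the library's actual code: the helper names escapeLiteral, escapeClass and toSource are hypothetical, plain strings are assumed to be matched literally, and anythingBut is assumed to mean "zero or more characters outside the given set".

```javascript
// Hypothetical sketch: none of these implementations come from the library.

// Escape regex metacharacters so a plain string is matched literally.
const escapeLiteral = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Escape the characters that are special inside a character class.
const escapeClass = (s) => s.replace(/[\\\]^-]/g, "\\$&");

// A part is either a RegExp (its source is used as-is) or a literal string.
const toSource = (part) =>
  part instanceof RegExp ? part.source : escapeLiteral(String(part));

// Anchors are plain values, so they can be passed around like any other part.
const startOfLine = /^/;
const endOfLine = /$/;

// Each combinator takes parts and returns a new RegExp.
const concat = (...parts) => new RegExp(parts.map(toSource).join(""));
const maybe = (part) => new RegExp(`(?:${toSource(part)})?`);
const anythingBut = (chars) => new RegExp(`[^${escapeClass(chars)}]*`);

// In this sketch VerEx is just concatenation of its arguments.
const VerEx = (...parts) => concat(...parts);

const regex = VerEx(
  startOfLine,
  "http",
  maybe("s"),
  "://",
  maybe("www."),
  anythingBut(" "),
  endOfLine
);

console.log(regex.test("https://www.google.com")); // true
console.log(regex.test("ftp://example.org")); // false
```

Because every combinator returns an ordinary RegExp, the result can be used anywhere a native regular expression is accepted.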
And the complex example could be written e.g. like this:
VerEx(
startOfLine,
group(
or(
concat("http", maybe("s"), "://", maybe("www.")),
"ftp://",
"smtp://"
)
),
group(anythingBut(" /"))
);
While this looks a bit more complex, we can more easily split it up and name things:
const protocol = or(concat("http", maybe("s"), "://"), "ftp://", "smtp://");
const removeWww = maybe("www.");
const domain = anythingBut(" /");
const regex = VerEx(startOfLine, group(protocol), removeWww, group(domain));
This way we could test all of those "sub-expressions" (variables) in isolation.
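As a rough illustration of testing sub-expressions in isolation, the sketch below checks protocol and domain separately before checking the combined expression. The combinator implementations and the matchesFully helper are hypothetical stand-ins written only so the example runs on its own; a real 2.0 API may behave differently (for instance, anythingBut is assumed here to match zero or more characters).

```javascript
// Hypothetical helpers, defined only so this sketch runs on its own.
const escapeLiteral = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
const src = (p) => (p instanceof RegExp ? p.source : escapeLiteral(String(p)));
const concat = (...parts) => new RegExp(parts.map(src).join(""));
const maybe = (part) => new RegExp(`(?:${src(part)})?`);
const or = (...parts) => new RegExp(`(?:${parts.map(src).join("|")})`);
const group = (part) => new RegExp(`(${src(part)})`);
const anythingBut = (chars) =>
  new RegExp(`[^${chars.replace(/[\\\]^-]/g, "\\$&")}]*`);
const startOfLine = /^/;
const VerEx = concat;

// The named sub-expressions from the comment above.
const protocol = or(concat("http", maybe("s"), "://"), "ftp://", "smtp://");
const removeWww = maybe("www.");
const domain = anythingBut(" /");
const regex = VerEx(startOfLine, group(protocol), removeWww, group(domain));

// Test helper: does the fragment, on its own, match the whole input?
const matchesFully = (fragment, input) =>
  new RegExp(`^(?:${fragment.source})$`).test(input);

// Each sub-expression is checked without involving the others.
console.log(matchesFully(protocol, "https://")); // true
console.log(matchesFully(protocol, "gopher://")); // false
console.log(matchesFully(domain, "example.com")); // true
console.log(matchesFully(domain, "example.com/path")); // false

// The combined expression captures the protocol and the domain.
const [, capturedProtocol, capturedDomain] = regex.exec(
  "https://www.example.com/path"
);
console.log(capturedProtocol, capturedDomain); // https:// example.com
```

In a real test suite the console.log calls would become assertions in whatever test runner the project uses.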
Some examples where compositional/functional patterns have been used:
Huh. Interesting.
So for something like:
VerEx(
startOfLine,
"http",
maybe("s"),
"://",
maybe("www."),
anythingBut(" "),
endOfLine
)
… would the import statement look like one of the following:
import { VerEx, startOfLine, maybe, anythingBut, endOfLine } from 'verbal-expressions';
import * from 'verbal-expressions';
A bit concerned about global scope pollution…
ES module/TypeScript imports would look like this:
import { VerEx, startOfLine, maybe, anythingBut, endOfLine } from 'verbal-expressions'
With Node.js require you can use:
const { VerEx, startOfLine, maybe, anythingBut, endOfLine } = require('verbal-expressions')
If we still want to support global browser scripts, a common practice with this kind of library (e.g. Ramda, lodash) is to use a short, often single-character, namespace. We could namespace with V or ve. In that case you would use the library as:
V.VerEx(
V.startOfLine,
"http",
V.maybe("s"),
"://",
V.maybe("www."),
V.anythingBut(" "),
V.endOfLine
)
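For the global-script case, a sketch of how a build could expose everything under one name is shown below. It is only an assumption about how this might be wired up: the VerEx here is a stripped-down stand-in that does not escape strings, and the choice of V as the global name simply follows the suggestion above.

```javascript
// Stripped-down stand-ins for the real functions (no string escaping here).
const startOfLine = /^/;
const endOfLine = /$/;
const VerEx = (...parts) =>
  new RegExp(
    parts.map((p) => (p instanceof RegExp ? p.source : String(p))).join("")
  );

// Collect the public functions on one frozen object and expose only that
// object globally, so a single name (V) is added to the global scope.
const api = Object.freeze({ VerEx, startOfLine, endOfLine });
globalThis.V = api;

// Usage from any other script on the page:
const onlyAbc = V.VerEx(V.startOfLine, "abc", V.endOfLine);
console.log(onlyAbc.test("abc")); // true
console.log(onlyAbc.test("xabc")); // false
```

Module users would keep importing individual names, so the namespace object only matters for script-tag usage.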
Sounds good.
I'd like to help out with this. How do we work this out?
I can create a POC draft pull request to show a couple of ideas, and we can iterate from that. Does that sound good?
Sure.
@jehna How about I create a 2.0.0 branch and write some failing tests while you build your proof of concept?
Ok, so I did some work that I'd like to show you: #197