Functional Rewrite (2.0.0)

Open shreyasminocha opened this issue 5 years ago • 4 comments

See also #196 and #197.

For a sneak peek, check out the tests in the test directory, especially those in test/examples.

Completed features

  • anyCharacterFrom: […]
  • anyCharacterBut: [^…]
  • group: (…)
    • group.capturing: (…)
    • group.nonCapturing: (?:…)
    • group.named | group.capturing.named: (?<foo>…)
  • backReference: \1 or \k<foo>
  • or: (?…|…|…)
  • maybe | optionally: …?
    • maybe.greedy: …?
    • maybe.lazy: …??
  • multiple | zeroOrMore: …*
    • multiple.greedy: …*
    • multiple.lazy: …*?
  • oneOrMore: …+
    • oneOrMore.greedy: …+
    • oneOrMore.lazy: …+?
  • repeat
    • repeat(x): …{x}
    • repeat(min, Infinity): …{min,}
    • repeat(min, max): …{min,max}
    • corresponding greedy alias and lazy variations similar to oneOrMore
  • lookahead: (?=…)
    • lookahead.negative: (?!…)
    • lookahead.positive: (?=…)
  • lookbehind: (?<=…)
    • lookbehind.negative: (?<!…)
    • lookbehind.positive: (?<=…)
  • anyCharacter: .
  • digit: \d
  • nonDigit: \D
  • whitespaceCharacter: \s
  • nonWhitespaceCharacter: \S
  • wordCharacter: \w
  • nonWordCharacter: \W
  • something: .+
  • anything: .*
  • startOfLine: ^
  • endOfLine: $
  • wordBoundary: \b
  • nonWordBoundary: \B
  • concat
  • Flags
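
To give a feel for how the greedy and lazy variants in the list above can hang off the base function, here is a minimal sketch. It is not the code from this PR: the `quantifier` and `wrap` helpers are hypothetical, and the sketch assumes the input is already a valid regex fragment (no escaping is done).

```typescript
// Hypothetical helper: wrap a regex fragment in a non-capturing group so that
// the quantifier applies to the whole fragment rather than its last character.
const wrap = (source: string): string => `(?:${source})`;

// Build a quantifier function with `.greedy` and `.lazy` variants attached.
// Calling the function directly is the same as calling `.greedy`.
function quantifier(suffix: string) {
  const greedy = (source: string): string => wrap(source) + suffix;
  const lazy = (source: string): string => wrap(source) + suffix + "?";
  return Object.assign(greedy, { greedy, lazy });
}

const maybe = quantifier("?");
const zeroOrMore = quantifier("*");
const oneOrMore = quantifier("+");

console.log(maybe("s"));            // (?:s)?
console.log(maybe.lazy("s"));       // (?:s)??
console.log(zeroOrMore.lazy("ab")); // (?:ab)*?
console.log(oneOrMore.greedy("a")); // (?:a)+
```

Attaching the variants with `Object.assign` keeps the short form (`maybe(…)`) and the explicit forms (`maybe.greedy(…)`, `maybe.lazy(…)`) behind a single name.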

Incomplete/planned features

  • Unicode code point escapes
  • Unicode property escapes
  • ~~String replacement helper constants and functions~~ Dropped. See https://github.com/VerbalExpressions/JSVerbalExpressions/pull/198/commits/aeb11c9ff461f5f84ea765b0444e341238035ff5.
  • … you tell me :)

Tests

They're all up to date on my end, and we have 100% coverage, but let me know if you notice anything that isn't being properly tested. At the moment there are over 15 test suites and over 150 tests.

Docs

I haven't written any docs yet. If you want to get a feel for how things will work, check out the tests in the test directory, especially test/examples. I'm thinking of a Gatsby site with the actual docs written in MDX, although I'm open to other ideas.

This PR should also resolve

  • #7
  • #30
  • #164
  • #167
  • #186
  • #192
  • #194
  • #202

I've been thinking about re-writing JSVerbalExpressions to use function composition rather than the builder-like pattern it has now.

The README.md currently shows a simple example of using VerbalExpressions:

const tester = VerEx()
    .startOfLine()
    .then('http')
    .maybe('s')
    .then('://')
    .maybe('www.')
    .anythingBut(' ')
    .endOfLine();

This can be described as a builder-like extension for the native RegExp object; you can chain the expression and add more stuff to "build" a complete regular expression.

This is a very clear approach for building simple, "one-dimensional" regular expressions. The problems with the current implementation start to surface when we do more complicated things such as capture groups, lookaheads/lookbehinds, or alternation with the "or" pipe: the expression quickly grows beyond what is maintainable and readable.

For example, I think something like this is impossible to implement with VerbalExpressions at the moment:

/^((?:https?:\/\/)?|(?:ftp:\/\/)|(?:smtp:\/\/))([^ /]+)$/
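
To make the target concrete, this is what that expression captures when run with the native RegExp API. The `parse` helper is hypothetical and exists only for this illustration.

```typescript
const url = /^((?:https?:\/\/)?|(?:ftp:\/\/)|(?:smtp:\/\/))([^ /]+)$/;

// Return [protocol, host] for a match, or null when the input does not match.
// Group 1 is the protocol (possibly empty), group 2 is everything after it.
function parse(input: string): [string, string] | null {
  const match = url.exec(input);
  return match ? [match[1], match[2]] : null;
}

console.log(parse("https://example.com")); // protocol "https://", host "example.com"
console.log(parse("ftp://host"));          // protocol "ftp://", host "host"
console.log(parse("example.com"));         // protocol "", host "example.com"
console.log(parse("http://a b"));          // null: a space is not allowed in the host
```

Note that the first alternative is optional, so an input without any protocol still matches with an empty first group.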

To make it simpler, I'm proposing a 2.0 rewrite of VerbalExpressions that would take a functional approach, something like:

VerEx(
  startOfLine,
  "http",
  maybe("s"),
  "://",
  maybe("www."),
  anythingBut(" "),
  endOfLine
)

Motivation for this approach would be:

  • We can split regular expressions into multiple variables
  • Naming "sub-expressions" makes the intent clearer and allows different abstraction levels in regular expressions
  • Each small part is testable with unit tests
  • Makes grouping explicit (enforce closing an opened capture group)

So the simplest example could be something like this:

const regex = VerEx(
  startOfLine,
  "http",
  maybe("s"),
  "://",
  maybe("www."),
  anythingBut(" "),
  endOfLine
);

And the complex example could be written e.g. like this:

VerEx(
  startOfLine,
  group(
    or(
      concat("http", maybe("s"), "://", maybe("www.")),
      "ftp://",
      "smtp://"
    )
  ),
  group(anythingBut(" /"))
);

While this looks a bit more complex, we can more easily split it up and name things:

const protocol = or(concat("http", maybe("s"), "://"), "ftp://", "smtp://");
const removeWww = maybe("www.");
const domain = anythingBut(" /");
const regex = VerEx(startOfLine, group(protocol), removeWww, group(domain));

This way we could test all of those "sub-expressions" (variables) in isolation.

— @jehna in #196
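
For readers who want to try the idea from the quote, here is a minimal sketch of the composition style. It is an illustration under stated assumptions, not the implementation in this PR: it assumes an expression is either a literal string (which gets escaped) or a RegExp (whose source is used as-is), `anythingBut` follows the 1.x behaviour of matching zero or more characters, and `endOfLine` is added to the final call so the match is anchored at both ends.

```typescript
// Assumption: an expression is a literal string or an already-built RegExp.
type Expression = string | RegExp;

// Escape regex metacharacters so strings are matched literally.
const escapeLiteral = (text: string): string =>
  text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

const toSource = (e: Expression): string =>
  typeof e === "string" ? escapeLiteral(e) : e.source;

const startOfLine = /^/;
const endOfLine = /$/;

const concat = (...parts: Expression[]): RegExp =>
  new RegExp(parts.map(toSource).join(""));

const group = (...parts: Expression[]): RegExp =>
  new RegExp(`(${concat(...parts).source})`);

const maybe = (e: Expression): RegExp => new RegExp(`(?:${toSource(e)})?`);

const or = (...options: Expression[]): RegExp =>
  new RegExp(`(?:${options.map(toSource).join("|")})`);

// Zero or more characters that are not in `chars` (escaped for a character class).
const anythingBut = (chars: string): RegExp =>
  new RegExp(`[^${chars.replace(/[\\\]^-]/g, "\\$&")}]*`);

// In this sketch the top-level function is just concatenation.
const VerEx = concat;

const protocol = or(concat("http", maybe("s"), "://"), "ftp://", "smtp://");
const removeWww = maybe("www.");
const domain = anythingBut(" /");
const regex = VerEx(startOfLine, group(protocol), removeWww, group(domain), endOfLine);

const match = regex.exec("https://www.example.com");
console.log(match && [match[1], match[2]]); // protocol "https://", domain "example.com"

// Each sub-expression is itself a RegExp, so it can be tested in isolation.
console.log(new RegExp(`^${protocol.source}$`).test("smtp://")); // true
```

Because every combinator returns a plain RegExp, the named parts (`protocol`, `domain`) can be unit-tested on their own before being combined.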

shreyasminocha • Sep 23 '19 11:09

I'm trying to think of a way to handle flags with the VerEx method. I came up with:

function VerEx(flags: Flags = defaultFlags, ...args: Expression[]): RegExp

Unfortunately this doesn't work when the first argument is of type Expression rather than Flags.

We could check the type of the first parameter from within the function, but I'm not a fan of that idea since it can't be statically inferred AFAIK and it … just feels like a hack.

Even…

function VerEx(...args: Expression[], flags: Flags = defaultFlags): RegExp

… does not work, obviously.

For reference, VerEx's current signature is:

function VerEx(...args: Expression[]): RegExp

Also, Flags is an interface.

Does anyone have any ideas?

Edit: This has been resolved in a solution that is, in my opinion, satisfactory.
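
For anyone curious what the "check the type of the first parameter" option could look like, here is a sketch using TypeScript overloads. It is not necessarily the solution that was adopted. The fields of `Flags` are hypothetical (the thread only says it is an interface), and `Expression` is assumed to be a string or a RegExp.

```typescript
// Hypothetical shape: the thread only says that Flags is an interface.
interface Flags {
  global?: boolean;
  ignoreCase?: boolean;
  multiline?: boolean;
}

// Assumption: an expression is a literal string or an existing RegExp.
type Expression = string | RegExp;

const toSource = (e: Expression): string =>
  typeof e === "string" ? e.replace(/[.*+?^${}()|[\]\\]/g, "\\$&") : e.source;

function compile(flags: Flags, args: Expression[]): RegExp {
  const flagString =
    (flags.global ? "g" : "") +
    (flags.ignoreCase ? "i" : "") +
    (flags.multiline ? "m" : "");
  return new RegExp(args.map(toSource).join(""), flagString);
}

// The overloads give callers two typed entry points; the implementation
// still has to inspect the first argument at runtime.
function VerEx(...args: Expression[]): RegExp;
function VerEx(flags: Flags, ...args: Expression[]): RegExp;
function VerEx(first?: Flags | Expression, ...rest: Expression[]): RegExp {
  if (first === undefined) return compile({}, rest);
  if (typeof first === "string" || first instanceof RegExp) {
    return compile({}, [first, ...rest]); // first argument is an expression
  }
  return compile(first, rest); // first argument is a Flags object
}

console.log(VerEx({ ignoreCase: true }, "http", /s?/).test("HTTPS")); // true
console.log(VerEx("http", /s?/).test("HTTPS"));                       // false
```

The runtime check is still there, which is the part that feels like a hack; the overloads only make the two call shapes visible to the type checker.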

shreyasminocha • Sep 24 '19 06:09

I have an idea. We could support both…

function VerEx(args: Expression[], flags: Flags = defaultFlags): RegExp

… and…

function VerEx(...args: Expression[]): RegExp

If the first parameter were an array, we would use the former and if not, we would use the latter. I'm also planning to replace concat(...expressions) with support for using arrays as expressions, so maybe this would go with that.

Still interested in hearing better ideas.
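
A rough sketch of how supporting both call shapes could look, assuming hypothetical `Flags` fields and an `Expression` that is a string or a RegExp. This illustrates the idea above and is not the code in the PR.

```typescript
// Hypothetical shape: the thread only says that Flags is an interface.
interface Flags {
  global?: boolean;
  ignoreCase?: boolean;
  multiline?: boolean;
}

// Assumption: an expression is a literal string or an existing RegExp.
type Expression = string | RegExp;

const isExpression = (value: unknown): value is Expression =>
  typeof value === "string" || value instanceof RegExp;

const toSource = (e: Expression): string =>
  typeof e === "string" ? e.replace(/[.*+?^${}()|[\]\\]/g, "\\$&") : e.source;

function compile(args: Expression[], flags: Flags): RegExp {
  const flagString =
    (flags.global ? "g" : "") +
    (flags.ignoreCase ? "i" : "") +
    (flags.multiline ? "m" : "");
  return new RegExp(args.map(toSource).join(""), flagString);
}

function VerEx(args: Expression[], flags?: Flags): RegExp;
function VerEx(...args: Expression[]): RegExp;
function VerEx(
  first?: Expression | Expression[],
  second?: Expression | Flags,
  ...rest: Expression[]
): RegExp {
  if (Array.isArray(first)) {
    // Array form: the optional second argument carries the flags.
    const flags: Flags = second !== undefined && !isExpression(second) ? second : {};
    return compile(first, flags);
  }
  // Variadic form: every argument is an expression and default flags apply.
  return compile([first, second, ...rest].filter(isExpression), {});
}

console.log(VerEx(["http", /s?/], { ignoreCase: true }).test("HTTPS")); // true
console.log(VerEx("http", /s?/).test("https"));                         // true
```

Keying the dispatch on `Array.isArray` avoids having to tell a `Flags` object apart from an expression, since flags can only appear after an array.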

shreyasminocha • Sep 24 '19 19:09

This is obviously not ready for merging [1], but I would appreciate some comments on the work done thus far.

[1]: I think once we have docs, incomplete features, and bundling all sorted out, we should publish a pre-release (2.0.0-1 or something).

shreyasminocha • Nov 01 '19 08:11

@jehna I'd appreciate it if you would review this PR.

Like I said, we're still not ready for production, of course. In terms of features and tests, though, I think we're pretty solid.

I understand it's a large PR, so take your time!

shreyasminocha • Nov 14 '19 03:11