
Rust Regex binding for JavaScript

rregex

A WebAssembly build of Rust Regex for JavaScript

Note: this project is not intended to be used in production yet

  • Why Rust Regex
  • Install
  • API
    • isMatch(text: string): boolean
    • isMatchAt(text: string, limit: number): boolean
    • find(text: string): Match | undefined;
    • findAt(text: string, limit: number): Match | undefined;
    • findAll(text: string): Match[];
    • replace(text: string, rep: string): string;
    • replaceAll(text: string, rep: string): string;
    • replacen(text: string, limit: number, rep: string): string;
    • split(text: string): string[];
    • splitn(text: string, limit: number): string[];
    • shortestMatch(text: string): number | undefined;
    • shortestMatchAt(text: string, limit: number): number | undefined;
  • Known Issues

Why Rust Regex

Rust has a powerful regex library with many features that don't exist in the standard JavaScript RegExp object.

See the official documentation for more detail

Install

npm install rregex

API

As with the native RegExp object, you create a new RRegex instance from a pattern string.

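import assert from "node:assert";
import { RRegex } from "rregex";

// Backslashes must be doubled inside a JavaScript string literal.
const re = new RRegex("^\\d{4}-\\d{2}-\\d{2}$");
assert.equal(re.isMatch("2014-01-01"), true);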

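Note: unlike RegExp, the constructor doesn't take a second parameter, because flags are written inline as part of the pattern syntax, for example (?i) for case-insensitive matching. (See Documentation)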
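Because flags are inline, porting a pattern written as a JavaScript regex literal means moving its flags into the pattern. The sketch below uses a hypothetical helper, toInlineFlags (not part of rregex), and only handles i, m and s, assuming they carry the same meaning in both engines (case-insensitive, ^/$ at line boundaries, and . matching \n), as the Rust regex documentation describes. JavaScript's g and y flags control matching state and have no inline equivalent, so they are dropped.

```javascript
// Hypothetical helper, not part of rregex: turn a RegExp literal's flags
// into a Rust-style inline flag group such as "(?i)".
function toInlineFlags(re) {
  // re.flags is always in a fixed order ("dgimsuvy"), so the output is stable.
  // Only i, m and s are kept; g and y describe matching state, not syntax.
  const inline = [...re.flags].filter((flag) => "ims".includes(flag)).join("");
  return inline ? `(?${inline})${re.source}` : re.source;
}

const caseless = toInlineFlags(/hello world/gi); // "(?i)hello world"
const multiline = toInlineFlags(/^a.b$/sm); // "(?ms)^a.b$"
```

With rregex this might look like new RRegex(toInlineFlags(/hello world/i)), provided the pattern body only uses syntax both engines share (Rust's regex crate does not support lookaround or backreferences, for example).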

isMatch(text: string): boolean;

Returns true if and only if there is a match for the regex in the string given. (See Documentation)

const text = "I categorically deny having triskaidekaphobia.";
const re = new RRegex("\\b\\w{13}\\b");
expect(re.isMatch(text)).toEqual(true);

isMatchAt(text: string, limit: number): boolean;

Returns the same as isMatch, but starts the search at the given offset (the second argument). (See Documentation)

const text = "I categorically deny having triskaidekaphobia.";
const re = new RRegex("\\b\\w{13}\\b");
expect(re.isMatchAt(text, 1)).toBe(true);
expect(re.isMatchAt(text, 5)).toBe(false);
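Starting at an offset is not the same as slicing the string first. Rust's is_match_at documents that the starting point keeps the surrounding context, so assertions such as \b still look at the characters before the offset; assuming rregex's isMatchAt forwards to it, the same applies here. The hypothetical isMatchAtNative helper below (not part of rregex) reproduces that rule with a native RegExp and lastIndex, and shows how slicing gives a different answer when the offset falls inside a word:

```javascript
// Hypothetical helper, not part of rregex: search from `offset` while letting
// assertions such as \b still see the characters before `offset`.
function isMatchAtNative(pattern, text, offset) {
  // Drop any g/y flag, then add g so exec() starts scanning at lastIndex.
  const re = new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, "") + "g");
  re.lastIndex = offset;
  return re.exec(text) !== null;
}

const word13 = /\b\w{13}\b/;
const glued = "xcategorically"; // "x" followed by a 13-letter word

// Slicing drops the "x", so \b succeeds at the start of the slice.
const slicedResult = word13.test(glued.slice(1)); // true
// Searching from offset 1 keeps "x" as context, so \b fails there.
const offsetResult = isMatchAtNative(word13, glued, 1); // false
```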

find(text: string): Match | undefined;

Returns the leftmost-first match in text as a Match object holding the matched value and its start and end byte offsets. If no match exists, undefined is returned. (See Documentation)

const text = "I categorically deny having triskaidekaphobia.";
const re = new RRegex("\\b\\w{13}\\b");
expect(re.find(text)).toEqual({
  value: "categorically",
  start: 2,
  end: 15,
});
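The description above speaks of byte offsets, while JavaScript strings are indexed in UTF-16 code units. For ASCII text, as in this example, the two agree; for other text they can diverge. Assuming start and end really are UTF-8 byte offsets (worth checking against the rregex version you use), a hypothetical helper like byteOffsetToIndex below (not part of rregex) converts them into indices usable with String.prototype.slice. The offset should fall on a character boundary, which match boundaries do.

```javascript
// Hypothetical helper, not part of rregex: convert a UTF-8 byte offset
// into the matching index in a JavaScript (UTF-16) string.
function byteOffsetToIndex(text, byteOffset) {
  const bytes = new TextEncoder().encode(text);
  // Decode the bytes before the offset and count the resulting code units.
  return new TextDecoder().decode(bytes.subarray(0, byteOffset)).length;
}

// "\u00e9" (é) takes 2 bytes in UTF-8 but 1 code unit in a JS string,
// so the "c" of "categorically" sits at byte 6 but at string index 5.
const mixed = "caf\u00e9 categorically";
const startIndex = byteOffsetToIndex(mixed, 6); // 5
```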

findAt(text: string, limit: number): Match | undefined;

Returns the same as find, but starts the search at the given offset. (See Documentation)

const text = "I categorically deny having triskaidekaphobia.";
const re = new RRegex("\\b\\w{13}\\b");
expect(re.findAt(text, 1)).toEqual({
  value: "categorically",
  start: 2,
  end: 15,
});

expect(re.findAt(text, 5)).toEqual(undefined);

findAll(text: string): Match[];

Returns an array of every successive non-overlapping match in text, each with its start and end byte offsets relative to text. (See Documentation)

const text = "Retroactively relinquishing remunerations is reprehensible.";
const re = new RRegex("\\b\\w{13}\\b");
expect(re.findAll(text)).toEqual([
  { value: "Retroactively", start: 0, end: 13 },
  { value: "relinquishing", start: 14, end: 27 },
  { value: "remunerations", start: 28, end: 41 },
  { value: "reprehensible", start: 45, end: 58 },
]);
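For ASCII text like this, byte offsets and JavaScript string indices coincide, so the result can be cross-checked against the native RegExp API. The hypothetical findAllNative helper below (not part of rregex) builds the same { value, start, end } shape with String.prototype.matchAll; for non-ASCII text its start and end would be UTF-16 indices rather than the byte offsets described above.

```javascript
// Hypothetical helper, not part of rregex: collect every non-overlapping
// match as { value, start, end } using the native RegExp API.
function findAllNative(pattern, text) {
  // matchAll requires the g flag; drop y so matches need not be adjacent.
  const re = new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, "") + "g");
  return Array.from(text.matchAll(re), (m) => ({
    value: m[0],
    start: m.index,
    end: m.index + m[0].length,
  }));
}

const sentence = "Retroactively relinquishing remunerations is reprehensible.";
// Yields the same four 13-letter words and offsets as the findAll example.
const nativeMatches = findAllNative(/\b\w{13}\b/, sentence);
```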

replace(text: string, rep: string): string;

Replaces the leftmost-first match with the replacement provided.

If no match is found, then a copy of the string is returned unchanged. (See Documentation)

In typical usage, the replacement is just a normal string:

const re = new RRegex("[^01]+");
expect(re.replace("1078910", "")).toBe("1010");

The replacement string also supports a simple syntax that expands $name into the corresponding capture group. Here’s an example using named capture groups:

const re = new RRegex("(?P<last>[^,\\s]+),\\s+(?P<first>\\S+)");
const result = re.replace("Springsteen, Bruce", "$first $last");
expect(result).toEqual("Bruce Springsteen");

Note that using $2 instead of $first or $1 instead of $last would produce the same result. To write a literal $ use $$.
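When the replacement comes from user input and should be inserted literally, every $ in it has to be doubled according to the rule above. A minimal sketch, using a hypothetical escapeReplacement helper that is not part of rregex:

```javascript
// Hypothetical helper, not part of rregex: double every "$" so the text is
// treated as a literal replacement rather than as capture-group references.
function escapeReplacement(text) {
  // split/join sidesteps the $-pattern rules of String.prototype.replace.
  return text.split("$").join("$$");
}

const escapedPrice = escapeReplacement("costs $5"); // "costs $$5"
```

It would then be passed as re.replace(input, escapeReplacement(userText)).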

Sometimes the replacement string requires use of curly braces to delineate a capture group replacement and surrounding literal text. For example, if we wanted to join two words together with an underscore:

const re = new RRegex("(?P<first>\\w+)\\s+(?P<second>\\w+)");
const result = re.replace("deep fried", "${first}_$second");
expect(result).toEqual("deep_fried");

Without the curly braces, the capture group name first_ would be used, and since it doesn’t exist, it would be replaced with the empty string.
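In JavaScript source, write replacements like "${first}_$second" with ordinary quotes. Inside a template literal (backticks), ${first} is JavaScript interpolation: it is evaluated before rregex ever sees the string, substituting a variable's value or throwing a ReferenceError if no such variable exists. The snippet below only illustrates that language rule; the variable first is made up for the demonstration.

```javascript
// A made-up variable that happens to share the capture group's name.
const first = "oops";

// Backticks: JavaScript substitutes ${first} itself; rregex would receive "oops_$second".
const templated = `${first}_$second`;
// Ordinary quotes: ${first} stays as text for rregex to expand.
const plain = "${first}_$second";
```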

replaceAll(text: string, rep: string): string;

Replaces all non-overlapping matches in text with the replacement provided. This is the same as calling replacen with limit set to 0.

See the documentation for replace for details on how to access capturing group matches in the replacement string. (See Documentation)

replacen(text: string, limit: number, rep: string): string;

Replaces at most limit non-overlapping matches in text with the replacement provided. If limit is 0, then all non-overlapping matches are replaced. (See Documentation)
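The limit rule, including 0 meaning "replace everything" as replaceAll does, can be sketched with the native RegExp API. The hypothetical replacenNative helper below is not part of rregex and, to stay short, inserts rep literally without $name expansion:

```javascript
// Hypothetical helper, not part of rregex: replace at most `limit` matches,
// where a limit of 0 replaces every match. `rep` is inserted literally.
function replacenNative(text, pattern, limit, rep) {
  const re = new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, "") + "g");
  let replaced = 0;
  return text.replace(re, (match) => {
    // Once the limit is reached, put the original match back unchanged.
    if (limit !== 0 && replaced >= limit) return match;
    replaced += 1;
    return rep;
  });
}

const firstTwo = replacenNative("a1b2c3", /\d/, 2, "#"); // "a#b#c3"
const everything = replacenNative("a1b2c3", /\d/, 0, "#"); // "a#b#c#"
```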

split(text: string): string[];

Returns an array of the substrings of text delimited by matches of the regular expression. Namely, each element of the array corresponds to text that isn’t matched by the regular expression. (See Documentation)

const re = new RRegex("[ \\t]+");
const fields = re.split("a b \t  c\td    e");
expect(fields).toEqual(["a", "b", "c", "d", "e"]);
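For this input, native String.prototype.split gives the same fields. One difference to keep in mind when porting: native split also returns the contents of capture groups, while Rust's split returns only the text between matches. The snippet assumes rregex keeps Rust's behavior here.

```javascript
// Same fields as the rregex example above.
const nativeFields = "a b \t  c\td    e".split(/[ \t]+/); // ["a", "b", "c", "d", "e"]

// Native split splices captured groups into the result.
const withGroups = "a1b2c".split(/(\d)/); // ["a", "1", "b", "2", "c"]
```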

splitn(text: string, limit: number): string[];

Returns an array of at most limit substrings of text delimited by matches of the regular expression. (A limit of 0 returns no substrings.) Namely, each element corresponds to text that isn’t matched by the regular expression. The remainder of the string that is not split is the last element of the array. (See Documentation)

const re = new RRegex("\\W+");
const fields = re.splitn("Hey! How are you?", 3);
expect(fields).toEqual(["Hey", "How", "are you?"]);
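Native String.prototype.split also accepts a limit, but it truncates instead of keeping the remainder. The hypothetical splitnNative helper below (not part of rregex) sketches splitn's rule with the native RegExp API, which can help when checking results; it does not try to reproduce every edge case involving empty matches.

```javascript
// Native split stops after `limit` pieces and drops the rest of the string.
const truncated = "Hey! How are you?".split(/\W+/, 3); // ["Hey", "How", "are"]

// Hypothetical helper, not part of rregex: at most `limit` pieces, with the
// unsplit remainder as the last piece; a limit of 0 returns no pieces.
function splitnNative(text, pattern, limit) {
  if (limit === 0) return [];
  const re = new RegExp(pattern.source, pattern.flags.replace(/[gy]/g, "") + "g");
  const parts = [];
  let last = 0;
  for (const m of text.matchAll(re)) {
    // Leave room for the remainder, which is always pushed last.
    if (parts.length === limit - 1) break;
    parts.push(text.slice(last, m.index));
    last = m.index + m[0].length;
  }
  parts.push(text.slice(last));
  return parts;
}

const withRemainder = splitnNative("Hey! How are you?", /\W+/, 3); // ["Hey", "How", "are you?"]
```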

shortestMatch(text: string): number | undefined;

Returns the end location of a match in the text given.

This method may have the same performance characteristics as isMatch, except it provides an end location for a match. In particular, the location returned may be shorter than the proper end of the leftmost-first match. (See Documentation)

const text = "aaaaa";
const re = new RRegex("a+");
expect(re.shortestMatch(text)).toBe(1);

shortestMatchAt(text: string, limit: number): number | undefined;

Returns the same as shortestMatch, but starts the search at the given offset. (See Documentation)

Known Issues

In versions before 1.3, if you call splitn(text, limit) and the expected result has limit - 1 items, the result includes an extra empty string ("") as its last item. This does not happen when limit is greater. The issue is fixed in version 1.3 and later.

  const regex = new RRegex(',')
  expect(regex.splitn('a,b,c', 0)).toEqual([])
  expect(regex.splitn('a,b,c', 1)).toEqual(['a,b,c'])
  expect(regex.splitn('a,b,c', 2)).toEqual(['a', 'b,c'])
  expect(regex.splitn('a,b,c', 3)).toEqual(['a', 'b', 'c'])

  // Before 1.3, this result includes an unexpected extra '' item
  expect(regex.splitn('a,b,c', 4)).toEqual(['a', 'b', 'c', ''])
  expect(regex.splitn('a,b,c', 5)).toEqual(['a', 'b', 'c'])

  expect(regex.splitn('abc', 0)).toEqual([])
  expect(regex.splitn('abc', 1)).toEqual(['abc'])

  // Before 1.3, this result includes an unexpected extra '' item
  expect(regex.splitn('abc', 2)).toEqual(['abc', ''])
  expect(regex.splitn('abc', 3)).toEqual(['abc'])