Regex
A pure Swift NFA implementation of a regular expression engine
Regex (V2 WIP)
A pure Swift implementation of a Regular Expression Engine
V2 is a second attempt that uses DFAs instead of NFAs to get grep-like performance.
Usage
To avoid recompiling the pattern for every search, create a Regex instance once and reuse it:
// Compile the expression
let regex = try! Regex(pattern: "[a-zA-Z]+")
let string = "RegEx is tough, but useful."
// Search for matches
let words = regex.match(string)
/*
words = [
RegexMatch(match: "RegEx", groups: []),
RegexMatch(match: "is", groups: []),
RegexMatch(match: "tough", groups: []),
RegexMatch(match: "but", groups: []),
RegexMatch(match: "useful", groups: []),
]
*/
If compilation overhead is not an issue, use the =~ operator to match a string directly:
let fourLetterWords = "drink beer, it's very nice!" =~ "\\b\\w{4}\\b" ?? []
/*
fourLetterWords = [
RegexMatch(match: "beer", groups: []),
RegexMatch(match: "very", groups: []),
RegexMatch(match: "nice", groups: []),
]
*/
By default the Global flag is active. To change which flags are active, add a / at the start of the pattern and /<flags> at the end. The available flags are:
- g (Global): allows multiple matches
- i (Case Insensitive): case insensitive matching
- m (Multiline): ^ and $ also match the beginning and end of a line
// Global and Case Insensitive search
let regex = try! Regex(pattern: "/\\w+/ig")
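The /pattern/flags form can be pictured as a split at the last slash. The sketch below is only an illustration of that convention: `ParsedPattern` and `splitPattern` are hypothetical names that do not come from the library, and treating a pattern without slashes as Global-only is an assumption based on the default described above.

```swift
// Hypothetical helper for illustration; not part of the library's API.
struct ParsedPattern: Equatable {
    var body: String
    var flags: Set<Character>
}

func splitPattern(_ pattern: String) -> ParsedPattern {
    // Without a leading slash (or without a closing one) the whole string is
    // taken as the pattern and only the Global flag is assumed to be active.
    guard pattern.first == "/",
          let closingSlash = pattern.lastIndex(of: "/"),
          closingSlash != pattern.startIndex else {
        return ParsedPattern(body: pattern, flags: ["g"])
    }
    // Everything between the first and the last slash is the pattern body.
    let bodyStart = pattern.index(after: pattern.startIndex)
    let body = String(pattern[bodyStart..<closingSlash])
    // Everything after the last slash is read as one flag per character.
    let flags = Set(pattern[pattern.index(after: closingSlash)...])
    return ParsedPattern(body: body, flags: flags)
}

let parsed = splitPattern("/\\w+/ig")
// parsed.body holds \w+ and parsed.flags holds "i" and "g"
```

Splitting at the last slash rather than the second one lets an escaped \/ appear inside the pattern body without further handling.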
Supported Operations
Character Classes
Pattern | Description | Supported |
---|---|---|
. | [^\n\r] | |
[^] | [\s\S] | |
\w | [A-Za-z0-9_] | |
\W | [^A-Za-z0-9_] | |
\d | [0-9] | |
\D | [^0-9] | |
\s | [\ \r\n\t\v\f] | |
\S | [^\ \r\n\t\v\f] | |
[ABC] | Any in the set | |
[^ABC] | Any not in the set | |
[A-Z] | Any in the range inclusively | |
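The shorthand classes in the table are all combinations of sets, ranges and negation. As a rough sketch of that idea (the `CharacterClass` type is hypothetical and says nothing about how the engine stores classes internally), the equivalences can be modelled like this:

```swift
// Hypothetical model of a character class; not the library's own type.
indirect enum CharacterClass {
    case set(Set<Character>)            // [ABC]
    case range(ClosedRange<Character>)  // [A-Z], both ends included
    case union([CharacterClass])        // several items inside one [...]
    case negated(CharacterClass)        // [^...]

    func matches(_ c: Character) -> Bool {
        switch self {
        case .set(let members): return members.contains(c)
        case .range(let bounds): return bounds.contains(c)
        case .union(let parts): return parts.contains { $0.matches(c) }
        case .negated(let inner): return !inner.matches(c)
        }
    }
}

// \w written out as [A-Za-z0-9_]
let word = CharacterClass.union([
    .range("A"..."Z"), .range("a"..."z"), .range("0"..."9"), .set(["_"]),
])
// \D written out as [^0-9]
let nonDigit = CharacterClass.negated(.range("0"..."9"))

word.matches("q")      // true
word.matches("-")      // false
nonDigit.matches("7")  // false
```

The ranges here rely on Swift's `Character` ordering, which is sufficient for the ASCII ranges in the table.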
Anchors (Match positions not characters)
Pattern | Description | Supported |
---|---|---|
^ | Beginning of string | |
$ | End of string | |
\b | Word boundary | |
\B | Not word boundary | |
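Because anchors match positions rather than characters, a word boundary can be described as a position where exactly one of the two neighbouring characters is a word character. The following is a minimal sketch under that usual definition; `isWordCharacter`, `isWordBoundary` and `boundaryOffsets` are hypothetical helpers, and the \w definition is taken from the Character Classes table.

```swift
// Hypothetical helpers for illustration; not part of the library's API.

// \w as documented above: [A-Za-z0-9_]
func isWordCharacter(_ c: Character) -> Bool {
    return c.isASCII && (c.isLetter || c.isNumber || c == "_")
}

// A position is a boundary when the characters on either side differ in
// "wordness". The start and end of the string count as non-word neighbours.
func isWordBoundary(in text: String, at index: String.Index) -> Bool {
    let before = index > text.startIndex
        && isWordCharacter(text[text.index(before: index)])
    let after = index < text.endIndex && isWordCharacter(text[index])
    return before != after
}

// Collects the character offsets of all boundary positions, end included.
func boundaryOffsets(in text: String) -> [Int] {
    var result: [Int] = []
    var index = text.startIndex
    var offset = 0
    while true {
        if isWordBoundary(in: text, at: index) { result.append(offset) }
        if index == text.endIndex { break }
        index = text.index(after: index)
        offset += 1
    }
    return result
}

boundaryOffsets(in: "it's")  // [0, 2, 3, 4]
```

The apostrophe is not a word character, so "it's" contains boundaries on both sides of it. Under this definition a pattern such as \b\w{4}\b cannot match "it's" as a whole.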
Escaped Characters
Pattern | Description | Supported |
---|---|---|
\0 | Octal escaped character | |
\00 | Octal escaped character | |
\000 | Octal escaped character | |
\xFF | Hex escaped character | |
\uFFFF | Unicode escaped character | |
\cA | Control character | |
\t | Tab | |
\n | Newline | |
\v | Vertical tab | |
\f | Form feed | |
\r | Carriage return | |
\0 | Null | |
\. | . | |
\\ | \ | |
\+ | + | |
\* | * | |
\? | ? | |
\^ | ^ | |
\$ | $ | |
\{ | { | |
\} | } | |
\[ | [ | |
\] | ] | |
\( | ( | |
\) | ) | |
\/ | / | |
`\|` | `|` | |
Groups and Lookaround
Pattern | Description | Supported |
---|---|---|
(ABC) | Capture group | |
(<name>ABC) | Named capture group | |
\1 | Back reference | |
\'name' | Named back reference | |
(?:ABC) | Non-capturing group | |
(?=ABC) | Positive lookahead | |
(?!ABC) | Negative lookahead | |
(?<=ABC) | Positive lookbehind | |
(?<!ABC) | Negative lookbehind | |
Greedy Quantifiers
Pattern | Description | Supported |
---|---|---|
+ | One or more | |
* | Zero or more | |
? | Optional | |
{n} | Exactly n | |
{,} | Same as * | |
{,n} | n or less | |
{n,} | n or more | |
{n,m} | n to m | |
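All of the brace forms reduce to a lower bound and an optional upper bound. The sketch below shows one way to read them; `Repetition` and `parseRepetition` are hypothetical names, and rejecting a range whose upper bound is below its lower bound is an assumption, since the table does not say how the library treats that case.

```swift
// Hypothetical helper for illustration; not part of the library's API.
struct Repetition: Equatable {
    var min: Int
    var max: Int?  // nil means "no upper bound"
}

func parseRepetition(_ text: String) -> Repetition? {
    guard text.first == "{", text.last == "}" else { return nil }
    let inner = text.dropFirst().dropLast()
    let parts = inner.split(separator: ",", omittingEmptySubsequences: false)
    switch parts.count {
    case 1:
        // {n}: exactly n
        guard let n = Int(parts[0]), n >= 0 else { return nil }
        return Repetition(min: n, max: n)
    case 2:
        // A missing lower bound means 0; a missing upper bound means unbounded.
        let lowText = parts[0], highText = parts[1]
        guard let low = (lowText.isEmpty ? 0 : Int(lowText)), low >= 0 else {
            return nil
        }
        if highText.isEmpty { return Repetition(min: low, max: nil) }
        guard let high = Int(highText), high >= low else { return nil }
        return Repetition(min: low, max: high)
    default:
        return nil
    }
}

parseRepetition("{,}")    // min 0, no upper bound: the same bounds as *
parseRepetition("{2,4}")  // min 2, max 4
```

In this representation + is (1, unbounded), * is (0, unbounded) and ? is (0, 1), so a single bounded-repetition case covers every quantifier in the table.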
Lazy Quantifiers
Pattern | Description | Supported |
---|---|---|
+? | One or more | |
*? | Zero or more | |
?? | Optional | |
{n}? | Exactly n | |
{,n}? | n or less | |
{n,}? | n or more | |
{n,m}? | n to m | |
Alternation
Pattern | Description | Supported |
---|---|---|
`|` | Everything before or everything after | |
Flags
Pattern | Description | Supported |
---|---|---|
i | Case insensitive | |
g | Global | |
m | Multiline | |
Inner Workings
(Similar to before)
- Lexer (String input to Tokens)
- Parser (Tokens to NFA)
- Compiler (NFA to DFA)
- Optimizer (Simplify DFA (eg. char(a), char(b) -> string(ab)) for better performance)
- Engine (Matches an input String using the DFA)
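The optimizer step mentioned above, char(a), char(b) -> string(ab), can be sketched on a flat list of steps. This is a deliberate simplification: a real DFA is a graph rather than a list, and `Step` and `mergeAdjacentCharacters` are hypothetical names used only to show the merging idea.

```swift
// Hypothetical types for illustration; not the library's internal representation.
enum Step: Equatable {
    case char(Character)
    case string(String)
    case anyCharacter  // stands in for any step that cannot be merged
}

// Folds runs of single-character steps into one string step, so the engine
// could compare a whole literal at once instead of one character at a time.
func mergeAdjacentCharacters(_ steps: [Step]) -> [Step] {
    var result: [Step] = []
    for step in steps {
        switch (result.last, step) {
        case (.char(let a)?, .char(let b)):
            result[result.count - 1] = .string(String([a, b]))
        case (.string(let s)?, .char(let b)):
            result[result.count - 1] = .string(s + String(b))
        default:
            result.append(step)
        }
    }
    return result
}

mergeAdjacentCharacters([.char("a"), .char("b")])
// [.string("ab")]
```

A lone character that has no literal neighbour is left as a char step, so the pass only changes runs of two or more characters.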
Note
Swift treats \r\n as a single Character. Use \n\r to get both as separate characters.
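The behaviour comes from Swift's `String`, which counts extended grapheme clusters, and a carriage return followed by a line feed forms one cluster. A short standalone check, independent of the library:

```swift
let crlf = "\r\n"
let lfcr = "\n\r"

crlf.count                 // 1: CR followed by LF is one Character
crlf.unicodeScalars.count  // 2: it still holds two Unicode scalars
lfcr.count                 // 2: in this order they stay separate Characters

// The same applies inside longer strings.
let characters = Array("a\r\nb")
characters.count           // 3: "a", "\r\n", "b"
```

A pattern that consumes one Character at a time therefore sees "\r\n" as one unit, which is why \n\r is suggested when both characters are needed individually.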
Resources
- regexr.com - Regex testing
- swtch.com - Implementing Regular Expressions
- Powerset construction - NFA to DFA
- Minimization