regex-benchmark
D: use ctRegex
Thanks, my knowledge about D is limited, so I have some doubts.
- What is the difference from the actual implementation? And why do you want to change it?
- Does it make sense to keep only one implementation or both?
Hi,
ctRegex compiles the regular expression at compile time.
I expected three performance benefits:
- it avoids the runtime cost of constructing the regex, including for Unicode.
- it avoids heap allocations.
- it compiles to native code, which could then use specialized instruction sets.
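To make the difference concrete, here is a minimal sketch of the two ways to build the matcher. The pattern and the function names are hypothetical, chosen only for illustration; they are not the ones used in d/benchmark.d.

```d
import std.range : walkLength;
import std.regex : ctRegex, matchAll, regex;

// Hypothetical pattern, for illustration only.
enum pattern = r"[a-z]+@[a-z]+\.com";

size_t countRuntime(string data)
{
    // The pattern is parsed and the matcher is built at run time.
    auto re = regex(pattern);
    return data.matchAll(re).walkLength;
}

size_t countCompileTime(string data)
{
    // The pattern must be a compile-time constant; the matcher is
    // generated while the program is being compiled.
    return data.matchAll(ctRegex!pattern).walkLength;
}

void main()
{
    auto data = "alice@example.com, bob@example.com";
    // Both variants should find the same matches.
    assert(countRuntime(data) == countCompileTime(data));
}
```

Both functions return the same count; only the point at which the regex engine is built differs.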
See also this cool article about Rust's regex! macro (now deprecated, as described here, but very useful!).
However, on my local machine the benchmark shows no difference. (Sorry, I should have checked before sending the PR!)
I'm investigating, and will close this once I've found the reason.
Thanks for the info!
I ran it on my computer, and there's a small change, not huge, but it's better.
DMD - v2.103.0
- slower than the optimized branch.
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ git branch
* d-compile-time-regex
master
optimized
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ dmd -O -release d/benchmark.d
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ./benchmark input-text.txt
307.404800 - 92
300.025700 - 5301
4.375800 - 5
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ git checkout optimized
Switched to branch 'optimized'
Your branch is up to date with 'upstream/optimized'.
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ dmd -O -release d/benchmark.d
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ./benchmark input-text.txt
262.630300 - 92
269.145000 - 5301
5.823400 - 5
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ./benchmark input-text.txt
264.894800 - 92
268.622300 - 5301
5.635600 - 5
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ git checkout d-compile-time-regex
Switched to branch 'd-compile-time-regex'
Your branch is up to date with 'origin/d-compile-time-regex'.
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ dmd -O -release d/benchmark.d
(dmd-2.103.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ./benchmark input-text.txt
290.224200 - 92
283.388900 - 5301
4.662600 - 5
LDC - v1.32.0
- much faster than optimized.
(ldc-1.32.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ git checkout optimized
Switched to branch 'optimized'
Your branch is up to date with 'upstream/optimized'.
(ldc-1.32.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ldc2 -O3 -release d/benchmark.d
(ldc-1.32.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ git branch
d-compile-time-regex
master
* optimized
(ldc-1.32.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ldc2 -O3 -release d/benchmark.d
(ldc-1.32.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ./benchmark input-text.txt
167.561100 - 92
163.916900 - 5301
4.397100 - 5
(ldc-1.32.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ git checkout d-compile-time-regex
Switched to branch 'd-compile-time-regex'
Your branch is up to date with 'origin/d-compile-time-regex'.
(ldc-1.32.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ldc2 -O3 -release d/benchmark.d
(ldc-1.32.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ./benchmark input-text.txt
88.026900 - 92
88.755400 - 5301
3.594100 - 5
(ldc-1.32.0)kubo39@hinoda:~/dev/kubo39/regex-benchmark$ ./benchmark input-text.txt
88.634500 - 92
88.571300 - 5301
3.624200 - 5
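To put a number on "much faster", the ratio can be computed from the timings printed above. The sketch below uses the LDC figures from the optimized run and the first d-compile-time-regex run; the helper name speedup is an assumption made for this example. On these numbers the first two patterns come out a little under 2x faster.

```d
import std.stdio : writefln;

// Hypothetical helper: how many times faster the candidate is than the baseline.
double speedup(double baseline, double candidate)
{
    return baseline / candidate;
}

void main()
{
    // Timings copied from the LDC runs above, in the order they were printed.
    immutable double[3] optimized = [167.561100, 163.916900, 4.397100];
    immutable double[3] compileTime = [88.026900, 88.755400, 3.594100];

    foreach (i; 0 .. 3)
        writefln("pattern %d: %.2fx", i + 1, speedup(optimized[i], compileTime[i]));
}
```

These are single runs on one machine, so the ratios are indicative only.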
Just use:
import std.array;
auto m = data.matchAll(ctRegex!(pattern));
count = cast(int) m.array.length;
It is easy to read and runs faster than foreach.
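For comparison, here is a sketch of the foreach form next to the array form suggested above, plus a third option using std.range.walkLength, which counts the matches without collecting them into an array. The pattern and function names are hypothetical, and no timings are claimed for any of the three; they would need to be measured.

```d
import std.array : array;
import std.range : walkLength;
import std.regex : ctRegex, matchAll;

// Hypothetical pattern, for illustration only.
enum pattern = r"\d+";

int countWithForeach(string data)
{
    int count = 0;
    foreach (m; data.matchAll(ctRegex!pattern))
        count++;
    return count;
}

int countWithArray(string data)
{
    // Collects every match into an array, then takes its length.
    return cast(int) data.matchAll(ctRegex!pattern).array.length;
}

int countWithWalkLength(string data)
{
    // Walks the match range and counts elements without storing them.
    return cast(int) data.matchAll(ctRegex!pattern).walkLength;
}

void main()
{
    auto data = "a1 b22 c333";
    assert(countWithForeach(data) == countWithArray(data));
    assert(countWithArray(data) == countWithWalkLength(data));
}
```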
Thanks, my knowledge about D is limited, so I have some doubts.
- What is the difference from the actual implementation? And why do you want to change it?
- Does it make sense to keep only one implementation or both?
I propose keeping both. Currently ctRegex should be faster. But many people in the D community don't like this approach, because it increases compilation time significantly. There has even been some talk of removing ctRegex from the standard library. That is only a rumor, though, and it is better to have both. It will be easy to remove one solution in the future if something changes.
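One possible way to keep both in a single source file is a version condition, sketched below. This is only an assumption about how it could be organized, not how the repository does it: the UseCtRegex identifier, the pattern, and the function name are all hypothetical.

```d
import std.range : walkLength;
import std.regex : ctRegex, matchAll, regex;

// Hypothetical pattern, for illustration only.
enum pattern = r"\d+";

size_t countMatches(string data)
{
    // UseCtRegex is a hypothetical version identifier; with dmd it would
    // be enabled by passing -version=UseCtRegex on the command line.
    // A version block does not open a new scope, so `re` stays visible below.
    version (UseCtRegex)
    {
        auto re = ctRegex!pattern;
    }
    else
    {
        auto re = regex(pattern);
    }
    return data.matchAll(re).walkLength;
}

void main()
{
    assert(countMatches("a1 b22 c333") == 3);
}
```

With this layout, dropping either implementation later means deleting one branch of the version block.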