Boost regex engine
See PR #722: @atauzki 👍 is working on integrating Boost regex. After the changes are merged, most if not all regex issues should be fixed.
In the end, our code will have three regex engine configurations:
| defined preprocessor macros | regex engine |
|---|---|
| `BOOST_REGEX_STANDALONE` | Scintilla's simple POSIX regex plus Boost regex |
| `NO_CXX11_REGEX` | Scintilla's simple POSIX regex (current build configuration) |
| none | Scintilla's simple POSIX regex plus C++ STL `std::regex` |
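A minimal sketch of how the macros in the table could select the engine at compile time. The function name `regex_engine_name()` and the returned strings are hypothetical; the project's actual source may be organized quite differently.

```cpp
#include <string>

// Hypothetical: report which regex engines this build includes,
// following the macro-to-engine mapping in the table.
std::string regex_engine_name() {
#if defined(BOOST_REGEX_STANDALONE)
    return "Scintilla POSIX regex + Boost regex";
#elif defined(NO_CXX11_REGEX)
    return "Scintilla POSIX regex only";
#else
    // No macro defined: fall back to the C++ standard library engine.
    return "Scintilla POSIX regex + std::regex";
#endif
}
```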
Some TODOs:
- [x] Ensure all existing use of regex still works after change to Boost.
- [x] Update the regex help string: add common syntax such as `a|b`, `{n,m}`, etc., based on https://www.boost.org/doc/libs/1_83_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html, https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions and https://docs.python.org/3/library/re.html.
- [x] Add an option to select the regex engine (Boost Unicode, Boost ASCII, Scintilla ASCII); it could default to Boost Unicode.
- [x] Add a "Dot matches all characters (including newline)" or "Multiline mode" option to the Find and Replace dialogs, issue #53.
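Boost regex can make `.` match newlines natively (e.g. with the Perl `(?s)` modifier), but ECMAScript `std::regex` has no such switch. A common workaround, sketched here as an assumption rather than what this project does, is to rewrite every unescaped `.` outside a bracket expression into `[\s\S]`. The helper names `make_dot_all()` and `search_dot_all()` are hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: emulate "dot matches all characters (including newline)"
// for ECMAScript std::regex, whose '.' does not match line terminators like '\n'.
// Every unescaped '.' outside a bracket expression becomes [\s\S].
std::string make_dot_all(const std::string& pattern) {
    std::string out;
    bool inClass = false;  // inside [...]?
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char c = pattern[i];
        if (c == '\\' && i + 1 < pattern.size()) {
            out += c;             // copy the escape sequence unchanged
            out += pattern[++i];
            continue;
        }
        if (inClass) {
            if (c == ']') inClass = false;
            out += c;             // '.' inside [...] is already literal
        } else if (c == '[') {
            inClass = true;
            out += c;
        } else if (c == '.') {
            out += "[\\s\\S]";    // any character, including newlines
        } else {
            out += c;
        }
    }
    return out;
}

// Convenience wrapper: search `text` using the rewritten pattern.
bool search_dot_all(const std::string& text, const std::string& pattern) {
    return std::regex_search(text, std::regex(make_dot_all(pattern)));
}
```

Escapes are copied before the bracket check, so `\.` and `[\]]` are left intact.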
Another suggestion: the hints shown for zero-width matches should be improved, as in Notepad3.
Following are some performance test results (match count and time in milliseconds) for the attached JSON file (produced by expand.py in the zip from the Visual Studio 2022 installation catalog.json) at commit 38be0ce92e5106cf0c6f49ed1f6864795121d042. As such, I'm going to remove the SCI_OWNREGEX build configuration (it still needs time to improve its speed).
re-test-1015.zip
| regex | RESearch | std::wregex | std::regex | boost::wregex | boost::regex |
|---|---|---|---|---|---|
| `\w+` | 1434315, 315 | 1436523, 7636 | 1423835, 4372 | 1436523, 2035 | 1501396, 800 |
| `[a-zA-Z0-9_]+` | 1423835, 331 | 1423835, 7654 | 1423835, 4386 | 1423835, 2855 | 1423835, 777 |
| `\d+` | 1028016, 280 | 1028016, 6470 | 1028016, 6475 | 1028016, 2050 | 1028016, 739 |
| `[0-9]+` | 1028016, 286 | 1028016, 6475 | 1028016, 6218 | 1028016, 2044 | 1028016, 725 |
| `\s+` | 895401, 252 | 895945, 6151 | 895403, 5972 | 895917, 1883 | 911355, 662 |
| `[ \t]+` | 895401, 254 | 895401, 6375 | 895401, 6200 | 895401, 2935 | 895401, 678 |
| `^[ \t]+` | 440216, 92 | 440216, 846 | 440216, 724 | 440216, 465 | 440216, 234 |
| `[ \t]+$` | 0, 154 | 0, 6492 | 0, 6324 | 0, 575 | 0, 84 |
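For anyone wanting to reproduce numbers of this kind, here is a minimal timing sketch for `std::regex` only. It is not the harness used for the table above (that one is in the attached zip), and `count_matches()` is a hypothetical name. Pattern compilation is excluded from the measured time.

```cpp
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>
#include <utility>

// Hypothetical benchmark helper: count all matches of `pattern` in `text`
// and return {match count, elapsed milliseconds for the search loop}.
std::pair<std::size_t, long long> count_matches(const std::string& text,
                                                const std::string& pattern) {
    const std::regex re(pattern);  // compiled before the timer starts
    const auto start = std::chrono::steady_clock::now();
    std::size_t count = 0;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end; ++it) {
        ++count;
    }
    const auto stop = std::chrono::steady_clock::now();
    const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
    return {count, static_cast<long long>(ms)};
}
```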
Does today's release include Boost regex? I don't see any change in the Replace dialog.
> Does today's release include Boost regex
Just download the latest builds from the boost regex branch, e.g. https://github.com/zufuliu/notepad2/actions/runs/7517811166
Win32 build with boost::regex (depends on SleepConditionVariableSRW() and WakeAllConditionVariable()) or std::regex (depends on InitializeCriticalSectionEx()) doesn't run on XP.
Unicode character properties seem not to be supported (they require ICU): https://www.boost.org/doc/libs/1_85_0/libs/regex/doc/html/boost_regex/unicode.html https://www.boost.org/doc/libs/1_85_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html#boost_regex.syntax.perl_syntax.character_properties
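Also, captured groups `(pattern)` are currently referenced with `\1`, `\2`. Will the more common `$1`, `$2` be considered in the future?
Boost itself supports it, but the current code doesn't use it yet; only a TODO comment has been added.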
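Until that TODO is done, one way to bridge the two notations is to translate `\N` references into the `$`-style format that `std::regex_replace` understands by default. This is a sketch assuming a `std::regex` back end, not the Boost code path, and `backslash_to_dollar()` / `replace_with_backslash_refs()` are hypothetical names. It emits the two-digit form `$0N`, so a literal digit after `\1` is not read as part of the group number.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: rewrite \0..\9 group references into the ECMAScript
// format used by std::regex_replace ($& = whole match, $01..$09 = groups).
// A literal '$' becomes "$$". Other backslash escapes are copied unchanged
// (assumed to be handled by existing escape processing).
std::string backslash_to_dollar(const std::string& repl) {
    std::string out;
    for (std::size_t i = 0; i < repl.size(); ++i) {
        const char c = repl[i];
        if (c == '$') {
            out += "$$";
        } else if (c == '\\' && i + 1 < repl.size()) {
            const char next = repl[++i];
            if (next == '0') {
                out += "$&";
            } else if (next >= '1' && next <= '9') {
                out += "$0";  // two digits: "\10" means group 1 then '0'
                out += next;
            } else {
                out += c;
                out += next;
            }
        } else {
            out += c;
        }
    }
    return out;
}

// Usage wrapper: replace every match of `pattern` in `text`.
std::string replace_with_backslash_refs(const std::string& text,
                                        const std::string& pattern,
                                        const std::string& repl) {
    return std::regex_replace(text, std::regex(pattern), backslash_to_dollar(repl));
}
```

For example, `replace_with_backslash_refs("John Smith", "(\\w+) (\\w+)", "\\2, \\1")` swaps the two words.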
> Win32 build with `boost::regex` (depends on `SleepConditionVariableSRW()` and `WakeAllConditionVariable()`) or `std::regex` (depends on `InitializeCriticalSectionEx()`) doesn't run on XP.
This can be "fixed" by disabling thread-safe local static initialization with `/Zc:threadSafeInit-`:
https://learn.microsoft.com/en-us/cpp/build/reference/zc-threadsafeinit-thread-safe-local-static-initialization?view=msvc-170
> The implementation of this feature relies on Windows operating system support functions in Windows Vista and later operating systems.
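The construct affected by that switch is a function-local static with dynamic initialization. The illustration below uses a hypothetical `word_regex()` and makes no claim about where Boost or the STL use such statics internally. Under MSVC's default `/Zc:threadSafeInit`, the compiler guards the first-time initialization with runtime support that relies on Vista+ functions. With `/Zc:threadSafeInit-` that guard is dropped, and the initialization is no longer thread-safe.

```cpp
#include <regex>

// Hypothetical example: the static below is constructed on the first call.
// With thread-safe statics enabled, concurrent first calls are synchronized
// by compiler-generated guard code; with /Zc:threadSafeInit- they are not.
const std::regex& word_regex() {
    static const std::regex re("\\w+");
    return re;
}
```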
Another bug related to Boost regex search:
when searching next/previous repeatedly for a zero-width match (e.g. `^`, `$`, `\b`), the search gets stuck at its original position from the second time on.
EmEditor also has this bug, but Notepad3 doesn't. I have no good idea how to fix this.
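One common way to avoid the stuck "find next" is to detect that the new match is the same empty match that is already selected, and retry one character further on. The sketch below uses `std::regex` rather than Boost and is not Notepad3's actual implementation. `Match` and `find_next()` are hypothetical names, and only the forward direction is shown.

```cpp
#include <cstddef>
#include <optional>
#include <regex>
#include <string>

// A match as (start position, length) in the document text.
struct Match {
    std::size_t pos;
    std::size_t len;
};

// Hypothetical "find next": search `text` for `re` starting at `from`.
// `prev` is the match selected by the previous search, if any. If it was
// zero-width and the same empty match is found again, restart one character
// later instead of returning it (which would leave the caret stuck).
std::optional<Match> find_next(const std::string& text, const std::regex& re,
                               std::size_t from, std::optional<Match> prev) {
    while (from <= text.size()) {
        std::regex_constants::match_flag_type flags = std::regex_constants::match_default;
        if (from > 0) {
            // The character before `from` is real text: lets \b inspect it
            // and keeps ^ from treating `from` as the start of the text.
            flags |= std::regex_constants::match_prev_avail;
        }
        std::smatch m;
        if (!std::regex_search(text.cbegin() + from, text.cend(), m, re, flags)) {
            return std::nullopt;
        }
        const Match found{from + static_cast<std::size_t>(m.position(0)),
                          static_cast<std::size_t>(m.length(0))};
        if (prev && prev->len == 0 && found.len == 0 && found.pos == prev->pos) {
            from = found.pos + 1;  // step past the already-selected empty match
            continue;
        }
        return found;
    }
    return std::nullopt;
}
```

With `\b` on `"ab cd"`, repeated calls that pass the previous result as `prev` advance through positions 0, 2, and 3 instead of staying at 0.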
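Quick question:
does boost::regex not support property matching such as `\pP`? More properties to test: "Regular expressions: matching punctuation marks"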
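A compiled libICU is probably at least 20-30 MB, which is too costly. To support this feature, PCRE2 could be used instead; it depends on whether that is planned.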
> A compiled libICU is probably at least 20-30 MB, which is too costly.
Maybe load ICU dynamically (it is available on Windows 10+): https://learn.microsoft.com/en-us/windows/win32/intl/international-components-for-unicode--icu-
It doesn't have the `icu` namespace in its `icu.h`, but Boost uses ICU's C++ API, and `icu.dll` exports no C++ symbols.
The boost_regex branch has been merged into main (it is still not the default engine due to its slower speed). Here are the new strings (added by 98f9eb26e38fd6ebb0446a4adf22fdccb57a69c4 and 59366c691dc6a1ebd5b1772748517bccd159f0e3) that need to be translated, cc @Matteo-Nigro, @maboroshin, @VenusGirl.
707e258bd3164ab69c9d37bbcc9b83d585368321 made a small change to strings in the Find/Replace dialog; this should not affect existing translations: