Add regex ASCII flag support for matching builtin character classes
Description
Adds an ASCII flag to the libcudf regex_flags that controls how the builtin character classes are matched: \w, \W, \s, \S, \d, \D.
This is roughly equivalent to Python's re.ASCII: https://docs.python.org/3/library/re.html#re.ASCII
Strictly speaking, the flag changes matching for these classes as follows:
- `\w` = `[a-zA-Z_]` (alphabetic characters plus underline)
- `\W` = `[^\w]` (basically not `\w`)
- `\s` = `[\t- ]` (tab through space in the ASCII table)
- `\S` = `[^\s]` (basically not `\s`)
- `\d` = `[0-9]` (digit characters)
- `\D` = `[^\d]` (basically not `\d`)
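To make the rules above concrete, here is a minimal standalone sketch of the six classes as predicates over a code point. It mirrors the list in this description literally; the function names are hypothetical and are not part of libcudf, and the actual implementation in this PR may be structured differently.

```cpp
// Hypothetical helpers illustrating the ASCII-only class rules listed above.
// These are NOT libcudf functions; names are made up for this sketch.

// \d = [0-9]
constexpr bool ascii_digit(char32_t c) { return c >= U'0' && c <= U'9'; }

// \w = [a-zA-Z_], exactly as written in the description
constexpr bool ascii_word(char32_t c)
{
  return (c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z') || c == U'_';
}

// \s = [\t- ], i.e. code points 0x09 (tab) through 0x20 (space) inclusive
constexpr bool ascii_space(char32_t c) { return c >= U'\t' && c <= U' '; }

// \D, \W, \S are the plain negations of the classes above
constexpr bool ascii_not_digit(char32_t c) { return !ascii_digit(c); }
constexpr bool ascii_not_word(char32_t c) { return !ascii_word(c); }
constexpr bool ascii_not_space(char32_t c) { return !ascii_space(c); }
```

Under these rules a non-ASCII letter such as U+00E9 or a non-ASCII digit such as U+0663 falls into the negated classes, which is the behavioral difference the flag is meant to introduce.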
Additional gtests are included for this flag with these classes. This will be exposed through Python/Cython in a follow-up PR.
Closes #10894
Checklist
- [x] I am familiar with the Contributing Guidelines.
- [x] New or existing tests cover these changes.
- [x] The documentation is up to date with these changes.
Codecov Report
:exclamation: No coverage uploaded for pull request base (branch-22.10@0df6178). The diff coverage is n/a.
:exclamation: Current head 428bf8b differs from pull request most recent head f20a82e. Consider uploading reports for the commit f20a82e to get more accurate results.
@@ Coverage Diff @@
## branch-22.10 #11404 +/- ##
===============================================
Coverage ? 86.35%
===============================================
Files ? 145
Lines ? 22945
Branches ? 0
===============================================
Hits ? 19815
Misses ? 3130
Partials ? 0
Does nobody use the regex engine from the C++ side? I am just surprised we don't have any API docs that explain the regex features we support.
Spark uses the C++ regex code. We have some good libcudf documentation here: https://docs.rapids.ai/api/libcudf/stable/md_regex.html
Looks like I should update it for this new flag.
@gpucibot merge