protoc-gen-validate

How to represent regular expressions

Open akonradi opened this issue 8 years ago • 9 comments

The current de facto regular expression implementation is the one used by Go, which uses the re2 syntax. It isn't POSIX-compliant, nor is it directly compatible with C++'s std::basic_regex and friends. This shows up most obviously when trying to use flags (. matches newline, case-insensitive matching, etc.) to modify the matching behavior: Go encodes these as part of the expression string, while C++ passes them as a separate bitmask.
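As a rough illustration of the flag difference on the Go side, here is a minimal sketch using only the standard regexp package. The helper name matchWithFlags is hypothetical and is not part of PGV; it just prepends an re2-style inline flag group such as (?i) or (?s) to the pattern, which is the only way Go accepts these options. In C++ the equivalent would be a constructor argument such as std::regex_constants::icase rather than anything inside the pattern string.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchWithFlags is a hypothetical helper (not part of PGV). It prepends
// re2-style inline flags, e.g. "i" or "s", to the pattern and reports
// whether the resulting expression matches the input.
func matchWithFlags(flags, pattern, input string) (bool, error) {
	if flags != "" {
		pattern = "(?" + flags + ")" + pattern
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return false, err
	}
	return re.MatchString(input), nil
}

func main() {
	// Case-insensitive matching is requested inside the pattern string.
	plain, _ := matchWithFlags("", "hello", "HELLO")
	insensitive, _ := matchWithFlags("i", "hello", "HELLO")
	fmt.Println(plain, insensitive) // false true

	// By default "." does not match a newline; the "s" flag changes that.
	noDotAll, _ := matchWithFlags("", "a.b", "a\nb")
	dotAll, _ := matchWithFlags("s", "a.b", "a\nb")
	fmt.Println(noDotAll, dotAll) // false true
}
```

Any cross-language representation would have to decide whether such flags travel inside the pattern (as here) or alongside it.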

akonradi avatar Nov 27 '17 21:11 akonradi

JSON Schema uses ECMAScript regexes (https://spacetelescope.github.io/understanding-json-schema/reference/regular_expressions.html), which is more or less what C++ uses. So, we should probably use that. Is there a Go lib for this?

htuch avatar Nov 27 '17 22:11 htuch

It's definitely not supported by the built-in regexp package. ECMAScript supports backreferences and re2 doesn't. I don't know about third-party libraries, though.
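The backreference gap is easy to check with the standard library alone. In this small sketch (the helper name compiles is made up for illustration), a pattern with a backreference is rejected at compile time by Go's regexp package, while the same pattern without it is accepted:

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles is a hypothetical helper that reports whether Go's regexp
// package accepts the given pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(compiles(`(a)b`))  // true: plain capture group
	fmt.Println(compiles(`(a)\1`)) // false: \1 is a backreference, which re2 syntax does not allow
}
```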

akonradi avatar Nov 27 '17 23:11 akonradi

Currently, PGV is documented to support re2. Ideally, none of the generated code (any lang) will have dependencies outside of the stdlib. So...

We can limit to the POSIX ERE syntax, if that's something we can support out-of-the-box in C++?
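On the Go side, the standard library already has a POSIX ERE mode via regexp.CompilePOSIX, so a sketch of what that restriction would look like is possible without extra dependencies. The example below assumes nothing about how PGV would wire this in; the function names are hypothetical. It shows two consequences: inline flags such as (?i) are no longer accepted, and matching switches from leftmost-first to leftmost-longest.

```go
package main

import (
	"fmt"
	"regexp"
)

// posixAccepts is a hypothetical helper that reports whether a pattern is
// valid under Go's POSIX ERE mode.
func posixAccepts(pattern string) bool {
	_, err := regexp.CompilePOSIX(pattern)
	return err == nil
}

// firstMatch returns the first match of pattern in input under the default
// (Perl-like) mode and under POSIX mode. It panics on an invalid pattern.
func firstMatch(pattern, input string) (perl, posix string) {
	perl = regexp.MustCompile(pattern).FindString(input)
	posix = regexp.MustCompilePOSIX(pattern).FindString(input)
	return perl, posix
}

func main() {
	// Inline flag groups are a Perl/re2 extension and are rejected here.
	fmt.Println(posixAccepts(`abc`))     // true
	fmt.Println(posixAccepts(`(?i)abc`)) // false

	// Alternation: default mode prefers the first alternative that matches,
	// POSIX mode prefers the longest match.
	perl, posix := firstMatch(`a|ab`, "ab")
	fmt.Println(perl, posix) // a ab
}
```

So limiting to ERE would also drop the in-pattern flags discussed earlier in the thread, which means flags would need a separate representation.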

rodaine avatar Nov 27 '17 23:11 rodaine

C++ can do something "similar" to ERE, see https://www.regular-expressions.info/stdregex.html for the caveats which mostly relate to non-ASCII and embedded line breaks. http://en.cppreference.com/w/cpp/regex/basic_regex as well.

htuch avatar Nov 28 '17 02:11 htuch

I suspect it won't be possible to avoid all dependencies outside of the standard libraries. UTF-8 handling, which some string validations require, is not provided by the C++ standard library. I don't think URL or IP validation is either (though I may be mistaken). Go just happens to have a standard library with substantially more breadth than C++. That being said, re2 wouldn't be the worst thing to depend on, since it seems to have bindings for a reasonable number of languages.

akonradi avatar Nov 28 '17 16:11 akonradi

Now that we're adding more languages, I think it's time to revisit this. Both Java and Python support re2, and while I like not having dependencies outside the standard libraries, this seems like a good exception to make.

akonradi avatar May 29 '19 12:05 akonradi

@rodaine @htuch any thoughts here?

akonradi avatar Jun 17 '19 17:06 akonradi

I think we could live with re2 as an Envoy dependency, so if we have Go/Java/Python with out-of-the-box support, let's go with that.

htuch avatar Jun 24 '19 19:06 htuch

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Jul 26 '19 19:07 stale[bot]