protoc-gen-validate
Well Known Regex Validation
Re: https://github.com/envoyproxy/envoy/pull/2420#discussion_r162995905
Add well-known validation rules for regular expressions. On the Go side, supporting RE2 is a nominal effort, and the same holds if the C++ code also uses RE2. Otherwise, we'll need to look into supporting PCRE/POSIX.
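Here is a rough sketch of the Go side. Go's standard `regexp` package implements RE2 syntax, so a well-known regex rule could reduce to compiling the field value and reporting any error. The helper name `validateRegexField` and its error message are hypothetical and are not part of protoc-gen-validate.

```go
package main

import (
	"fmt"
	"regexp"
)

// validateRegexField is a hypothetical helper that a generated validator
// could call for a string field marked with a well-known "regex" rule.
// It succeeds only if value is a syntactically valid RE2 pattern, as
// judged by Go's regexp package (which implements RE2 syntax).
func validateRegexField(field, value string) error {
	if _, err := regexp.Compile(value); err != nil {
		return fmt.Errorf("invalid %s: value must be a valid RE2 regex: %w", field, err)
	}
	return nil
}
```

A C++ validator linked against RE2 could perform the equivalent check by constructing an `RE2` object from the value and testing `ok()`.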
Wouldn't it be possible to leave the decision to consumers? Something along these lines:
```proto
message Regex {
  enum Option {
    REGEX_OPTION_INVALID = 0;
    REGEX_OPTION_IGNORE_CASE = 1;
  }
  repeated Option options = 1;
  string pattern = 2;
  string pattern_js = 3;
  string pattern_jvm = 4;
  string pattern_pcre = 5;
  string pattern_re2 = 6;
  string pattern_rust = 7;
  // ...
}

Regex regex = 42;
```
Language-specific patterns would override the generic `pattern`, and interpreting the options would be up to each code generator. I'm sure other compositions are possible that would be nicer to use in Protobuf, but this example should make the general idea clear.
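One possible reading of "interpreting the options is up to the code generator", sketched for a Go target: `regexp.Compile` takes no flags argument, so a Go generator could translate `REGEX_OPTION_IGNORE_CASE` into RE2's inline `(?i)` flag. The Go type, constant, and function names below are hypothetical stand-ins for generated code.

```go
package main

import "regexp"

// RegexOption is a hypothetical Go mirror of the proposed Regex.Option enum.
type RegexOption int32

const (
	RegexOptionInvalid    RegexOption = 0
	RegexOptionIgnoreCase RegexOption = 1
)

// applyOptions turns the options into RE2 inline flags. Go's regexp API
// has no separate flags parameter, so the flag group is prefixed to the
// pattern. Unknown options are ignored in this sketch.
func applyOptions(pattern string, opts []RegexOption) string {
	ignoreCase := false
	for _, o := range opts {
		if o == RegexOptionIgnoreCase {
			ignoreCase = true
		}
	}
	if ignoreCase {
		return "(?i)" + pattern
	}
	return pattern
}

// compileWithOptions compiles the pattern after the options are applied.
func compileWithOptions(pattern string, opts []RegexOption) (*regexp.Regexp, error) {
	return regexp.Compile(applyOptions(pattern, opts))
}
```

A generator targeting C++ RE2 might instead map the same option to `RE2::Options::set_case_sensitive(false)`. That is exactly the kind of per-language decision the proposal leaves to each generator.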
This could be useful for refining the existing regex checks on arbitrary string/bytes fields, but this particular validation is meant to ensure that the provided string value is itself a valid regex pattern. Something similar could be done for this rule, but keeping it generic is important. For example, even if a pattern is written in Rust regex syntax, there is no reliable way to verify in Python that it is valid without pulling in a dependency or reimplementing the Rust regex parser.
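A small illustration of why validity depends on the engine. RE2, and therefore Go's `regexp`, rejects lookahead such as `foo(?=bar)` and backreferences such as `(a)\1`, both of which PCRE accepts. It does accept Python-style named groups such as `(?P<year>\d{4})`, which JavaScript rejects. The helper `isValidRE2` is illustrative only.

```go
package main

import "regexp"

// isValidRE2 reports whether pattern compiles under Go's RE2-based engine.
// A true result does not imply the pattern is valid for PCRE, JVM, or
// JavaScript engines, and a false result does not imply it is invalid there.
func isValidRE2(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}
```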
Only the actual code generators can verify whether a given pattern is valid. It would be up to users to write patterns that are valid for their generator and language; I don't see how this library could ensure that for all possible languages. The only thing it can provide is dedicated fields in which the patterns for each generator can be stored.
The logic in such a generator would then look like this, e.g. for Rust, in pseudocode:
```
val pattern = if (regex.pattern_rust.isNotEmpty()) {
    regex.pattern_rust
} else {
    regex.pattern
}
if (pattern.isInvalidRegexPattern()) {
    doSomethingAboutIt()
}
```
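The same logic can be sketched as runnable Go for a Go target instead of Rust. Because Go's `regexp` package implements RE2, this assumes `pattern_re2` is the field a Go generator would prefer. The `Regex` struct is a hand-written stand-in for generated code (protoc-gen-go would name the fields `Pattern`, `PatternRe2`, and so on), and returning an error stands in for `doSomethingAboutIt()`.

```go
package main

import (
	"fmt"
	"regexp"
)

// Regex is a hand-written stand-in for the struct protoc-gen-go would
// generate from the proposed message (only a few fields shown).
type Regex struct {
	Pattern     string
	PatternRe2  string
	PatternRust string
}

// resolveGoPattern applies the override rule for a Go target: the
// RE2-specific pattern wins when set; otherwise use the generic one.
func resolveGoPattern(r Regex) string {
	if r.PatternRe2 != "" {
		return r.PatternRe2
	}
	return r.Pattern
}

// checkRegex mirrors the pseudocode: resolve the pattern, then report an
// error if it is not a valid RE2 pattern.
func checkRegex(r Regex) error {
	p := resolveGoPattern(r)
	if _, err := regexp.Compile(p); err != nil {
		return fmt.Errorf("invalid regex pattern %q: %w", p, err)
	}
	return nil
}
```

Note that `PatternRust` is never consulted here. Each generator reads only the field for its own engine plus the generic fallback.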