protoc-gen-validate

Well Known Regex Validation

Open rodaine opened this issue 7 years ago • 3 comments

Re: https://github.com/envoyproxy/envoy/pull/2420#discussion_r162995905

Add well-known validation rules for regular expressions. On the Go side, supporting RE2 takes nominal effort, and the same holds for C++ if that code uses RE2 as well. Otherwise, we'll need to look into supporting PCRE/POSIX.

rodaine, Jan 22 '18 19:01

Wouldn't it be possible to leave the decision to consumers? Something along these lines:

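message Regex {
    enum Option {
        REGEX_OPTION_INVALID = 0;
        REGEX_OPTION_IGNORE_CASE = 1;
    }
    // Interpretation of the options is up to the code generator.
    repeated Option options = 1;
    // Generic pattern, used when no more specific pattern is set.
    string pattern = 2;
    // Dialect-specific patterns; each overrides the generic pattern.
    string pattern_js = 3;
    string pattern_jvm = 4;
    string pattern_pcre = 5;
    string pattern_re2 = 6;
    string pattern_rust = 7;
    // ...
}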
Regex regex = 42;

More specific patterns would override the generic pattern, and the interpretation of the options would be up to the code generator. I'm sure other compositions are possible that would be nicer to use in Proto, but this example should make the general idea clear.
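A minimal sketch of how a generator might resolve such a message, written in Python purely for illustration. The `Regex` dataclass loosely mirrors the proposal above; `pattern_python` is a hypothetical dialect field that is not part of the proposal, and mapping `REGEX_OPTION_IGNORE_CASE` to `re.IGNORECASE` is only one possible interpretation of that option.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class Option(Enum):
    REGEX_OPTION_INVALID = 0
    REGEX_OPTION_IGNORE_CASE = 1


@dataclass
class Regex:
    pattern: str = ""
    pattern_python: str = ""  # hypothetical dialect-specific field
    options: list = field(default_factory=list)


def resolve(regex: Regex) -> re.Pattern:
    """Compile the dialect-specific pattern if set, else the generic one."""
    source = regex.pattern_python or regex.pattern
    flags = 0
    if Option.REGEX_OPTION_IGNORE_CASE in regex.options:
        flags |= re.IGNORECASE
    # re.compile raises re.error if Python's engine rejects the pattern.
    return re.compile(source, flags)
```

With this sketch, `resolve(Regex(pattern="[a-z]+", options=[Option.REGEX_OPTION_IGNORE_CASE]))` returns a compiled pattern that also matches `ABC`, and a non-empty `pattern_python` takes precedence over `pattern`.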

Fleshgrinder, Oct 01 '18 07:10

This could be useful for refining the existing regex checkers for arbitrary string/byte fields, but this particular validation is meant to ensure that the provided string value is itself a valid regex pattern. Something similar could be done for this rule, but keeping it generic is important: even if it's a Rust-style regex pattern, for example, there's no surefire way to verify in (say) Python that the pattern is valid without pulling in a dependency or reimplementing the Rust regex parser.
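The dialect problem can be illustrated with a small Python sketch. The helper name `is_valid_for_python` is hypothetical; it only reports whether Python's own `re` module accepts a pattern, which says nothing definitive about other engines. Backreferences, for instance, are accepted by Python's `re` but are not supported by RE2 or by Rust's `regex` crate.

```python
import re


def is_valid_for_python(pattern: str) -> bool:
    """Return True if Python's `re` module can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# A plain character class: accepted by Python's engine.
print(is_valid_for_python(r"[a-z]+"))
# Accepted by Python, but relies on a backreference, which RE2 rejects.
print(is_valid_for_python(r"(a)\1"))
# Rejected by Python: the group is never closed.
print(is_valid_for_python(r"(a"))
```

A `True` result here therefore cannot stand in for "valid RE2" or "valid Rust regex"; it only describes the engine of the language doing the check.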

rodaine, Oct 01 '18 18:10

Only the actual code generators can verify whether a given pattern is valid. It would be up to users to write patterns that are valid for their generator and language. I don't see how this library could ensure that for all possible languages. The only thing it can provide is dedicated fields where the patterns for the actual generators can be stored.

The logic in such a generator would then be as follows, e.g. for Rust, in pseudocode:

val pattern = if (regex.pattern_rust.isNotEmpty()) {
    regex.pattern_rust
} else {
    regex.pattern
}

if (pattern.isInvalidRegexPattern()) {
    doSomethingAboutIt()
}

Fleshgrinder, Oct 02 '18 09:10