protoc-gen-validate Sharing validation rules in a project

Another way to think about this might be how to allow custom "well-known rules".

Primarily I'm interested in sharing regex patterns in a larger project without needing to take the route of upstreaming all well-known types.

For this use case, I don't have the flexibility of wrapping scalar types in a message that everyone can share.
It would be nice if all rules are shareable, not just regex rules.
Ideally, this mechanism would also be convenient enough to define the PGV well-known types themselves (not a hard requirement). This would allow PGV to ship well-known types decoupled from the core implementation (similar to how the common Google protobuf types are just a library of proto messages).

I know this gets at the heart of protobuf's lack of string constants (or constants of any kind).

Here's a sketch of one approach (it would require some changes to how validators work). I've thought this through for Java. Generally, you can think of this approach as delegating field validation to another validator.

Provide a family of rules, e.g. (validate.rules).TYPE.like = "fully.qualified.Message.field_name", that take a string. The string references a specific message field defined elsewhere in the project that has PGV rules applied to it.
In the generated Java code we now have two generated validators: one for the prototypical message and another for the message annotated with (validate.rules).TYPE.like. Assuming we generate individual methods for each field, the like rule validator can delegate checking.

Example:


package acme.common;

message Common {
  string id = 1 [
    (validate.rules).string.pattern = "^blah"
  ];
}


package acme.my_service_a;

message Request {
  string id = 1 [(validate.rules).string.like = "acme.common.Common.id"];
}

At this point, in the generated Java code at least, we have a CommonValidator and a RequestValidator. If we enhance the *Validator code generation to emit individual field validation methods, RequestValidator can validate id by calling CommonValidator.validateId (or whatever the per-field methods get called).
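The delegation described above could look roughly like the following hand-written sketch. It is not actual PGV generator output: the ValidationException class here is a local stand-in, the validateId method names are hypothetical, and using find() (partial match) for the pattern check is an assumption about how the rule would be evaluated.

```java
import java.util.regex.Pattern;

// Local stand-in for whatever exception type a generated validator would throw.
class ValidationException extends Exception {
    ValidationException(String field, String reason) {
        super(field + ": " + reason);
    }
}

// Sketch of the validator generated for acme.common.Common.
final class CommonValidator {
    // Compiled once; mirrors (validate.rules).string.pattern = "^blah".
    private static final Pattern ID_PATTERN = Pattern.compile("^blah");

    private CommonValidator() {}

    // Hypothetical per-field entry point that other validators can call.
    static void validateId(String value) throws ValidationException {
        // Assumption: partial-match semantics, so only the "^" anchor constrains the value.
        if (!ID_PATTERN.matcher(value).find()) {
            throw new ValidationException("id", "must match pattern " + ID_PATTERN.pattern());
        }
    }
}

// Sketch of the validator generated for acme.my_service_a.Request.
final class RequestValidator {
    private RequestValidator() {}

    // string.like = "acme.common.Common.id" becomes a plain static call.
    static void validateId(String value) throws ValidationException {
        CommonValidator.validateId(value);
    }
}
```

With this shape, RequestValidator.validateId("blah-123") returns normally and RequestValidator.validateId("nope") throws, and the pattern lives in exactly one place.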

Performance considerations:

In languages like Java/C#, these method bodies should get inlined by the JIT as soon as class hierarchy analysis is performed.
Inlining should also apply to languages like Go (or wherever static link-time optimizations are performed).

Usability considerations:

The idea of a fully qualified message name is well defined by protobuf, but I'm not so sure about a fully qualified field name. (validate.rules).string.like = "acme.common.Common.id" could instead be expressed as like = {message: "acme.common.Common", field: "id"} to more closely match the MessageDescriptor + FieldDescriptor.
Optionally, the PGV generator can check that the referenced message is available on the protoc -I import path to ensure that both validators are generated. This is a usability optimization though since if the common validator isn't on the protoc path, the generated code will fail to compile when trying to reference the common validator.
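If the single-string form were kept, the generator would have to split it into the message and field parts itself. A minimal sketch of that split, assuming the last dot-separated segment is always the field name (protobuf field names cannot contain dots); LikeRef and parse are hypothetical names, not part of PGV:

```java
// Hypothetical holder for the two halves of a `like` reference.
final class LikeRef {
    final String message; // e.g. "acme.common.Common"
    final String field;   // e.g. "id"

    private LikeRef(String message, String field) {
        this.message = message;
        this.field = field;
    }

    // Splits at the last dot: everything before it is treated as the message name.
    static LikeRef parse(String qualified) {
        int dot = qualified.lastIndexOf('.');
        if (dot <= 0 || dot == qualified.length() - 1) {
            throw new IllegalArgumentException(
                "expected <message>.<field>, got: " + qualified);
        }
        return new LikeRef(qualified.substring(0, dot), qualified.substring(dot + 1));
    }
}
```

The structured {message, field} form would make this parsing step unnecessary, which is one argument in its favor.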

Oct 27 '20 18:10 smorel-plenty

We also would very much like something to improve the status quo for extending well-known types.

For example, we are using ksuid instead of UUIDs. Currently we can put a regexp on every ksuid field, but it would be great to have an extension point for protoc-gen-validate to reduce repetition and inconsistencies in our validators. We have a similar issue with the "display name" and "metadata" fields we attach to many objects, where it's less about a consistent regexp and more about the min/max bytes.

In the Go ecosystem, we can manually implement the Validate() interface if the field is a message type, but many of our use cases are for scalar types.

Dec 11 '20 20:12 pquerna

Any update on this? This would be highly desirable. In our proto files we repeat the same (validate.rules).string.pattern regex check for safe chars on many scalar fields (^[0-9a-zA-Z.\-/() $!*+,:;=?@[\]_~æøåÆØÅ½§@£$~ÀÁÂÃÄÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝßàáâãäçèéêëìíîïñòóôõöùúûüýÿ \n\r`#{}%&']+$"). Repeating this is ugly and error-prone. Thanks!

Feb 02 '21 17:02 jaijiv

I put together an example of what this API could look like in protobuf using stricter typing.

Changes that would need to be made to validate.proto:

extend google.protobuf.EnumValueOptions {
    optional FieldRules custom_rules = 1071;
}

message FieldRules {
   ...
   
        //add new oneof value for custom wellknown       
        google.protobuf.Any custom_well_known = 23;
    }
}

Example of what it would look like to define custom well-known rules in my_custom_wellknown_rules.proto:

message CustomRules {
  enum Rule {
    SEMVER = 0[
      (validate.custom_rules).string = {
        pattern: "^(\\d)+\\.(\\d)+\\.(\\d)+$"
        min_len: 3
      }
    ];

    LONG_ID = 1[
      (validate.custom_rules).string = {
        min_len: 100
      }
    ];
  }

  repeated Rule rule = 1;
}

What the usage would look like from the message type:


message ExampleUsage {

  string some_field = 1 [
    (validate.rules).custom_well_known = {
      [type.googleapis.com/my.packge.v1.CustomRules]:{
        rule: SEMVER
        rule: LONG_ID
      }
    }
  ];

}

Using the Any type meant that I was able to get proper IntelliSense and type completion in my IDE while writing the ExampleUsage message.
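To make the intended semantics concrete, here is a rough Java sketch of how a validator might evaluate a list of such named rules against one string field. This is only an illustration under assumptions: the CustomRule and CustomRules classes are hypothetical hand-written stand-ins, not generated code, and the sketch assumes every listed rule must pass, that min_len counts Unicode code points, and that the pattern is checked with find().

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical mirror of the CustomRules.Rule enum and its attached string rules.
enum CustomRule {
    SEMVER(Pattern.compile("^(\\d)+\\.(\\d)+\\.(\\d)+$"), 3),
    LONG_ID(null, 100);

    private final Pattern pattern; // null when the rule has no pattern constraint
    private final int minLen;

    CustomRule(Pattern pattern, int minLen) {
        this.pattern = pattern;
        this.minLen = minLen;
    }

    boolean accepts(String value) {
        // Assumption: length is measured in Unicode code points, not bytes.
        int len = value.codePointCount(0, value.length());
        if (len < minLen) {
            return false;
        }
        return pattern == null || pattern.matcher(value).find();
    }
}

final class CustomRules {
    private CustomRules() {}

    // Assumption: a field listing several rules must satisfy all of them.
    static boolean acceptsAll(String value, List<CustomRule> rules) {
        for (CustomRule rule : rules) {
            if (!rule.accepts(value)) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions, a field annotated with both SEMVER and LONG_ID would reject "1.2.3" (too short for LONG_ID) and only accept a version string of at least 100 characters.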

Oct 12 '22 14:10 marcoferrer

Thank you for highlighting the importance of sharing and reusing custom validation rules within a project. This is an area of interest for us as we continue to enhance the capabilities of bufbuild/protovalidate#51.

Your input is greatly appreciated, and it aligns with our goals for future developments. As a result, I'll be closing this issue while keeping your suggestion in mind.

Should you have more suggestions or questions down the line, don't hesitate to get in touch. Your involvement is invaluable!

Aug 10 '23 17:08 elliotmjackson