google-cloudevents
google-cloudevents copied to clipboard
jsonschema is overly permissive
Problem
The jsonschema for Pub/Sub (and presumably the other events) is overly permissive, making broad use of additionalProperties: true
and no required
properties. As a result, straightforward mistakes in shaping a JSON payload can easily validate as correct.
Solution
Revisit the decisions around additionalProperties
and required
to ensure we have appropriate flexibility, but not false positives in validation. Doing so will also improve the clarity and useful of the schema in contexts other than data validation.
Background
In #315 a new data example was proposed. It seemed reasonable and stripped down, but it failed on the dotnet test suite.
System.NullReferenceException: Object reference not set to an instance of an object.
at Google.Events.ValidateSchema.Program.MaybeModifyJson(String protoPackage, String messageName, String json) in /home/runner/work/google-cloudevents/google-cloudevents/google-cloudevents-dotnet/src/Google.Events.ValidateSchema/Program.cs:line 151
at Google.Events.ValidateSchema.Program.ValidateSchemaData(String testRoot, String file) in /home/runner/work/google-cloudevents/google-cloudevents/google-cloudevents-dotnet/src/Google.Events.ValidateSchema/Program.cs:line 89
Two things are notable here: 1) this repos own test suite missed the problem, 2) this error message was too vague for me to parse without more familiarity with the dotnet validation library. I copied the schema and the test data into jsonschema.validator
First, I got an error parsing the schema:
Error when resolving schema reference '#/definitions/MessagePublishedData'. Path '', line 1, position 1.
This is because the first property in some schema validation tools must be the $schema
declaration. I moved that up.
I immediately saw that the data validated. https://www.jsonschemavalidator.net/s/5MDuSxtc However, the data has been flagged as invalid. What's wrong?
The schema has a top-level additionalProperties: true
. This means the schema will ignore properties it does not recognize. Since no properties are marked as required, {}
is considered a valid PubSub Message. This problem seems recursive into the schema.
Switching that value to false
, I see many clear indicators that these expected properties are out of place: https://www.jsonschemavalidator.net/s/HJxS13pq
Point 2 is a failure on my part - I'll have a look at the schema validation code tomorrow. (Any NullReferenceException is a code bug...) Thanks for flagging it!
https://github.com/googleapis/google-cloudevents-dotnet/pull/98 improves the .NET tooling - the validation would fail with this instead:
Google.Protobuf.InvalidProtocolBufferException: Unknown field: data
at Google.Protobuf.JsonParser.Merge(IMessage message, JsonTokenizer tokenizer)
at Google.Protobuf.JsonParser.Merge(IMessage message, TextReader jsonReader)
at Google.Protobuf.JsonParser.Parse(TextReader jsonReader, MessageDescriptor descriptor)
at Google.Protobuf.JsonParser.Parse(String json, MessageDescriptor descriptor)
at Google.Events.ValidateSchema.Program.ValidateSchemaData(String testRoot, String file) in C:\users\jonskeet\GitHub\googleapis\google-cloudevents-dotnet\src\Google.Events.ValidateSchema\Program.cs:line 90
That's still not ideal as it only gives the field name rather than the whole path to the field, but it's at least more of a clue than before.
Yup, sounds like cleaning up additionalProperties: true
could be a good improvement to the schemas.
I think we need to have our protos adopt the optional
field for proto3 (3.15+) protos for us to reasonably add a required
field to our JSON schema required properties.
Closing not because this is fixed, but as we are replacing the generation process which will eliminate this problem.
Sounds good, looking forward to the updated generation process!