
Wrong schema version in com.snowplowanalytics.snowplow.badrows/loader_runtime_error

Open andrewhabib opened this issue 4 years ago • 2 comments

Hi again,

In com.snowplowanalytics.snowplow.badrows/loader_runtime_error, the schema version is incremented from 1-0-0 to 1-0-1, which does not respect SchemaVer.

The change from version 1-0-0 to version 1-0-1 introduces several breaking changes:

  • the key event is changed to payload
  • the key error is changed to failure

Therefore, the newer schema version should bump the model version and become 2-0-0 instead of 1-0-1.
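To make the versioning argument concrete, here is a minimal sketch of SchemaVer arithmetic, assuming the usual MODEL-REVISION-ADDITION layout. The helper names `parse_schemaver` and `bump` are hypothetical and are not part of any Snowplow or Iglu library.

```python
from typing import Tuple


def parse_schemaver(version: str) -> Tuple[int, int, int]:
    """Split a SchemaVer string such as '1-0-1' into (MODEL, REVISION, ADDITION)."""
    model, revision, addition = (int(part) for part in version.split("-"))
    return model, revision, addition


def bump(version: str, component: str) -> str:
    """Return the next version after bumping one component.

    Bumping a component resets every component to its right to 0.
    """
    model, revision, addition = parse_schemaver(version)
    if component == "MODEL":
        return f"{model + 1}-0-0"
    if component == "REVISION":
        return f"{model}-{revision + 1}-0"
    if component == "ADDITION":
        return f"{model}-{revision}-{addition + 1}"
    raise ValueError(f"unknown SchemaVer component: {component}")


# What was released: an ADDITION bump.
print(bump("1-0-0", "ADDITION"))  # 1-0-1
# What a breaking change calls for: a MODEL bump.
print(bump("1-0-0", "MODEL"))     # 2-0-0
```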

andrewhabib avatar Aug 11 '20 17:08 andrewhabib

Hi @andrewhabib!

Thanks for checking these (#1071, #1072, #1073, #1074). We agree that proper versioning is a very important part of our schema technology, and it's a shame we still need humans to do this work.

However, we think that #1071 and #1073 do not go strictly against the specification, in the sense that, for versioning to serve its purpose, it is fine to jump over a version (e.g. to bump MODEL where it should have been just ADDITION or REVISION). This ensures that nothing working with those schemas will be broken by a sudden bump. In the worst case, UX will be slightly worse (e.g. a new table is created where a single one would have sufficed), but it also highly depends on the actual use case for the schema, and honestly we're very okay with bumping major in configuration schemas (#1071) and more protective of actual data (#1072).
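The distinction drawn here, between bumping more than necessary and bumping less than necessary, can be sketched as follows. This is only an illustration: `declared_component` and `judge_bump` are hypothetical names, and the "required" bump level is assumed to come from a human reviewer or a separate compatibility check.

```python
# Ordering of SchemaVer components from least to most disruptive.
LEVELS = {"ADDITION": 0, "REVISION": 1, "MODEL": 2}


def declared_component(old: str, new: str) -> str:
    """Return the leftmost SchemaVer component that differs between two versions."""
    old_parts = [int(p) for p in old.split("-")]
    new_parts = [int(p) for p in new.split("-")]
    if old_parts == new_parts:
        raise ValueError("versions are identical")
    if new_parts[0] != old_parts[0]:
        return "MODEL"
    if new_parts[1] != old_parts[1]:
        return "REVISION"
    return "ADDITION"


def judge_bump(old: str, new: str, required: str) -> str:
    """Compare the bump that was declared with the bump the change required."""
    declared = declared_component(old, new)
    if LEVELS[declared] < LEVELS[required]:
        return "under-bump"  # consumers may break without warning
    if LEVELS[declared] > LEVELS[required]:
        return "over-bump"   # safe, at the cost of e.g. an extra table
    return "exact"


print(judge_bump("1-0-0", "2-0-0", "ADDITION"))  # over-bump
print(judge_bump("1-0-0", "1-0-1", "MODEL"))     # under-bump
```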

However, we also agree that #1072 and #1074 were more serious mistakes. Our short-term plan is to release new, proper versions of these schemas and switch all producers to those schemas. We'll never get rid of all improperly schemaed data, but we can at least reduce its impact.

Our long-term plan is to implement an automatic algorithm to recognize versioning and to clarify/formalise the specification for known corner cases (e.g. different consumers can have different compatibility requirements).

I'm going to close #1071 and #1073 as wontfix, but leave #1072 and #1074 for further short-term actions.

Thanks again for raising!

chuwy avatar Aug 14 '20 13:08 chuwy

Hi @chuwy,

Thank you for confirming all the reports!

I do understand and agree that some of the schema errors are less severe (bumping the model number when it should have been an addition or revision is understandably tolerable, although not desirable) than others (mistakenly not bumping the model number when backward-incompatible changes are introduced).

Since you mention your intention to implement an algorithm and/or formalize the specification, you and the Snowplow team might be interested in our JSON subschema tool, which is a general tool for checking the subtype relation between JSON schemas. You can also check our paper, which covers more details, especially regarding the formalization of the approach. In fact, we detected all the issues I reported using our tool :)

The tool is sound (when it gives a result, it is always correct) but incomplete (some corner cases are not handled, i.e., the tool fails to give an answer). However, according to our analysis of several hundred schemas, we believe it covers most of the widely used properties of JSON schema.

In the iglu-central use case, we envision the tool being used to validate the relation between older versions and the new version of a specific schema before officially committing it to the repository and releasing. Please let us know if you find it useful, or if you need help using it.
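A rough sketch of such a pre-release check is shown below. It does not use the JSON subschema tool or its API. Instead, `accepts_all_old_data` is a deliberately naive, hypothetical stand-in that only understands flat object schemas with `additionalProperties: false`, and it answers `None` for anything else, mirroring the "sound but incomplete" behaviour described above. It also assumes that required keys are declared under `properties` and that every property schema admits at least one value. The two schemas are simplified illustrations, not the real loader_runtime_error definitions.

```python
from typing import Optional

SUPPORTED_KEYS = {"type", "properties", "required", "additionalProperties"}


def accepts_all_old_data(old: dict, new: dict) -> Optional[bool]:
    """Naive check: is every instance valid under `old` also valid under `new`?

    Returns True or False when the answer is certain for the supported
    subset of JSON Schema, and None when the schemas fall outside it.
    """
    for schema in (old, new):
        if (set(schema) - SUPPORTED_KEYS
                or schema.get("type") != "object"
                or schema.get("additionalProperties") is not False):
            return None
    old_props, new_props = old.get("properties", {}), new.get("properties", {})
    old_req, new_req = set(old.get("required", [])), set(new.get("required", []))
    if not new_req <= old_req:
        return False  # new schema requires a key that old data may lack
    if not set(old_props) <= set(new_props):
        return False  # old data may carry a key the new schema forbids
    if any(old_props[key] != new_props[key] for key in old_props):
        return None   # property schemas differ: outside this sketch
    return True


def check_release(old_version: str, new_version: str, old: dict, new: dict) -> str:
    """Gate a new schema version on the compatibility verdict."""
    same_model = old_version.split("-")[0] == new_version.split("-")[0]
    verdict = accepts_all_old_data(old, new)
    if verdict is None:
        return "unknown: review manually"
    if verdict is False and same_model:
        return "reject: breaking change needs a MODEL bump"
    return "ok"


# Hypothetical, simplified stand-ins for versions 1-0-0 and 1-0-1.
OLD = {"type": "object", "additionalProperties": False,
       "properties": {"event": {"type": "string"}, "error": {"type": "string"}},
       "required": ["event", "error"]}
NEW = {"type": "object", "additionalProperties": False,
       "properties": {"payload": {"type": "string"}, "failure": {"type": "string"}},
       "required": ["payload", "failure"]}

print(check_release("1-0-0", "1-0-1", OLD, NEW))  # reject: breaking change needs a MODEL bump
print(check_release("1-0-0", "2-0-0", OLD, NEW))  # ok
```

In a real setup the naive function would be replaced by a proper subschema checker, and the gate would run in CI before a schema is merged.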

andrewhabib avatar Aug 16 '20 17:08 andrewhabib