Figure out a way of supporting tests in Iglu
Fred wrote some nice tests for the core Snowplow schemas back when they were part of snowplow/snowplow. These can be seen here:
https://github.com/snowplow/snowplow/tree/40a5037563e729c67a922a3e2e67c4e5bb917809/0-common/schemas/jsonschema/tests
Fundamentally, tests divide into:
- Good tests - pass validation
- Bad tests - fail validation
So we need to think about how to store tests inside of Iglu. Starter for 10 - what about:
com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/good
com.snowplowanalytics.self-desc/instance/jsonschema/1-0-0/tests/bad
The idea is that tests that are good for 1-0-0 must also (by definition) be good for 1-0-1, 1-0-2 etc. Tests which are bad for 1-0-0 could be good for 1-0-1 so there's nothing we can reason about there.
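To make that asymmetry concrete, here's a minimal Scala sketch; `SchemaKey` and the path helpers are illustrative, not Iglu's actual API:

```scala
// Illustrative only: SchemaKey and these helpers are not Iglu's actual API.
object TestPaths {
  final case class SchemaKey(vendor: String, name: String, format: String,
                             model: Int, revision: Int, addition: Int)

  // Good tests hang off MODEL-REVISION: they stay valid for later additions
  def goodTestsPath(k: SchemaKey): String =
    s"${k.vendor}/${k.name}/${k.format}/${k.model}-${k.revision}/tests/good"

  // Bad tests hang off the full version: a later addition may start accepting them
  def badTestsPath(k: SchemaKey): String =
    s"${k.vendor}/${k.name}/${k.format}/${k.model}-${k.revision}-${k.addition}/tests/bad"
}
```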
@fblundun thoughts?
I guess it would be like this:
tests/good/1
tests/good/2
tests/bad/1
tests/bad/2
etc
and tests/good would return all the good tests.
ALSO: once we have tests/good for 1-0, we can check that any new schema uploaded to jsonschema/1-0-1 passes all the good/1-0 tests.
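A rough sketch of that upload gate, assuming the com.github.fge json-schema-validator library and leaving the test lookup abstract:

```scala
import com.fasterxml.jackson.databind.JsonNode
import com.github.fge.jsonschema.main.JsonSchemaFactory

object UploadGate {
  // Reject the 1-0-1 upload unless every stored good test for 1-0 passes.
  // How the good tests are fetched is left to whatever lookup Iglu exposes.
  def goodTestsPass(candidate: JsonNode, goodTests: List[JsonNode]): Boolean = {
    val schema = JsonSchemaFactory.byDefault().getJsonSchema(candidate)
    goodTests.forall(test => schema.validate(test).isSuccess)
  }
}
```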
I think we should use valid and invalid rather than good and bad, as per Fred's existing tests
Probably use UUIDs instead of /1, /2 etc. That would also allow us to remove existing tests if we want to.
http://stackoverflow.com/questions/7114694/should-i-use-uuids-for-resources-in-my-public-api
This all sounds good.
I wonder if it would be worth having a system where whenever a schema's version is bumped (e.g. from 1-0-0 to 1-0-1) we have a place for tests designed specifically to be validated by the new version but not by the old, to highlight the difference between the two.
The structure could be something like this:
com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/good/3 would contain JSONs which should be validated by schema 1-0-x where x >= 3
com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/bad/3 would contain JSONs which should be rejected by schema 1-0-x where x <= 3
Then if we want to test schema 1-0-3, we check that it:
- validates all JSONs in com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/good/y for all y <= 3
- rejects all JSONs in com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/bad/y for all y >= 3
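A small Scala sketch of that selection rule, with made-up names (`addition` is the y in the paths above):

```scala
object TestPlan {
  // `addition` is the y in tests/good/y or tests/bad/y; the payload type is left open
  final case class StoredTest[A](addition: Int, payload: A)

  // Which tests schema 1-0-x must validate, and which it must reject
  def plan[A](x: Int,
              good: List[StoredTest[A]],
              bad: List[StoredTest[A]]): (List[A], List[A]) = {
    val mustValidate = good.filter(_.addition <= x).map(_.payload) // good/y, y <= x
    val mustReject   = bad.filter(_.addition >= x).map(_.payload)  // bad/y,  y >= x
    (mustValidate, mustReject)
  }
}
```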
In fact we could alternatively do away with the "good" and "bad" distinction and just have com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/z containing all examples which should be validated by 1-0-z but not by 1-0-(z-1).
Then to test schema version 1-0-3, we would check that it:
- validates all JSONs in com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/w for all w <= 3
- rejects all JSONs in com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/w for all w > 3
The disadvantage of this is that it might involve moving some test JSONs to a new directory when a new version is published, and that it's pretty complicated...
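Code-wise, though, the single-directory variant is just a partition on the threshold addition, sketched below with made-up names:

```scala
object ThresholdPlan {
  // Every test under tests/z carries the addition z at which it first validates
  final case class ThresholdTest[A](firstValid: Int, payload: A)

  // Schema 1-0-x must validate tests with w <= x and reject those with w > x
  def plan[A](x: Int, tests: List[ThresholdTest[A]]): (List[A], List[A]) = {
    val (valid, invalid) = tests.partition(_.firstValid <= x)
    (valid.map(_.payload), invalid.map(_.payload))
  }
}
```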
Hey Fred, lots of great thoughts there. I think what we're saying is that fundamentally, for a given MODEL-REVISION, tests are either:
- valid-from-ADDITION
- invalid-until-ADDITION
- invalid-forever
Is that a helpful taxonomy?
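For what it's worth, that taxonomy falls out naturally as an ADT, sketched here in Scala (illustrative only):

```scala
sealed trait TestKind
final case class ValidFrom(addition: Int)    extends TestKind // valid-from-ADDITION
final case class InvalidUntil(addition: Int) extends TestKind // invalid-until-ADDITION
case object InvalidForever                   extends TestKind // never validates in this REVISION
```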
I think that's a helpful way to think about it. In terms of file structure, we could group test JSONs into 2 categories:
- invalid-forever, located in com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/bad/
- valid-from-ADDITION, located in com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/ADDITION
Then to test schema 1-0-x we make sure it validates everything in com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/y for y <= x, and that it invalidates every other test JSON under com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/.
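Putting that together, a sketch of the whole check (again assuming the fge validator; the storage lookup types are assumptions):

```scala
import com.fasterxml.jackson.databind.JsonNode
import com.github.fge.jsonschema.main.JsonSchemaFactory

object SchemaTester {
  // validFrom maps each tests/y directory to its JSONs; bad holds tests/bad/
  def passes(candidate: JsonNode, x: Int,
             validFrom: Map[Int, List[JsonNode]],
             bad: List[JsonNode]): Boolean = {
    val schema = JsonSchemaFactory.byDefault().getJsonSchema(candidate)
    val (shouldPass, shouldFail) = validFrom.toList.partition { case (y, _) => y <= x }
    shouldPass.flatMap(_._2).forall(j => schema.validate(j).isSuccess) &&
      (shouldFail.flatMap(_._2) ++ bad).forall(j => !schema.validate(j).isSuccess)
  }
}
```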
Interesting! Possible simplification: we add all tests simply as:
com.snowplowanalytics.self-desc/instance/jsonschema/tests/f47ac10b-58cc-4372-a567-0e02b2c3d479
etc.
Then when you submit a new JSON Schema, all existing tests are run against it, and the response from the registration contains a listing of all the test statuses.
Going further, when you request an individual test, its metadata would include which schema versions it succeeds against.
Going even further, there should be the opportunity to run a new potential schema against all tests without actually committing it.
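The response for both registration and a dry run might look something like this; every name here is a hypothetical shape, not a committed design:

```scala
import java.util.UUID

// Hypothetical response shape shared by registration and dry runs;
// none of these names are a committed design.
final case class TestResult(testId: UUID, passed: Boolean)
final case class SchemaCheck(schemaUri: String, committed: Boolean,
                             results: List[TestResult])

// A dry run would return committed = false with the same results listing:
// def dryRun(candidateSchema: String): SchemaCheck = ???
```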
Assigning to @BenFradet