Figure out a way of supporting tests in Iglu
Fred wrote some nice tests for the core Snowplow schemas back when they were part of snowplow/snowplow. These can be seen here:
https://github.com/snowplow/snowplow/tree/40a5037563e729c67a922a3e2e67c4e5bb917809/0-common/schemas/jsonschema/tests
Fundamentally, tests divide into:
- Good tests - pass validation
- Bad tests - fail validation
So we need to think about how to store tests inside of Iglu. Starter for 10 - what about:
com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/good
com.snowplowanalytics.self-desc/instance/jsonschema/1-0-0/tests/bad
The idea is that tests that are good for 1-0-0 must also (by definition) be good for 1-0-1, 1-0-2 etc. Tests which are bad for 1-0-0 could be good for 1-0-1 so there's nothing we can reason about there.
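To make that asymmetry concrete, here's a minimal Scala sketch; `SchemaKey` and the path helpers are illustrative, not Iglu's actual API:

```scala
// Illustrative only: SchemaKey and these helpers are not Iglu's actual API.
object TestPaths {
  final case class SchemaKey(vendor: String, name: String, format: String,
                             model: Int, revision: Int, addition: Int)

  // Good tests hang off MODEL-REVISION: they stay valid for later additions
  def goodTestsPath(k: SchemaKey): String =
    s"${k.vendor}/${k.name}/${k.format}/${k.model}-${k.revision}/tests/good"

  // Bad tests hang off the full version: a later addition may start accepting them
  def badTestsPath(k: SchemaKey): String =
    s"${k.vendor}/${k.name}/${k.format}/${k.model}-${k.revision}-${k.addition}/tests/bad"
}
```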
@fblundun thoughts?
I guess it would be like this:
tests/good/1
tests/good/2
tests/bad/1
tests/bad/2
etc
and tests/good would return all the good tests.
ALSO: once we have tests/good for 1-0, we can check that any new schema uploaded to jsonschema/1-0-1 passes all the good/1-0 tests.
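A rough sketch of that upload gate, assuming the com.github.fge json-schema-validator library and leaving the test lookup abstract:

```scala
import com.fasterxml.jackson.databind.JsonNode
import com.github.fge.jsonschema.main.JsonSchemaFactory

object UploadGate {
  // Reject the 1-0-1 upload unless every stored good test for 1-0 passes.
  // How the good tests are fetched is left to whatever lookup Iglu exposes.
  def goodTestsPass(candidate: JsonNode, goodTests: List[JsonNode]): Boolean = {
    val schema = JsonSchemaFactory.byDefault().getJsonSchema(candidate)
    goodTests.forall(test => schema.validate(test).isSuccess)
  }
}
```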
I think we should use valid and invalid rather than good and bad, as per Fred's existing tests
Probably use UUIDs instead of /1, /2 etc. That would also allow us to remove existing tests if we want to.
http://stackoverflow.com/questions/7114694/should-i-use-uuids-for-resources-in-my-public-api
This all sounds good.
I wonder if it would be worth having a system where whenever a schema's version is bumped (e.g. from 1-0-0 to 1-0-1) we have a place for tests designed specifically to be validated by the new version but not by the old, to highlight the difference between the two.
The structure could be something like this:
com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/good/3 would contain JSONs which should be validated by schema 1-0-x where x >= 3
com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/bad/3 would contain JSONs which should be rejected by schema 1-0-x where x <= 3
Then if we want to test schema 1-0-3, we check that it:
- validates all JSONs in com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/good/y for all y <= 3
- rejects all JSONs in com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/bad/y for all y >= 3
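A small Scala sketch of that selection rule, with made-up names (`addition` is the y in the paths above):

```scala
object TestPlan {
  // `addition` is the y in tests/good/y or tests/bad/y; the payload type is left open
  final case class StoredTest[A](addition: Int, payload: A)

  // Which tests schema 1-0-x must validate, and which it must reject
  def plan[A](x: Int,
              good: List[StoredTest[A]],
              bad: List[StoredTest[A]]): (List[A], List[A]) = {
    val mustValidate = good.filter(_.addition <= x).map(_.payload) // good/y, y <= x
    val mustReject   = bad.filter(_.addition >= x).map(_.payload)  // bad/y,  y >= x
    (mustValidate, mustReject)
  }
}
```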
In fact we could alternatively do away with the "good" and "bad" distinction and just have com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/z containing all examples which should be validated by 1-0-z but not by 1-0-(z-1).
Then to test schema version 1-0-3, we would check that it:
- validates all JSONs in com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/w for all w <= 3
- rejects all JSONs in com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/w for all w > 3
The disadvantage of this is that it might involve moving some test JSONs to a new directory when a new version is published, and that it's pretty complicated...
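Code-wise, though, the single-directory variant is just a partition on the threshold addition, sketched below with made-up names:

```scala
object ThresholdPlan {
  // Every test under tests/z carries the addition z at which it first validates
  final case class ThresholdTest[A](firstValid: Int, payload: A)

  // Schema 1-0-x must validate tests with w <= x and reject those with w > x
  def plan[A](x: Int, tests: List[ThresholdTest[A]]): (List[A], List[A]) = {
    val (valid, invalid) = tests.partition(_.firstValid <= x)
    (valid.map(_.payload), invalid.map(_.payload))
  }
}
```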
Hey Fred, lots of great thoughts there. I think what we're saying is that fundamentally, for a given MODEL-REVISION, tests are either:
- valid-from-ADDITION
- invalid-until-ADDITION
- invalid-forever
Is that a helpful taxonomy?
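For what it's worth, that taxonomy falls out naturally as an ADT, sketched here in Scala (illustrative only):

```scala
sealed trait TestKind
final case class ValidFrom(addition: Int)    extends TestKind // valid-from-ADDITION
final case class InvalidUntil(addition: Int) extends TestKind // invalid-until-ADDITION
case object InvalidForever                   extends TestKind // never validates in this REVISION
```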
I think that's a helpful way to think about it. In terms of file structure, we could group test JSONs into 2 categories:
- invalid-forever, located in com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/bad/
- valid-from-ADDITION, located in com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/ADDITION
Then to test schema 1-0-x we make sure it validates everything in com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/y for y <= x, and that it invalidates every other test JSON under com.snowplowanalytics.self-desc/instance/jsonschema/1-0/tests/.
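Putting that together, a sketch of the whole check (again assuming the fge validator; the storage lookup types are assumptions):

```scala
import com.fasterxml.jackson.databind.JsonNode
import com.github.fge.jsonschema.main.JsonSchemaFactory

object SchemaTester {
  // validFrom maps each tests/y directory to its JSONs; bad holds tests/bad/
  def passes(candidate: JsonNode, x: Int,
             validFrom: Map[Int, List[JsonNode]],
             bad: List[JsonNode]): Boolean = {
    val schema = JsonSchemaFactory.byDefault().getJsonSchema(candidate)
    val (shouldPass, shouldFail) = validFrom.toList.partition { case (y, _) => y <= x }
    shouldPass.flatMap(_._2).forall(j => schema.validate(j).isSuccess) &&
      (shouldFail.flatMap(_._2) ++ bad).forall(j => !schema.validate(j).isSuccess)
  }
}
```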
Interesting! Possible simplification: we add all tests simply as:
com.snowplowanalytics.self-desc/instance/jsonschema/tests/f47ac10b-58cc-4372-a567-0e02b2c3d479
etc.
Then when you submit a new JSON Schema, all existing tests are run against it, and the response from the registration contains a listing of all the test statuses.
Going further, when you request an individual test, its metadata would include which schema versions it succeeds against.
Going even further, there should be the opportunity to run a new potential schema against all tests without actually committing it.
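The response for both registration and a dry run might look something like this; every name here is a hypothetical shape, not a committed design:

```scala
import java.util.UUID

// Hypothetical response shape shared by registration and dry runs;
// none of these names are a committed design.
final case class TestResult(testId: UUID, passed: Boolean)
final case class SchemaCheck(schemaUri: String, committed: Boolean,
                             results: List[TestResult])

// A dry run would return committed = false with the same results listing:
// def dryRun(candidateSchema: String): SchemaCheck = ???
```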
Assigning to @BenFradet