generate-schema
BigQuery: Look at more objects of an array for type inference
When generating a schema for BigQuery, only the first member of an array is inspected. If the objects in the array contain optional fields, looking at one member may not be sufficient.
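If you generate a schema from the following object and then try to insert that object with the options {ignoreUnknownValues: false}, the insert fails, because foobar appears only in the second array member and is therefore missing from the generated schema.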
{
arrayField: [ { snafu: true }, { snafu: true, foobar: true } ]
}
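One possible way to handle this is to collect fields from every member of the array instead of only the first. The following is a minimal sketch of that idea, not the library's implementation: `bigQueryType` and `repeatedRecordField` are hypothetical helper names, the array is assumed to hold flat plain objects, and every sub-field is marked NULLABLE because it may be absent from some members. The field layout (name, type, mode, fields) follows the BigQuery table schema format.

```javascript
// Hypothetical helpers for illustration; not part of generate-schema's API.

// Map a JavaScript value to a BigQuery column type (flat values only).
function bigQueryType(value) {
  if (typeof value === 'boolean') return 'BOOLEAN';
  if (Number.isInteger(value)) return 'INTEGER';
  if (typeof value === 'number') return 'FLOAT';
  return 'STRING';
}

// Build a REPEATED RECORD field from the union of keys seen in all items.
function repeatedRecordField(name, items) {
  const seen = new Map(); // key -> field definition, in first-seen order
  for (const item of items) {
    for (const [key, value] of Object.entries(item)) {
      if (!seen.has(key)) {
        // NULLABLE: the key may be missing from other members of the array.
        seen.set(key, { name: key, type: bigQueryType(value), mode: 'NULLABLE' });
      }
    }
  }
  return { name, type: 'RECORD', mode: 'REPEATED', fields: [...seen.values()] };
}

const arrayFieldSchema = repeatedRecordField('arrayField', [
  { snafu: true },
  { snafu: true, foobar: true },
]);
console.log(JSON.stringify(arrayFieldSchema, null, 2));
```

With this approach both snafu and foobar end up in the schema, so the second member no longer carries an unknown value.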
Same with JSON...
{ "test": [ { "a": 0, "b": 1 }, { "a": 2, "b": 3, "c": 4 }, { "a": 5, "c": 6 } ] }
results in
{ "$schema": "http://json-schema.org/draft-04/schema#", "type": "object", "properties": { "test": { "type": "array", "items": { "type": "object", "properties": { "a": { "type": "integer" }, "b": { "type": "integer" }, "c": { "type": "integer" } }, "required": [ "a", "b", "c" ] } } } }
Of the three entries in the array, only a is present in every one, so required should contain only a.
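The expected behavior can be sketched as taking the union of keys for properties and the intersection of keys for required. This is only an illustration under stated assumptions: `inferType` and `inferItemsSchema` are hypothetical names, not generate-schema functions, and the array is assumed to contain flat plain objects.

```javascript
// Hypothetical helpers for illustration; not part of generate-schema's API.

// Map a JavaScript value to a JSON Schema type name.
function inferType(value) {
  if (value === null) return 'null';
  if (Array.isArray(value)) return 'array';
  if (Number.isInteger(value)) return 'integer';
  if (typeof value === 'number') return 'number';
  return typeof value; // 'string', 'boolean', or 'object'
}

// Infer an "items" schema by looking at every object in the array.
function inferItemsSchema(items) {
  const properties = {};
  let required = null; // null until the first item has been seen
  for (const item of items) {
    const keys = Object.keys(item);
    for (const key of keys) {
      // Union: record each key the first time it appears.
      if (!Object.prototype.hasOwnProperty.call(properties, key)) {
        properties[key] = { type: inferType(item[key]) };
      }
    }
    // Intersection: keep only keys present in every item so far.
    required = required === null ? keys : required.filter((k) => keys.includes(k));
  }
  return { type: 'object', properties, required: required || [] };
}

const itemsSchema = inferItemsSchema([
  { a: 0, b: 1 },
  { a: 2, b: 3, c: 4 },
  { a: 5, c: 6 },
]);
console.log(JSON.stringify(itemsSchema.required)); // prints ["a"]
```

For the example above, properties still lists a, b, and c, while required shrinks to the one key shared by all three entries.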
@sqlrob @janerikmai I wrote a wrapper module that accomplishes this basic idea for BigQuery. It's not too fancy, but it is working for our use case. If someone wants to open a PR here using that logic, that'd be great; otherwise I'll get around to it eventually.