Add `itemType` constraint to `array` type
This attribute will allow specifying the exact items that are allowed in an array.
e.g.
{
  "name": "years",
  "type": "array",
  "constraints": {
    "typeOf": "year"
  }
}
Related
Doing this with appropriate semantics would also resolve
- Enum constraint with arrays in table schema #549
@akariv I really like it. Again, maybe for v1.1.
Related: https://github.com/frictionlessdata/specs/issues/381
+1 on this, and we should do a PR once v1 settles.
@akariv would you be up for doing a PR to implement this?
I also think that rather than just have the type we might as well go the whole hog and have the whole field schema be reused for the individual item. In this case the property name might be something like `itemSchema` or `itemType`.
Alternatively, since most of the constraints in `constraints` only make sense for individual values, we could implicitly apply them to the individual values in the array (e.g. max, min, enum etc.), in which case we only need `typeOf`.
Thoughts @roll ...
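To make the "implicitly apply constraints to each item" alternative concrete, here is a minimal Python sketch. The helper name `check_items` is hypothetical, the constraint keys follow the Table Schema names `minimum`, `maximum` and `enum`, and the sketch assumes numeric items when `minimum` or `maximum` is given. Applying them per item is only the idea floated above, not current spec behavior.

```python
import json


def check_items(raw, constraints):
    """Apply value-level constraints to every item of a JSON array.

    Hypothetical helper: per-item application of constraints is the idea
    discussed in this thread, not something the spec defines today.
    Assumes items are numbers whenever minimum/maximum are used.
    """
    items = json.loads(raw)
    if not isinstance(items, list):
        return ["value is not a JSON array"]
    errors = []
    for index, item in enumerate(items):
        if "enum" in constraints and item not in constraints["enum"]:
            errors.append(f"item {index}: {item!r} is not in enum")
        if "minimum" in constraints and item < constraints["minimum"]:
            errors.append(f"item {index}: {item!r} is below minimum")
        if "maximum" in constraints and item > constraints["maximum"]:
            errors.append(f"item {index}: {item!r} is above maximum")
    return errors


# Example: a "years" array where each item must fall in a range.
print(check_items("[1999, 2020, 2150]", {"minimum": 1900, "maximum": 2100}))
```

An empty list means every item passed; each failing item contributes one message.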
@rufuspollock that's my suggestion in #410
Isn't the difference that for array all entries are implicitly of the same type?
A `table` type could be seen as syntactic sugar for an `array` field with `itemType=object` and a mandatory `schema` property.
In both cases the datatype is the same in all array entries.
I think `itemType` should be as simple as possible. I would move all the complexity to something like #410 and v2 of the specs.
@roll generally agree - though could you clarify what "as simple as possible" looks like for you - and would it address the related stuff e.g. #549?
@rufuspollock
I think there are two options
Simple
https://specs.frictionlessdata.io/table-schema/#array
array
The field contains data that is a valid JSON format arrays.
Because it's JSON, we can probably treat items just as native JSON values (`itemType`):
- string
- number
- boolean
- null
It will mean no processing at the Table Schema level, just checking that a JSON array has items of the same given type, e.g. `[1,2,3]` or `[true,false]` - just re-using native JSON types.
We can also add an `itemConstraints` property.
This option will keep things simple.
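A rough Python sketch of what the simple option could look like. The function name `items_match` and the `JSON_TYPES` table are made up for illustration; the only thing taken from the comment above is the list of native JSON types.

```python
import json

# Hypothetical mapping from an itemType name to the Python types that
# json.loads produces for that native JSON type.
JSON_TYPES = {
    "string": (str,),
    "number": (int, float),
    "boolean": (bool,),
    "null": (type(None),),
}


def items_match(raw, item_type):
    """Return True if raw is a JSON array whose items all have item_type."""
    items = json.loads(raw)
    if not isinstance(items, list):
        return False
    expected = JSON_TYPES[item_type]
    for item in items:
        # bool is a subclass of int in Python, so keep it out of "number".
        if item_type == "number" and isinstance(item, bool):
            return False
        if not isinstance(item, expected):
            return False
    return True


print(items_match("[1,2,3]", "number"))
print(items_match("[true,false]", "boolean"))
```

No Table Schema casting happens here: the items are checked as plain JSON values and returned untouched.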
Complex
Another option is to have a full `itemSchema` or something like it, as you mentioned. This adds Table Schema level processing to the array's items, e.g. an ability to extract parsed values from items given as strings, such as `["10$", "30$"]` or `["2020-02-01", "2020-04-05"]`.
With the second option, we add an ability to process array's items with all the power of Table Schema.
This option is much more complex though.
I think that `itemType` should have the same semantics as `type` (i.e. it is read as a Table Schema type name, not a JSON type).
Array items would then be considered the physical representation and would be processed to extract the logical value.
But then we would also need `itemConstraints`, `itemFormat`, `itemDecimalChar`, and there's no end to this.
Therefore, a different proposal would be to create a single `arrayItem` configuration, which is parsed exactly like a Table Schema field.
Like so:
...
{
  "name": "futurePaymentDates",
  "type": "array",
  "constraints": {
    "maxLength": 12
  },
  "arrayItem": {
    "type": "date",
    "format": "%Y/%m/%d"
  }
}
This also means that `schema`, as a field specification, might be added to the spec as an optional parameter of an `object` field, but that is unrelated to the `array` item issue.
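For illustration, here is a minimal Python sketch of how a reader might process the `arrayItem` descriptor above. The functions `cast_item` and `cast_array` are hypothetical and only cover two item types; this is not how any existing Frictionless library works, just one way the proposal could be read.

```python
import json
from datetime import datetime

# The field descriptor from the proposal above.
field = {
    "name": "futurePaymentDates",
    "type": "array",
    "constraints": {"maxLength": 12},
    "arrayItem": {"type": "date", "format": "%Y/%m/%d"},
}


def cast_item(value, item_field):
    """Cast one physical item to its logical value (two types sketched)."""
    if item_field["type"] == "date":
        return datetime.strptime(value, item_field["format"]).date()
    if item_field["type"] == "integer":
        return int(value)
    raise NotImplementedError(item_field["type"])


def cast_array(raw, field):
    """Parse a JSON array, check the array-level maxLength, cast each item."""
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("value is not a JSON array")
    max_length = field.get("constraints", {}).get("maxLength")
    if max_length is not None and len(items) > max_length:
        raise ValueError("array is longer than maxLength")
    return [cast_item(item, field["arrayItem"]) for item in items]


print(cast_array('["2020/02/01", "2020/04/05"]', field))
```

Note the split this implies: `constraints` on the field still describe the array as a whole (here its length), while everything under `arrayItem` describes a single item.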
> But then, we would need also `itemConstraints`, `itemFormat`, `itemDecimalChar` and there's no end to this.
Yea exactly, without all the options a Table Schema type is partially useless.
> I think that `itemType` should have the same semantics as `type` (i.e. read a tableschema type name and not a json).
It can be `itemJsonType`. It will
- solve "80%" of use cases
- keep `array` very simple
- give an ability to design a proper nested type in e.g. #410
I'm not sure that it's the best option, I'm just trying to figure out the pros.
There is something elegant in reusing the existing schema structures when adding this feature. I don't think constraining the arrays to simple JSON types would provide a lot of value. I favor @akariv's proposal.
OK, it looks like we have convergence here.
@roll @rufuspollock
I see https://github.com/frictionlessdata/frictionless-py/issues/627 has been referenced here but it solves a fundamentally different issue.
It would be great to see this original request added. It is very common for all members of an array to be of the same type, and declaring that as part of Table Schema would allow:
- In the SQL driver, use of Array fields for backends that support it (which would be a more logically correct mapping than JSONB fields, and allow round-tripping data)
- In the Elasticsearch driver, use of array fields
- Most likely most other data backends that support an array as a type would require members to have the same type
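As a sketch of the SQL point in the list above, a storage driver could pick a native array column when the item type is declared and fall back to JSONB otherwise. Everything here is an assumption for illustration: the `itemType` property name, the `postgres_column_type` function, and the type table are not taken from any existing driver.

```python
# Hypothetical mapping from Table Schema scalar types to PostgreSQL types.
SCALAR_SQL_TYPES = {
    "string": "TEXT",
    "integer": "INTEGER",
    "number": "DOUBLE PRECISION",
    "boolean": "BOOLEAN",
    "date": "DATE",
}


def postgres_column_type(field):
    """Pick a PostgreSQL column type for a field descriptor."""
    if field["type"] != "array":
        return SCALAR_SQL_TYPES.get(field["type"], "TEXT")
    item_type = field.get("itemType")  # assumed property name
    if item_type in SCALAR_SQL_TYPES:
        # A declared, known item type allows a native array column.
        return SCALAR_SQL_TYPES[item_type] + "[]"
    # Without a declared item type the driver can only store opaque JSON.
    return "JSONB"


print(postgres_column_type({"name": "dates", "type": "array", "itemType": "date"}))
print(postgres_column_type({"name": "misc", "type": "array"}))
```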
Note that this behavior (and also #410) is already implemented in some form in https://github.com/frictionlessdata/tableschema-elasticsearch-py and has seen a good deal of successful use in production systems.
Happy to have PRs on this, or even to start with a pattern and implement it in, say, Python. Remember that the complex thing with the specs now is that all drivers then need to upgrade to be compliant.