specs

Add `itemType` constraint to `array` type

Open • akariv opened this issue 7 years ago • 16 comments

This attribute would allow specifying the exact type of the items allowed in an array.

e.g.

{
  "name": "years",
  "type": "array",
  "constraints": {
    "typeOf": "year"
  }
}
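A minimal sketch of how a validator might enforce the proposed `typeOf` constraint, assuming the cell holds a JSON array that is decoded before checking. The helper `check_type_of` and the `TYPE_CHECKS` table are hypothetical, only a few type names are covered, and `year` is simplified to "a JSON integer" (a real implementation would also accept string years).

```python
import json

# Hypothetical per-type checks for a few Table Schema type names.
# bool is excluded from integer/year because bool subclasses int in Python.
TYPE_CHECKS = {
    "year": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
}


def check_type_of(field: dict, raw: str) -> list:
    """Decode a JSON array cell and check every item against constraints.typeOf."""
    values = json.loads(raw)
    if not isinstance(values, list):
        raise ValueError("array field must contain a JSON array")
    type_name = field.get("constraints", {}).get("typeOf")
    if type_name is None:
        return values  # no item type declared, nothing to check
    check = TYPE_CHECKS[type_name]
    bad = [v for v in values if not check(v)]
    if bad:
        raise ValueError(f"items not of type {type_name!r}: {bad}")
    return values


field = {"name": "years", "type": "array", "constraints": {"typeOf": "year"}}
print(check_type_of(field, "[1999, 2004, 2017]"))  # [1999, 2004, 2017]
```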

Related

Doing this with appropriate semantics would also resolve:

  • Enum constraint with arrays in table schema #549

akariv (Apr 30 '17)

@akariv really like it. Again, maybe for v1.1.

rufuspollock (May 24 '17)

Related: https://github.com/frictionlessdata/specs/issues/381

danfowler (May 24 '17)

+1 on this, and we should do a PR once v1 settles.

pwalsh (May 29 '17)

@akariv would you be up for doing a PR to implement this?

I also think that, rather than just specifying the type, we might as well go the whole hog and reuse the whole field schema for the individual items. In that case the property might be named something like itemSchema or itemType.

Alternatively, since most of the properties in constraints only make sense for individual values (e.g. max, min, enum), we could implicitly apply them to each value in the array, in which case we only need typeOf.
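A minimal sketch of that alternative: value-level constraints declared on the array field are applied to each item rather than to the array as a whole. The helper name is hypothetical, and it uses the Table Schema constraint names `minimum`, `maximum` and `enum` for what the comment calls max, min and enum.

```python
def check_items(values: list, constraints: dict) -> list:
    """Apply value-level constraints to each array item; return error messages."""
    errors = []
    for i, v in enumerate(values):
        if "minimum" in constraints and v < constraints["minimum"]:
            errors.append(f"item {i}: {v!r} < minimum {constraints['minimum']!r}")
        if "maximum" in constraints and v > constraints["maximum"]:
            errors.append(f"item {i}: {v!r} > maximum {constraints['maximum']!r}")
        if "enum" in constraints and v not in constraints["enum"]:
            errors.append(f"item {i}: {v!r} not in enum")
    return errors


print(check_items([1990, 2005, 2030], {"minimum": 2000, "maximum": 2020}))
# ['item 0: 1990 < minimum 2000', 'item 2: 2030 > maximum 2020']
```

Array-level constraints such as a length limit would still need to be distinguished from these per-item ones, which is part of what the rest of the thread debates.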

Thoughts @roll ...

rufuspollock (Apr 30 '20)

@rufuspollock that's my suggestion in #410

akariv (Apr 30 '20)

Isn't the difference that for array all entries are implicitly of the same type?

rufuspollock (Apr 30 '20)

A table type could be seen as syntactic sugar for an array field with itemType=object and a mandatory schema property.

In both cases the datatype is the same in all array entries.
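A sketch of that equivalence as a hypothetical rewrite function: a `table` field carrying a `schema` becomes an `array` field whose items are objects described by that schema. Neither a `table` type nor the `itemType`/`schema` placement shown here is part of the v1 spec; the property names are assumptions taken from this discussion.

```python
def desugar_table_field(field: dict) -> dict:
    """Rewrite a hypothetical `table` field as an array of objects with a schema."""
    if field.get("type") != "table":
        return field  # other fields pass through unchanged
    if "schema" not in field:
        raise ValueError("a table field requires a schema")
    # Copy every other property as-is, then set the array/itemType/schema trio.
    out = {k: v for k, v in field.items() if k not in ("type", "schema")}
    out["type"] = "array"
    out["itemType"] = "object"
    out["schema"] = field["schema"]
    return out


print(desugar_table_field({
    "name": "lines",
    "type": "table",
    "schema": {"fields": [{"name": "sku", "type": "string"}]},
}))
```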

akariv (Apr 30 '20)

I think itemType should be as simple as possible. I would move all the complexity to something like #410 and v2 of the specs.

roll (May 04 '20)

@roll generally agree - though could you clarify what "as simple as possible" looks like for you - and would it address the related stuff e.g. #549?

rufuspollock (May 04 '20)

@rufuspollock

I think there are two options:

Simple

https://specs.frictionlessdata.io/table-schema/#array

array
The field contains data that is a valid JSON format arrays.

Because it's JSON, we can probably treat the items simply as native JSON values (itemType):

  • string
  • number
  • boolean
  • null

This would mean no processing at the Table Schema level, just checking that the JSON array's items all have the same given type, e.g. [1,2,3] or [true,false], re-using the native JSON types.

We could also add an itemConstraints property.
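A minimal sketch of this "simple" option, assuming the check only decodes the JSON and compares each item's native JSON type with the declared one. The function names are hypothetical (the property itself is still being named in this thread), and objects/arrays as items are deliberately left out, matching the four types listed above.

```python
import json


def json_type(value) -> str:
    """Map a value produced by json.loads to its native JSON type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before number: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return "other"  # objects and nested arrays are out of scope here


def check_item_json_type(raw: str, item_json_type: str) -> bool:
    """True if raw is a JSON array whose items all have the given JSON type."""
    values = json.loads(raw)
    return isinstance(values, list) and all(
        json_type(v) == item_json_type for v in values
    )


print(check_item_json_type("[1,2,3]", "number"))        # True
print(check_item_json_type("[true,false]", "boolean"))  # True
print(check_item_json_type("[1,true]", "number"))       # False
```

No value is transformed: the array is accepted or rejected as-is, which is what keeps this option cheap for implementations.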

This option will keep things simple.

Complex

Another option is to have a full itemSchema, or something like it, as you mentioned. This adds Table Schema-level processing to an array's items, e.g. the ability to extract parsed values from string items such as ["10$", "30$"] or ["2020-02-01", "2020-04-05"].

With the second option, we gain the ability to process an array's items with all the power of Table Schema.

This option is much more complex though.
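A rough sketch of what that per-item processing involves: each raw string item is parsed according to an item field descriptor. Only two item types are handled; the number branch is a simplified stand-in for Table Schema's `bareNumber: false` behaviour, and `parse_item` and the descriptor shape are hypothetical.

```python
import re
from datetime import date


def parse_item(raw: str, item_field: dict):
    """Parse one string item into its logical value using an item field descriptor."""
    if item_field["type"] == "number":
        text = raw
        if not item_field.get("bareNumber", True):
            # Simplified: strip leading chars that are not digits, '.', '+' or '-',
            # and trailing chars that are not digits or '.'.
            text = re.sub(r"^[^0-9.+-]+|[^0-9.]+$", "", raw)
        return float(text)
    if item_field["type"] == "date":
        return date.fromisoformat(raw)  # default date format is YYYY-MM-DD
    raise ValueError(f"unsupported item type: {item_field['type']}")


prices = [parse_item(s, {"type": "number", "bareNumber": False}) for s in ["10$", "30$"]]
dates = [parse_item(s, {"type": "date"}) for s in ["2020-02-01", "2020-04-05"]]
print(prices)  # [10.0, 30.0]
print(dates)   # [datetime.date(2020, 2, 1), datetime.date(2020, 4, 5)]
```

Even this toy version needs per-item options (`bareNumber`, date formats), which illustrates why the thread expects the complex option to pull in most of a field descriptor.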

roll (May 04 '20)

I think that itemType should have the same semantics as type (i.e. take a Table Schema type name, not a JSON type).

Array items would then be treated as the physical representation and processed to extract the logical values.

But then, we would also need itemConstraints, itemFormat, itemDecimalChar, and there's no end to this.

Therefore, a different proposal would be to create a single arrayItem configuration, which is parsed exactly like a Table Schema field.

Like so:

...
{
  "name": "futurePaymentDates",
  "type": "array",
  "constraints": {
    "maxLength": 12
  },
  "arrayItem": {
    "type": "date",
    "format": "%Y/%m/%d"
  }
}
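A minimal sketch of how a consumer might apply the proposed `arrayItem` descriptor: the field's own `maxLength` constraint is checked against the whole array, and each item is parsed as if it were a standalone `date` field with the given format. It assumes `format` is a Python `strptime` pattern, as the example suggests, and handles only `date` items; `parse_array_field` is hypothetical and `arrayItem` is a proposal, not published spec.

```python
import json
from datetime import datetime


def parse_array_field(field: dict, raw: str) -> list:
    values = json.loads(raw)
    if not isinstance(values, list):
        raise ValueError("expected a JSON array")
    # Constraints on the field itself apply to the array as a whole.
    max_len = field.get("constraints", {}).get("maxLength")
    if max_len is not None and len(values) > max_len:
        raise ValueError(f"array has {len(values)} items, maxLength is {max_len}")
    item = field["arrayItem"]
    if item["type"] != "date":
        raise ValueError("this sketch only handles date items")
    # Each item is parsed like a standalone date field with its own format.
    return [datetime.strptime(v, item["format"]).date() for v in values]


field = {
    "name": "futurePaymentDates",
    "type": "array",
    "constraints": {"maxLength": 12},
    "arrayItem": {"type": "date", "format": "%Y/%m/%d"},
}
print(parse_array_field(field, '["2020/06/01", "2020/07/01"]'))
# [datetime.date(2020, 6, 1), datetime.date(2020, 7, 1)]
```

The split keeps the two levels apart: `constraints` describes the array, while everything under `arrayItem` describes one item.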

This also means that schema, as a field specification, might be added to the spec as an optional property of an object field, but that is unrelated to the array item issue.

akariv (May 04 '20)

> But then, we would also need itemConstraints, itemFormat, itemDecimalChar, and there's no end to this.

Yeah, exactly. Without all those options, a Table Schema type is only partly useful.

> I think that itemType should have the same semantics as type (i.e. take a Table Schema type name, not a JSON type).

It could be itemJsonType instead. It would:

  • solve "80%" of use cases
  • keep array very simple
  • leave room to design a proper nested type in, e.g., #410

I'm not sure it's the best option; I'm just trying to figure out the pros.

roll (May 04 '20)

There is something elegant in reusing the existing schema structures when adding this feature. I don't think constraining the arrays to simple JSON types would provide a lot of value. I favor @akariv's proposal.

JDziurlaj (May 12 '20)

OK, it looks like we have convergence here.

rufuspollock (May 13 '20)

@roll @rufuspollock

I see https://github.com/frictionlessdata/frictionless-py/issues/627 has been referenced here, but it solves a fundamentally different issue.

It would be great to see this original request added. It is very common for all members of an array to have the same type, and declaring that as part of Table Schema would allow:

  • In the SQL driver, use of array fields for backends that support them (a more logically correct mapping than JSONB fields, which would also allow round-tripping data)
  • In the Elasticsearch driver, use of array fields
  • Most other data backends that support an array type most likely require its members to have the same type
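As an illustration of the SQL point, a hypothetical helper could emit a PostgreSQL array column for an `array` field with a declared item type, falling back to JSONB otherwise. The type mapping and the `itemType` property name are assumptions of this sketch, not the behaviour of any existing driver.

```python
# Assumed mapping from Table Schema type names to PostgreSQL column types.
PG_TYPES = {
    "string": "TEXT",
    "integer": "INTEGER",
    "number": "NUMERIC",
    "boolean": "BOOLEAN",
    "date": "DATE",
    "year": "INTEGER",
}


def column_ddl(field: dict) -> str:
    """Return a column definition for a field, e.g. 'years INTEGER[]'."""
    name = field["name"]
    if field["type"] == "array":
        item_type = field.get("itemType")
        if item_type in PG_TYPES:
            return f"{name} {PG_TYPES[item_type]}[]"  # typed, round-trippable
        return f"{name} JSONB"  # fallback when items are untyped or unmapped
    return f"{name} {PG_TYPES.get(field['type'], 'TEXT')}"


print(column_ddl({"name": "years", "type": "array", "itemType": "year"}))  # years INTEGER[]
print(column_ddl({"name": "tags", "type": "array"}))                       # tags JSONB
```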

Note that this behavior (and that of #410) is already implemented in some form in https://github.com/frictionlessdata/tableschema-elasticsearch-py and has seen a good deal of successful use in production systems.

pwalsh (Sep 29 '21)

Happy to have PRs on this, or even to start with a pattern and implement it in, say, Python. Remember that the complicated thing with the specs now is that all drivers then need to be upgraded to stay compliant.

rufuspollock (Sep 29 '21)