tableschema: time formats should be more consistent and portable across platforms
The specs currently support time/date formats which are not portable and might produce different results when used on different platforms / library implementations:
- any
  - not defined at all; it accepts anything thrown at it, so it can produce different results depending on the implementation.
- strptime format string
  - strptime is not a defined standard; it is an implementation detail that works differently on different platforms and library implementations, for example:
    - PHP supports strptime, but silently ignores time zone details
    - Python 2.7 doesn't support time zones (it raises an exception when a %z directive is specified)
  - strptime is platform dependent and might behave differently at the OS level on different platforms (I understand that it has some major problems on Windows)
I think we should drop the additional formats and support only the ISO standard format.
Another option: support only a small subset of strptime, and validate that only that subset is used.
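A minimal sketch of the "validate the subset" option, in Python. The allowed set of directives (ALLOWED_DIRECTIVES) and the helper name disallowed_directives are hypothetical choices for illustration; the spec does not define either.

```python
# Hypothetical allow-list of strptime directives; this particular subset is an
# assumption for illustration, not something the spec defines.
ALLOWED_DIRECTIVES = {"Y", "m", "d", "H", "M", "S", "z", "%"}


def disallowed_directives(fmt: str) -> list[str]:
    """Return every directive in fmt that is outside the allowed subset.

    Each offending directive is reported as "%" plus the character that follows
    it, so platform extensions such as "%-d" show up as "%-". A lone trailing
    "%" is reported as "%". An empty list means the format string is accepted.
    """
    bad = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%":
            code = fmt[i + 1 : i + 2]  # "" when "%" is the last character
            if code not in ALLOWED_DIRECTIVES:
                bad.append("%" + code)
            i += 2  # skip the directive character too ("%%" is consumed whole)
        else:
            i += 1
    return bad


print(disallowed_directives("%Y-%m-%dT%H:%M:%SZ"))  # []
print(disallowed_directives("%d/%m/%y %I:%M %p"))   # ['%y', '%I', '%p']
```

A tool following this approach would reject a schema whenever the returned list is non-empty, instead of passing the string to the platform's strptime and hoping for consistent behavior.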
@OriHoch Totally agree.
This is an example where limiting choice probably benefits everyone (data authors, consumers, and developers). I'd suggest dropping formats any and default, scratching field types date, time, datetime, year, and yearmonth, and requiring a pattern among the following subset of the core ISO 8601 standard:
- YYYY-MM-DDThh:mm:ssZ / %Y-%m-%dT%H:%M:%SZ – UTC
- YYYY-MM-DDThh:mm:ss / %Y-%m-%dT%H:%M:%S – Unknown time zone
- YYYY-MM-DDThh:mm(Z) / %Y-%m-%dT%H:%M(Z)
- YYYY-MM-DDThh(Z) / %Y-%m-%dT%H(Z)
- YYYY-MM-DD / %Y-%m-%d
- YYYY-MM / %Y-%m
- YYYY / %Y
In other words, either you know the time zone and convert to UTC (and tag with "Z"), or you don't know the time zone (and drop the "Z").
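A rough Python sketch of a strict validator for the subset above. The regex enforces fixed-width fields and only permits the trailing Z once a time is present; datetime() then rejects out-of-range values such as month 13. The function name parse_subset is hypothetical, and representing a reduced-precision value like 2017-07 by its first instant is an assumption, not something the list above specifies.

```python
import re
from datetime import datetime, timezone

# Shape of the proposed subset: YYYY[-MM[-DD[Thh[:mm[:ss]][Z]]]] with fixed-width
# fields. "Z" is only allowed after a time part, never after a bare date.
SUBSET_RE = re.compile(
    r"(?P<Y>\d{4})(?:-(?P<m>\d{2})(?:-(?P<d>\d{2})"
    r"(?:T(?P<H>\d{2})(?::(?P<M>\d{2})(?::(?P<S>\d{2}))?)?(?P<Z>Z)?)?)?)?"
)


def parse_subset(value: str) -> tuple[datetime, bool]:
    """Return (datetime, is_utc) for a value in the subset, else raise ValueError.

    Missing fields are filled with their first value (month/day 1, time 0);
    this is an assumption made for the sketch.
    """
    match = SUBSET_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not in the allowed ISO 8601 subset: {value!r}")
    p = match.groupdict()
    # datetime() raises ValueError for out-of-range fields (e.g. month 13).
    dt = datetime(
        int(p["Y"]),
        int(p["m"] or 1),
        int(p["d"] or 1),
        int(p["H"] or 0),
        int(p["M"] or 0),
        int(p["S"] or 0),
    )
    if p["Z"]:
        return dt.replace(tzinfo=timezone.utc), True  # known zone, stored as UTC
    return dt, False  # unknown time zone: leave naive


print(parse_subset("2017-07-14T09:30Z"))
print(parse_subset("2017-07"))
```

Values such as 2017-7-14, 2017-07-14Z, or 2017-07-14T09:30:00+02:00 fail the regex, which is exactly the "either Z or no zone at all" rule described above.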
@ezwelty @OriHoch this was a classic trade-off between supporting publishers in describing data as is and the strictness that consumers would like. Remember the specs have to balance supporting publishers, who may be publishing data they don't control, against making it easier for tool writers to use the spec.
I'm not sure we've yet got the trade-off right for date formats, but what I would say is that you see a lot of variety in date formats in the wild (think of the options you see in Google spreadsheets or Excel). We want to support that as far as we can, because many publishers may be constrained to use those formats. At the same time, I get that this is problematic for consumers and tool authors, and that what we have there may not yet be specific enough. My request here would be to see what we can do with v1 as is and consider revisions for v1.1 based on more experience in the wild.
Given this constraint, I would suggest the following for v1.1:
- any - should be defined explicitly in the spec (it can be as simple as a list of date/time formats to try in priority order)
- strptime - define in the spec the exact format codes supported; tools should validate that only those format codes are used
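To illustrate the first suggestion: a Python sketch of any defined as an explicit, ordered list of strptime formats. The entries in ANY_FORMATS and their order are made up for this example; the point is that once the list is written into the spec, every implementation resolves an ambiguous value like 12/07/2017 the same way. It assumes Python 3.7+, where %z accepts Z as well as ±hhmm and ±hh:mm.

```python
from datetime import datetime

# Hypothetical priority list for the "any" format; entries and order are
# illustrative assumptions, not formats taken from the spec.
ANY_FORMATS = [
    "%Y-%m-%dT%H:%M:%S%z",  # with Z or numeric offset
    "%Y-%m-%dT%H:%M:%S",    # no time zone
    "%Y-%m-%d",
    "%d/%m/%Y",             # day-first wins over month-first because it is listed
]


def parse_any(value: str) -> datetime:
    """Try each format in priority order; the first one that parses wins."""
    for fmt in ANY_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"no format in ANY_FORMATS matches {value!r}")


print(parse_any("12/07/2017"))  # 2017-07-12 00:00:00
```

Because the list is ordered and closed, "anything thrown at it" becomes a reproducible rule instead of an implementation detail.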
@OriHoch ok, good idea. Let's come back to this when we start working on the v1.1 release.
This may also help infer formats https://github.com/frictionlessdata/tableschema-js/issues/98
As a data publisher, I strongly support Rufus Pollock's comment above. I would like to see time zone support added for ISO 8601 formats, e.g. 2016-12-25T00:01:01Z+10
@Stephen-Gates Probably a typo, but just in case: 2016-12-25T00:01:01Z+10 should be written as 2016-12-25T00:01:01+10 (or +10:00 or +1000). The Z is a shorthand for +00:00, aka UTC. See https://en.wikipedia.org/wiki/ISO_8601#Time_zone_designators
It's difficult to implement the entire ISO 8601 spec, but I think it'd be reasonable to extend the list I suggested above (https://github.com/frictionlessdata/specs/issues/487#issuecomment-315523646) with support for non-UTC (i.e. not 'Z') time zone designators ±hh:mm, ±hhmm, and ±hh.
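A minimal Python sketch of accepting those designators. parse_offset is a hypothetical helper; it takes only the designator (not the full timestamp) and rejects a Z followed by an offset, since Z already means +00:00.

```python
import re
from datetime import timedelta, timezone

# "Z", or an offset in one of the forms ±hh:mm, ±hhmm, ±hh.
OFFSET_RE = re.compile(r"Z|(?P<sign>[+-])(?P<hh>\d{2})(?::?(?P<mm>\d{2}))?")


def parse_offset(designator: str) -> timezone:
    """Convert an ISO 8601 time zone designator into a datetime.timezone."""
    match = OFFSET_RE.fullmatch(designator)
    if match is None:  # e.g. "Z+10", "+10:", "+1"
        raise ValueError(f"invalid time zone designator: {designator!r}")
    if designator == "Z":
        return timezone.utc
    hours = int(match["hh"])
    minutes = int(match["mm"] or 0)
    if hours > 23 or minutes > 59:
        raise ValueError(f"offset out of range: {designator!r}")
    delta = timedelta(hours=hours, minutes=minutes)
    return timezone(-delta if match["sign"] == "-" else delta)


print(parse_offset("+10"))  # UTC+10:00
```

parse_offset("+10"), parse_offset("+1000"), and parse_offset("+10:00") all yield the same UTC+10:00 zone, while parse_offset("Z+10") raises ValueError.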
Thanks @ezwelty. Including the Z is my mistake.
From @stevage:
"date"/default allows any ISO8601 format, which is incredibly broad (and includes rarely supported features like recurring dates and intervals). Do we intend this? "date"/datetime requires UTC. Do we not allow times without timezones? "date": Do we not allow times with milliseconds?
From Rufus:
@pwalsh I agree we should strictly restrict date to yyyy-mm-dd[+T stuff +TZ stuff]. What do you think? I basically agree with all of these simplifications.
From @peterdesmet:
Frictionless Framework will correctly parse the following datetimes with a format = default (datapackage.json.zip):
2013-11-23T08:30:00 # No timezone
2013-11-23T08:30:00Z # UTC time
2013-11-23T06:30:00-0200 # Timezone offset
This is great! But according to the specs only UTC times should be supported (excluding offsets or no timezone):
default: An ISO8601 format string e.g.
YYYY-MM-DDThh:mm:ssZ in UTC time
Is the broader support intentional? Should the specs be updated to drop "in UTC time"?
Originating issue: https://github.com/tdwg/camtrap-dp/issues/333
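For comparison, a Python sketch (not the Frictionless Framework implementation) that accepts the same three forms using strptime's %z directive, which handles Z and ±hhmm / ±hh:mm offsets from Python 3.7 on. parse_default and describe are hypothetical names. It shows that the Z and -0200 examples denote the same instant, while the value without a designator stays naive.

```python
from datetime import datetime, timezone


def parse_default(value: str) -> datetime:
    """Parse YYYY-MM-DDThh:mm:ss with an optional Z or numeric offset."""
    try:
        # Aware result: trailing "Z" or an offset such as "-0200" / "-02:00".
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")
    except ValueError:
        # Naive result: no designator, so the time zone is unknown.
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S")


def describe(value: str) -> str:
    """Report the value in UTC when its zone is known."""
    dt = parse_default(value)
    if dt.tzinfo is None:
        return "no timezone"
    return "UTC " + dt.astimezone(timezone.utc).isoformat()


for v in ["2013-11-23T08:30:00", "2013-11-23T08:30:00Z", "2013-11-23T06:30:00-0200"]:
    print(v, "->", describe(v))
```

The last two inputs both normalize to 2013-11-23T08:30:00+00:00, which is why accepting offsets loses no information compared with requiring UTC.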