pysaml2
pysaml2 copied to clipboard
Does not handle valid time properly
ValueError: time data '2017-08-29T09:16:45.0631274-05:00' does not match format '%Y-%m-%dT%H:%M:%SZ'
https://github.com/rohe/pysaml2/blob/fd7a4f694b137a92f2a8b7f502d51fc21e3528c7/src/saml2/time_util.py#L19
What fails, how and under which circumstances ?
when timestr = '2017-08-29T09:16:45.0631274-05:00'
this function does no handle this format https://github.com/rohe/pysaml2/blob/fd7a4f694b137a92f2a8b7f502d51fc21e3528c7/src/saml2/time_util.py#L232
Where does this timestr comes from ?Is it an IssueInstant, a NotBeforeOrAfter ? Can we get the stacktrace ?
I think this is a valid bug and we probably need to come up with a different regexp to validate dateTime even if the one we use now matches the most commonly used format
It comes as <samlp:Response and it is in every datetime attribute including IssueInstant
I think that this could be helpfull
import datetime
import dateutil.parser
def getDateTimeFromISO8601String(s):
d = dateutil.parser.parse(s)
return d
s = '2017-08-29T09:16:45.0631274-05:00'
getDateTimeFromISO8601String(s)
datetime.datetime(2017, 8, 29, 9, 16, 45, 63127, tzinfo=tzoffset(None, -18000))
Tested on Python 2.7.12. It also worked with the following combinations
s = '2017-08-29T09:16:45.0631274-05:00'
s = '2017-08-29T09:16:45.0631274'
s = '2017-08-29T09:16:45.0631'
s = '2017-08-29T09'
s = '2017-08-29 03:01:05'
# italian format
s = '29/08/2017 03:01:05'
# also if string seems truncated
s = '2017-08-29T'
In other words we could subsitute the actual str_to_time function with dateutil.parser.parse or, more in the deep, refactor the entire time_util.py file with standard python libraries.
Looking at the SAML2 xsd, I see that
<attribute name="IssueInstant" type="dateTime" use="required"/>
IssueInstant is of type xsd:dateTime. The XML Schema datatypes spec mandates
The ·lexical space· of dateTime consists of finite-length sequences of characters of the form:
'-'? yyyy '-' mm '-' dd 'T' hh ':' mm ':' ss ('.' s+)? (zzzzzz)?
This means that 2017-08-29T09:16:45.0631274-05:00 is a valid value which makes this issue a proper bug report, and it should be fixed 👍
According to the dateTime datatype spec, it is mandatory to include the year, the month, the day, the hour, the minutes and the seconds. One can omit only the second fragments and the timezone info. This means that some of the results that dateutil.parser.parse returns (as proposed by @peppelinux) are invalid.
I feel it is OK to relax this rule and accept datetime values such as 29/08/2017 03:01:05 and 2017-08-29T which would otherwise be invalid by the spec. If people think this should be strict we can do that by fixing the TIME_FORMAT_WITH_FRAGMENT regex and make consistent use of that.
I also agree that time_utils.py should be refactored to use standard built-in methods to do time calculations.
I went through the code and it seems pysaml2 uses many different libs:
timebuiltin moduledatetimebuiltin modulecalendarbuiltin moduledateutillibpytzlib
Additionally, not all time-related operations are handled through time_utils.py.
I also went through the XML Schema spec about dateTime datatype and I feel very weird reading that
The ·value space· of dateTime is closely related to the dates and times described in ISO 8601
meaning that the dateTime datatype does not represent/support the full ISO 8601 specification, and at the same time there is no clear text about the differences. I cannot understand why someone
found it better to semi-specify a new format instead of taking advantage of a specification about the exact thing they wanted to specify in the first place 😞.
Regardless, I started looking for libraries with ISO 8601 support. To my surprise, there is not much support for a well-defined standard, but there are lots of libs. I looked at the python modules {time, datetime, calendar}, and libraries: pytz, arrow, moment, iso8601, isodatetime, ciso8601, numpy/datetime64, pandas/timeseries and finally aniso8601.
I think that aniso8601 stands out as the most complete implementation that is taking edge cases seriously. Additionally, it is a pure python implementation and works with the standard datetime python type. I think a refactoring should be based on that lib.
@c00kiemon5ter you're right, I didn't want to introduce invalid datetime lenght but only show dateutils usage flexibility. Nothing more. I agree that iso format should be mandatory.
aniso8601 looks good.
I found also: pyiso8601 and iso8601utils It seems also that starting from Python 3.7 there's a native support with colon delimiters in UTC offsets. It also come with datetime.fromisoformat function.
Coming back to this.
The SAML2 core specification specifies:
1.3.3 Time Values
All SAML time values have the type
xs:dateTime, which is built in to the W3C XML Schema Datatypes specification [Schema2], and MUST be expressed in UTC form, with no time zone component. SAML system entities SHOULD NOT rely on time resolution finer than milliseconds. Implementations MUST NOT generate time instants that specify leap seconds.
This conflicts with the dateTime datatype specification as it disallows the timezone component and considers all dates to have been specified as UTC beforehand. The dateTime specification defines two timelines - one for timezoned dateTimes and one for untimezoned dateTimes. The SAML2 spec essentially dictates that there is only one timeline, always timezoned as UTC.
This makes things easier for the implementers, as for example, they need not consider how to compare dateTimes from different timelines (a timezoned dateTime with an untimezoned datetime.)
As such, the originally posted date 2017-08-29T09:16:45.0631274-05:00 is invalid, as it contains timezone info -05:00. The correct representation is 2017-08-29T14:16:45.063127 (converted to UTC/Z/±00:00 and timezone info removed.)
The question now becomes whether implementations actually hold that promise, and if not, how do we handle it..
Personally I would like to implement the most flexible and smart way to handle time, with or without timezone, with the possibility to succesfully handle exceptions on theri occurrence.
It's more like a hopefull whish, not a real TODO
This was further discussed in the mailing list and ..it seems everything is backwards. This is what the technical committee had to say:
Subject: Re: [saml-dev] dateTime "time zone component" Date: Wed, 2 Oct 2013 15:53:00 +0000
The SAML spec has this part about "MUST be expressed in UTC form, with no time zone component." which would literally support the format given above (I think; xmlschema dateTime itself would certainly allow for that) but as Ian (and Olav, I think) point out in this thread
https://groups.google.com/forum/#!topic/simplesamlphp/7m07m9gLzzA
the spec probably intended to say no timezone /offset/?
I would think so.
-- Scott
which means that the timezone information should be there, but it is only allowed to be 'Z'.
The plan is:
- accept all (ISO-8601) formats that we can parse, and convert them to UTC.
- if there is no tzinfo, we assume the other side is respecting the current wording of the SAML2-core specification, and thus the time is already in UTC.
All dates are normalised to UTC and can be compared and worked-with under that assumption. Dates we produce should be in ISO-8601 format always in UTC with timezone info present as 'Z'.