TimeZones.jl

Support for PDF standard time format.

Open sambitdash opened this issue 8 years ago • 2 comments

I am implementing the PDF date-time format, which the specification defines as follows:

Dates

Date values used in a PDF shall conform to a standard date format, which closely follows that of the international standard ASN.1 (Abstract Syntax Notation One), defined in ISO/IEC 8824. A date shall be a text string of the form (D:YYYYMMDDHHmmSSOHH'mm) where:

  • YYYY shall be the year
  • MM shall be the month (01–12)
  • DD shall be the day (01–31)
  • HH shall be the hour (00–23)
  • mm shall be the minute (00–59)
  • SS shall be the second (00–59)
  • O shall be the relationship of local time to Universal Time (UT), and shall be denoted by one of the characters PLUS SIGN (U+002B) (+), HYPHEN-MINUS (U+002D) (-), or LATIN CAPITAL LETTER Z (U+005A) (Z) (see below)
  • HH followed by APOSTROPHE (U+0027) (') shall be the absolute value of the offset from UT in hours (00–23)
  • mm shall be the absolute value of the offset from UT in minutes (00–59)

The prefix D: shall be present, the year field (YYYY) shall be present and all other fields may be present but only if all of their preceding fields are also present. The APOSTROPHE following the hour offset field (HH) shall only be present if the HH field is present. The minute offset field (mm) shall only be present if the APOSTROPHE following the hour offset field (HH) is present. The default values for MM and DD shall be both 01; all other numerical fields shall default to zero values. A PLUS SIGN as the value of the O field signifies that local time is later than UT, a HYPHEN-MINUS signifies that local time is earlier than UT, and the LATIN CAPITAL LETTER Z signifies that local time is equal to UT. If no UT information is specified, the relationship of the specified time to UT shall be considered to be GMT. Regardless of whether the time zone is specified, the rest of the date shall be specified in local time.

EXAMPLE For example, December 23, 1998, at 7:52 PM, U.S. Pacific Standard Time, is represented by the string D:199812231952-08'00
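To make the sign convention of the O field concrete, here is a minimal sketch that converts the local time in the example to UT using only the Dates standard library. The helper name `to_ut` and its argument layout are assumptions made for illustration; they are not part of TimeZones.jl or of the PDF specification.

```julia
using Dates

# Hypothetical helper: convert a local time plus the O, HH and mm offset
# fields of a PDF date string to Universal Time.
function to_ut(local_time::DateTime, sign::Char, hours::Integer, minutes::Integer)
    sign in ('+', '-', 'Z') || throw(ArgumentError("invalid O field: $sign"))
    sign == 'Z' && return local_time
    offset = Minute(60 * hours + minutes)
    # '+' means local time is later than UT, so subtract the offset to reach UT;
    # '-' means local time is earlier than UT, so add it.
    return sign == '+' ? local_time - offset : local_time + offset
end

# The example from the specification: 7:52 PM local time, offset -08'00.
ut = to_ut(DateTime(1998, 12, 23, 19, 52), '-', 8, 0)
println(ut)  # 1998-12-24T03:52:00
```

The conversion crosses midnight, so the UT date is December 24 even though the local date is December 23.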

Currently this isn't supported out of the box, but simple pre-processing addresses the need, and that is what I am using for now. Could this format be supported out of the box in the library?

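using TimeZones

function CDDate(s::String)
    s = ascii(s)
    # Drop the mandatory "D:" prefix.
    if startswith(s, "D:")
        s = s[3:end]
    end
    # Remove the apostrophes that delimit the offset fields.
    s = replace(s, "'" => "")
    # "Z" means UT: rewrite it as an explicit zero offset so that every
    # input is parsed with the same format and always carries a time zone.
    if endswith(s, 'Z')
        s = s[1:end-1] * "+0000"
    end
    # This format expects every field through SS to be present.
    # Delegates to the CDDate method that accepts a ZonedDateTime (defined elsewhere).
    CDDate(ZonedDateTime(s, "yyyymmddHHMMSSzzzz"))
end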
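The pre-processing above assumes that every field through the seconds is present. As a rough sketch of how the defaulting rules from the specification could be folded into the same approach, the hypothetical helper below (`normalize_pdf_date` is an illustrative name, not an existing API) expands any valid PDF date string to the fixed-width form `yyyymmddHHMMSS+hhmm` using plain string operations, assuming no whitespace between fields.

```julia
# Hypothetical helper: expand a PDF date string to "yyyymmddHHMMSS+hhmm",
# filling in the defaults from the specification (MM and DD default to 01,
# all other fields to zero, and a missing or "Z" zone to a zero offset).
function normalize_pdf_date(s::AbstractString)
    startswith(s, "D:") || throw(ArgumentError("missing D: prefix: $s"))
    body = replace(s[3:end], "'" => "")

    # Split the digit run from the optional UT designator (+, - or Z).
    i = findfirst(in("+-Z"), body)
    digits = i === nothing ? body : body[1:prevind(body, i)]
    zone = i === nothing ? "" : body[i:end]

    # The year is mandatory; later fields come in two-digit steps.
    (length(digits) in 4:2:14 && all(isdigit, digits)) ||
        throw(ArgumentError("malformed date fields: $s"))
    date = digits * "00000101000000"[length(digits)+1:end]

    if zone == "" || zone == "Z"
        return date * "+0000"
    end
    offset = zone[2:end]
    (length(offset) in (0, 2, 4) && all(isdigit, offset)) ||
        throw(ArgumentError("malformed UT offset: $s"))
    return date * zone[1:1] * offset * "0000"[length(offset)+1:end]
end

println(normalize_pdf_date("D:199812231952-08'00"))  # 19981223195200-0800
println(normalize_pdf_date("D:1998"))                # 19980101000000+0000
```

The normalized string always has the same width, so it should be parseable with a single fixed format like the one used in the function above; that last step is not verified here.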

sambitdash, Aug 12 '17 17:08

As you said, this format currently isn't supported out of the box. Right now TimeZones relies heavily on the DateFormat code provided by Base. I know there has been some talk about revising how the parsing is handled, and I'll be sure to keep this format in mind when it comes time to make design choices.

For now you should be able to support this format in your own code similar to what you've been doing. I would recommend doing something like the following:

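using Dates, TimeZones

# Extended-mode regex for a PDF date string such as D:199812231952-08'00.
# Whitespace between fields is tolerated. The hyphen comes first in the
# character class so that it is a literal and not a range from '+' to 'Z'.
const CDDATE_REGEX = r"""
    ^\s*D\s*:\s*
    (?<date>\d{4}(?:\d{2}){0,5})\s*
    (?:
        (?<ut>[-+Z])\s*
        (?:
            (?<offset_hours>\d{2})'\s*
            (?<offset_minutes>\d{2})?
        )?
    )?
    \s*$
    """x

function CDDate(str::String)
    m = match(CDDATE_REGEX, str)
    m === nothing && throw(ArgumentError("Invalid PDF date string: $str"))

    # No UT field, "Z", or a sign without an hour offset are all treated as UTC.
    tz = if m[:ut] === nothing || m[:ut] == "Z" || m[:offset_hours] === nothing
        FixedTimeZone("UTC")
    else
        # The minute offset is optional and defaults to zero.
        FixedTimeZone(m[:ut] * m[:offset_hours] * something(m[:offset_minutes], "00"))
    end

    # Fields missing from the end of the date are left at their defaults.
    ZonedDateTime(DateTime(m[:date], dateformat"yyyymmddHHMMSS"), tz)
end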

omus, Aug 15 '17 04:08

Thanks @omus !!!

sambitdash, Aug 15 '17 19:08