
Support RFC 5322 date-time-syntax

Open MenoData opened this issue 6 years ago • 2 comments

The newer RFC 5322 obsoletes the date-time formats defined in RFC 1123 and RFC 822. It allows "folding white space" and comments enclosed in parentheses.

WSP = SP / TAB

FWS = ([*WSP CRLF] 1*WSP) / obs-FWS ; Folding white space

comment = "(" *([FWS] ccontent) [FWS] ")"
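A minimal sketch of how a parser could preprocess its input according to these rules: it removes comments (which may nest, since ccontent can itself contain a comment) and collapses every run of white space, including CRLF folds, into one space. The class and method names are hypothetical and not part of Time4J. Treating a removed comment like a single space is a simplifying assumption of this sketch.

```java
final class CommentStripper {
    private CommentStripper() {}

    /**
     * Removes RFC 5322 comments (possibly nested) and collapses each run of
     * white space, including CRLF folds, into a single space.
     */
    static String stripCommentsAndUnfold(String input) {
        StringBuilder out = new StringBuilder();
        int depth = 0;                 // current comment nesting level
        boolean pendingSpace = false;  // white space or a comment was skipped

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (depth > 0) {
                if (c == '\\' && i + 1 < input.length()) {
                    i++;               // quoted-pair: skip the escaped character
                } else if (c == '(') {
                    depth++;
                } else if (c == ')') {
                    depth--;
                }
                continue;              // everything inside a comment is dropped
            }
            if (c == '(') {
                depth = 1;
                pendingSpace = true;   // assumption: a comment separates tokens
            } else if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                pendingSpace = true;
            } else {
                if (pendingSpace && out.length() > 0) {
                    out.append(' ');
                }
                pendingSpace = false;
                out.append(c);
            }
        }
        if (depth > 0) {
            throw new IllegalArgumentException("Unbalanced comment in: " + input);
        }
        return out.toString();
    }
}
```

Leading and trailing white space is dropped as a side effect, so the result can be handed directly to a stricter date-time parser.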

3.3. Date and Time Specification

Date and time values occur in several header fields. This section specifies the syntax for a full date and time specification. Though folding white space is permitted throughout the date-time specification, it is RECOMMENDED that a single space be used in each place that FWS appears (whether it is required or optional); some older implementations will not interpret longer sequences of folding white space correctly.

In detail:

date-time = [ day-of-week "," ] date time [CFWS]

day-of-week = ([FWS] day-name) / obs-day-of-week

day-name = "Mon" / "Tue" / "Wed" / "Thu" / "Fri" / "Sat" / "Sun"

date = day month year

day = ([FWS] 1*2DIGIT FWS) / obs-day

month = "Jan" / "Feb" / "Mar" / "Apr" / "May" / "Jun" / "Jul" / "Aug" / "Sep" / "Oct" / "Nov" / "Dec"

year = (FWS 4*DIGIT FWS) / obs-year

time = time-of-day zone

time-of-day = hour ":" minute [ ":" second ]

hour = 2DIGIT / obs-hour

minute = 2DIGIT / obs-minute

second = 2DIGIT / obs-second

zone = (FWS ( "+" / "-" ) 4DIGIT) / obs-zone
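The non-obsolete part of this grammar can be sketched as a regular expression with named groups. This is only an illustration with hypothetical names, not how ChronoFormatter works. It assumes that comments have already been removed and folds unfolded, so that FWS is reduced to plain spaces or tabs, and it ignores all obs-* alternatives.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DateTimeGrammar {
    // ABNF string literals are case-insensitive, hence CASE_INSENSITIVE.
    static final Pattern DATE_TIME = Pattern.compile(
        "(?:[ \\t]*(?<dow>Mon|Tue|Wed|Thu|Fri|Sat|Sun),)?"             // [ day-of-week "," ]
        + "[ \\t]*(?<day>\\d{1,2})[ \\t]+"                              // day
        + "(?<month>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"   // month
        + "[ \\t]+(?<year>\\d{4,})[ \\t]+"                              // year
        + "(?<hour>\\d{2}):(?<minute>\\d{2})(?::(?<second>\\d{2}))?"    // time-of-day
        + "[ \\t]+(?<zone>[+-]\\d{4})[ \\t]*",                          // zone and trailing white space
        Pattern.CASE_INSENSITIVE);

    private DateTimeGrammar() {}

    /** Returns a matcher whose named groups hold the raw tokens. */
    static Matcher match(String text) {
        Matcher m = DATE_TIME.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an RFC 5322 date-time: " + text);
        }
        return m;
    }
}
```

The groups `dow` and `second` are null when the optional parts are absent. The sketch only tokenizes; it does not check that the values are semantically valid.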

The day is the numeric day of the month. The year is any numeric year 1900 or later.

The time-of-day specifies the number of hours, minutes, and optionally seconds since midnight of the date indicated.

The date and time-of-day SHOULD express local time.

The zone specifies the offset from Coordinated Universal Time (UTC, formerly referred to as "Greenwich Mean Time") that the date and time-of-day represent. The "+" or "-" indicates whether the time-of-day is ahead of (i.e., east of) or behind (i.e., west of) Universal Time. The first two digits indicate the number of hours difference from Universal Time, and the last two digits indicate the number of additional minutes difference from Universal Time. (Hence, +hhmm means +(hh * 60 + mm) minutes, and -hhmm means -(hh * 60 + mm) minutes). The form "+0000" SHOULD be used to indicate a time zone at Universal Time. Though "-0000" also indicates Universal Time, it is used to indicate that the time was generated on a system that may be in a local time zone other than Universal Time and that the date-time contains no information about the local time zone.
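The offset arithmetic above can be sketched as follows. The names are hypothetical. Returning an empty result for "-0000" is a design choice of this sketch to keep "no information about the local time zone" distinguishable from "+0000"; a real implementation might model this differently.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumericZone {
    private static final Pattern ZONE = Pattern.compile("([+-])(\\d{2})(\\d{2})");

    private NumericZone() {}

    /**
     * Returns the offset from UTC in minutes for "+hhmm" or "-hhmm",
     * or an empty result for "-0000" (local time zone unknown).
     */
    static OptionalInt offsetMinutes(String zone) {
        Matcher m = ZONE.matcher(zone);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a numeric zone: " + zone);
        }
        int hh = Integer.parseInt(m.group(2));
        int mm = Integer.parseInt(m.group(3));
        if (mm > 59) {
            throw new IllegalArgumentException("Zone minutes out of range: " + zone);
        }
        if (zone.equals("-0000")) {
            return OptionalInt.empty();
        }
        int total = hh * 60 + mm;      // +hhmm means +(hh * 60 + mm) minutes
        return OptionalInt.of(m.group(1).equals("-") ? -total : total);
    }
}
```

For example, "-0700" gives -420 minutes and "+0530" gives 330 minutes.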

A date-time specification MUST be semantically valid. That is, the day-of-week (if included) MUST be the day implied by the date, the numeric day-of-month MUST be between 1 and the number of days allowed for the specified month (in the specified year), the time-of-day MUST be in the range 00:00:00 through 23:59:60 (the number of seconds allowing for a leap second; see [RFC1305]), and the last two digits of the zone MUST be within the range 00 through 59.
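These semantic rules can be checked after tokenizing. The following sketch uses the JDK's java.time types rather than Time4J, and all names are hypothetical. It accepts second 60 in any minute, which is more lenient than a real leap-second check, and it also applies the "1900 or later" rule for the year.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.YearMonth;

final class SemanticCheck {
    private SemanticCheck() {}

    /**
     * Checks the semantic constraints on already parsed fields.
     * dayOfWeek may be null because the day-of-week is optional;
     * zoneMinutes is the value of the last two digits of the zone.
     */
    static boolean isValid(DayOfWeek dayOfWeek, int year, int month, int day,
                           int hour, int minute, int second, int zoneMinutes) {
        if (year < 1900 || month < 1 || month > 12) {
            return false;
        }
        // Day must exist in the given month of the given year.
        if (day < 1 || day > YearMonth.of(year, month).lengthOfMonth()) {
            return false;
        }
        // Day-of-week, if present, must be the one implied by the date.
        if (dayOfWeek != null
                && LocalDate.of(year, month, day).getDayOfWeek() != dayOfWeek) {
            return false;
        }
        if (hour < 0 || hour > 23 || minute < 0 || minute > 59) {
            return false;
        }
        if (second < 0 || second > 60) {   // 60 allows for a leap second
            return false;
        }
        return zoneMinutes >= 0 && zoneMinutes <= 59;
    }
}
```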

About obsolete elements:

4.2. Obsolete Folding White Space

In the obsolete syntax, any amount of folding white space MAY be inserted where the obs-FWS rule is allowed. This creates the possibility of having two consecutive "folds" in a line, and therefore the possibility that a line which makes up a folded header field could be composed entirely of white space.

obs-FWS = 1*WSP *(CRLF 1*WSP)

4.3. Obsolete Date and Time

The syntax for the obsolete date format allows a 2 digit year in the date field and allows for a list of alphabetic time zone specifiers that were used in earlier versions of this specification. It also permits comments and folding white space between many of the tokens.

obs-day-of-week = [CFWS] day-name [CFWS]

obs-day = [CFWS] 1*2DIGIT [CFWS]

obs-year = [CFWS] 2*DIGIT [CFWS]

obs-hour = [CFWS] 2DIGIT [CFWS]

obs-minute = [CFWS] 2DIGIT [CFWS]

obs-second = [CFWS] 2DIGIT [CFWS]

The zone handling:

obs-zone = "UT" / "GMT" /     ; Universal Time
                              ; North American UT
                              ; offsets
           "EST" / "EDT" /    ; Eastern:  - 5/ - 4
           "CST" / "CDT" /    ; Central:  - 6/ - 5
           "MST" / "MDT" /    ; Mountain: - 7/ - 6
           "PST" / "PDT" /    ; Pacific:  - 8/ - 7
           %d65-73 /          ; Military zones - "A"
           %d75-90 /          ; through "I" and "K"
           %d97-105 /         ; through "Z", both
           %d107-122          ; upper and lower case

Where a two or three digit year occurs in a date, the year is to be interpreted as follows: If a two digit year is encountered whose value is between 00 and 49, the year is interpreted by adding 2000, ending up with a value between 2000 and 2049. If a two digit year is encountered with a value between 50 and 99, or any three digit year is encountered, the year is interpreted by adding 1900.
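The year rule translates directly into code. A minimal sketch with hypothetical names; it takes the digit string rather than a number because the rule depends on how many digits were written.

```java
final class ObsoleteYear {
    private ObsoleteYear() {}

    /** Interprets a year token of two or more digits. */
    static int interpret(String digits) {
        if (!digits.matches("\\d{2,}")) {
            throw new IllegalArgumentException("Not a year: " + digits);
        }
        int value = Integer.parseInt(digits);
        switch (digits.length()) {
            case 2:
                // 00..49 -> 2000..2049, 50..99 -> 1950..1999
                return value < 50 ? 2000 + value : 1900 + value;
            case 3:
                // any three digit year: add 1900
                return 1900 + value;
            default:
                return value;
        }
    }
}
```

With this, "18" becomes 2018, "50" becomes 1950 and "100" becomes 2000.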

In the obsolete time zone, "UT" and "GMT" are indications of "Universal Time" and "Greenwich Mean Time", respectively, and are both semantically identical to "+0000".

The remaining three character zones are the US time zones. The first letter, "E", "C", "M", or "P" stands for "Eastern", "Central", "Mountain", and "Pacific". The second letter is either "S" for "Standard" time, or "D" for "Daylight Savings" (or summer) time. Their interpretations are as follows:

  EDT is semantically equivalent to -0400
  EST is semantically equivalent to -0500
  CDT is semantically equivalent to -0500
  CST is semantically equivalent to -0600
  MDT is semantically equivalent to -0600
  MST is semantically equivalent to -0700
  PDT is semantically equivalent to -0700
  PST is semantically equivalent to -0800

The 1 character military time zones were defined in a non-standard way in [RFC0822] and are therefore unpredictable in their meaning. The original definitions of the military zones "A" through "I" are equivalent to "+0100" through "+0900", respectively; "K", "L", and "M" are equivalent to "+1000", "+1100", and "+1200", respectively; "N" through "Y" are equivalent to "-0100" through "-1200", respectively; and "Z" is equivalent to "+0000". However, because of the error in [RFC0822], they SHOULD all be considered equivalent to "-0000" unless there is out-of-band information confirming their meaning.

Other multi-character (usually between 3 and 5) alphabetic time zones have been used in Internet messages. Any such time zone whose meaning is not known SHOULD be considered equivalent to "-0000" unless there is out-of-band information confirming their meaning.
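The handling of obsolete zones amounts to a small lookup table with "-0000" as fallback. A sketch with hypothetical names, assuming no out-of-band information is available, so that military and unknown alphabetic zones all map to "-0000". The letter "J" is rejected because it is not covered by the obs-zone grammar. `Map.of` requires Java 9 or later.

```java
import java.util.Locale;
import java.util.Map;

final class ObsoleteZone {
    // Zones with a defined meaning.
    private static final Map<String, String> KNOWN = Map.of(
        "UT", "+0000", "GMT", "+0000",
        "EDT", "-0400", "EST", "-0500",
        "CDT", "-0500", "CST", "-0600",
        "MDT", "-0600", "MST", "-0700",
        "PDT", "-0700", "PST", "-0800");

    private ObsoleteZone() {}

    /** Maps an alphabetic zone to its numeric equivalent. */
    static String toNumericZone(String obsZone) {
        if (!obsZone.matches("[A-Za-z]+")) {
            throw new IllegalArgumentException("Not an alphabetic zone: " + obsZone);
        }
        String upper = obsZone.toUpperCase(Locale.ROOT);
        if (upper.equals("J")) {
            throw new IllegalArgumentException("\"J\" is not a military zone");
        }
        // Military zones and unknown names: meaning not known, hence "-0000".
        return KNOWN.getOrDefault(upper, "-0000");
    }
}
```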

When this is implemented as a new constant in the class ChronoFormatter, the existing constant representing RFC 1123 should be deprecated.

MenoData avatar Apr 16 '18 12:04 MenoData

A string in the new format might look like:

Fri, 13 Apr 2018 02:26:19 -0700 (PDT)

The trailing part in parentheses is a comment and should be ignored by the parser!
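To illustrate the expected result for this example, here is a sketch that uses the JDK's `DateTimeFormatter.RFC_1123_DATE_TIME` instead of Time4J, since the proposed ChronoFormatter constant does not exist yet. The class name is hypothetical, and the regular expression only removes a single trailing comment without nesting, which is enough for the string above but not for the full grammar.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

final class ExampleParse {
    private ExampleParse() {}

    /** Parses a date-time that may end with one simple comment like "(PDT)". */
    static OffsetDateTime parse(String text) {
        // Assumption: at most one trailing comment and no nested parentheses.
        String withoutComment = text.replaceFirst("\\s*\\([^()]*\\)\\s*$", "");
        return OffsetDateTime.parse(
            withoutComment.trim(), DateTimeFormatter.RFC_1123_DATE_TIME);
    }
}
```

Calling `ExampleParse.parse("Fri, 13 Apr 2018 02:26:19 -0700 (PDT)")` yields an `OffsetDateTime` whose `toString()` is `2018-04-13T02:26:19-07:00`.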

MenoData avatar Apr 16 '18 12:04 MenoData

This will be implemented after Time4J-v5.0 in order to avoid any further postponement of the new main release.

MenoData avatar Jun 07 '18 09:06 MenoData