Parse Java date format

Open hakman opened this issue 8 years ago • 15 comments

The default Java loggers' date format is ISO8601 and looks like this:

2015-11-14 17:10:26,589

Would be nice to either allow RFC5424 dates without the T between date and time or add a new parser for ISO8601.
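For comparison, this layout is easy to read with strptime-style format codes outside liblognorm. The following is only a sketch using Python's `datetime.strptime`; the helper name `parse_java_timestamp` is made up for illustration, and it accepts either a space or a `T` between date and time.

```python
from datetime import datetime

def parse_java_timestamp(text):
    """Parse 'YYYY-MM-DD HH:MM:SS,mmm', accepting ' ' or 'T' as the separator."""
    # Normalise an RFC 5424-style 'T' separator to the space used by Java loggers.
    normalised = text.replace("T", " ", 1)
    # %f accepts one to six fractional digits, so ',589' becomes 589000 microseconds.
    return datetime.strptime(normalised, "%Y-%m-%d %H:%M:%S,%f")

stamp = parse_java_timestamp("2015-11-14 17:10:26,589")
print(stamp.isoformat(timespec="milliseconds"))  # 2015-11-14T17:10:26.589
```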

hakman avatar Nov 24 '15 09:11 hakman

Before implementing a parser, we should check if this can be done with a custom data type. I think this is possible. If not, it's probably better to evaluate why not and fix that.

An issue may be that we can't get hold of the result of multiple subparsers in a custom type as a single value. If so, that's a blocker for now, but also something we should look into in the longer term.

rgerhards avatar Nov 24 '15 10:11 rgerhards

@rgerhards I am shipping logs from quite a few different programs which format dates in a few different ways, here are some that I need to parse (the formats are from Ruby's Time#strftime):

FORMAT                   EXAMPLE
%Y-%m-%dT%H:%M:%S.%L%Z   2016-01-09T18:43:28.942GMT
%y-%m-%d %H:%M:%S        16-01-09 18:43:28
%Y-%m-%d %H:%M:%S,%L     2016-01-09 18:43:28,942
%Y/%m/%d %H:%M:%S %Z     2016/01/09 18:43:28 GMT
%Z %b %d %H:%M:%S        GMT Jan 09 18:43:28

I would like to parse these dates from the logs, and then format them consistently in the output (in my case to Elasticsearch).

You mention a custom data type; could you perhaps give me some pointers as to how I could achieve this?
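Until liblognorm can do this itself, one workaround is to normalise the timestamps after parsing. The sketch below is not liblognorm code: it tries a list of strptime formats in Python and emits one consistent ISO-style string. The function name `normalize` is hypothetical. Ruby's `%L` (milliseconds) is replaced by Python's `%f`, and Python's `%Z` only recognises `UTC`, `GMT` and the local zone names, so the result is a naive datetime. The yearless format from the table is left out because it needs an externally supplied year.

```python
from datetime import datetime

# strptime equivalents of the first four formats in the table above.
FORMATS = [
    "%Y-%m-%dT%H:%M:%S.%f%Z",
    "%y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H:%M:%S,%f",
    "%Y/%m/%d %H:%M:%S %Z",
]

def normalize(text):
    """Return the timestamp in one consistent layout, or None if no format matches."""
    for fmt in FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match, try the next one
        return parsed.isoformat(timespec="milliseconds")
    return None

for sample in ("2016-01-09T18:43:28.942GMT", "16-01-09 18:43:28",
               "2016-01-09 18:43:28,942", "2016/01/09 18:43:28 GMT"):
    print(sample, "->", normalize(sample))
```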

lmars avatar Jan 09 '16 18:01 lmars

this would be a subset of https://github.com/rsyslog/liblognorm/issues/176

davidelang avatar Jan 28 '16 23:01 davidelang

@rgerhards Any plan to adopt this as a type? Same thing could be done with date-rfc3339 ?

mostolog avatar Oct 05 '16 10:10 mostolog

as time permits, not in the next 6 weeks.

rgerhards avatar Oct 05 '16 10:10 rgerhards

Do you really live on a 6-week basis? Just starting to play with liblognorm and it seems really fast!

mostolog avatar Oct 05 '16 10:10 mostolog

> Do you really live on a 6-week basis?

well, I know what's on the plate. But if you prefer: it's a NET date, it might take longer ;-)

> Just starting to play with liblognorm and seems really fast!

It is.

Rainer

rgerhards avatar Oct 05 '16 10:10 rgerhards

I've pointed out in the past that there are a LOT of different time formats out there. While we want to support them all, we don't want to end up with a different type for each one.

At some point we need the ability to do a couple of things

  1. have a generic date parser that lets us define how the date is written to this particular log message (ideally using the date % format codes for consistency with the code that is writing them)
  2. have a way to combine data fields that we have parsed separately into a single timestamp field (for example, a lot of microsoft csv files have date and time as separate columns)

If you get some time to think about this problem, consider these needs.
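To make the second need concrete, here is a small sketch (in Python, not liblognorm) of combining a date column and a time column that are parsed separately and are not adjacent. The sample row, the column names and the function `rows_with_timestamp` are invented for illustration.

```python
import csv
import io
from datetime import datetime

# Made-up sample: the date and time columns are separate and not adjacent.
SAMPLE = "date,host,time,message\n2016-01-09,web01,18:43:28,service started\n"

def rows_with_timestamp(handle, date_col="date", time_col="time"):
    """Yield each CSV row with an extra 'timestamp' built from two columns."""
    for row in csv.DictReader(handle):
        combined = f"{row[date_col]} {row[time_col]}"
        row["timestamp"] = datetime.strptime(combined, "%Y-%m-%d %H:%M:%S")
        yield row

for row in rows_with_timestamp(io.StringIO(SAMPLE)):
    print(row["timestamp"].isoformat(), row["host"], row["message"])
```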

Ideally the Java date format would just be a special case of this general capability (though I agree that it's a common enough case that it's probably worth having its own type).

David Lang

davidelang avatar Oct 27 '16 15:10 davidelang

Hi David.

Didn't have time to look at this, but I'm wondering if something like rule=:%date:yyyy-MM-dd hh:mm:ss,SSS ZZ% wouldn't suit all use cases.

Did you notice how it looks like a regex!? :stuck_out_tongue_closed_eyes:
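A pattern written with Java-style letters can be translated mechanically into strptime codes, so the two notations do not exclude each other. The sketch below covers only an assumed subset of letters, and `to_strptime` is a hypothetical helper, not part of any library. Note that in Java's `SimpleDateFormat`, `hh` is the 12-hour clock and `HH` the 24-hour clock, so the example uses `HH`.

```python
import re
from datetime import datetime

# Assumed subset of Java-style pattern letters and their strptime equivalents.
JAVA_TO_STRPTIME = {
    "yyyy": "%Y", "yy": "%y", "MM": "%m", "dd": "%d",
    "HH": "%H", "hh": "%I", "mm": "%M", "ss": "%S", "SSS": "%f",
}
# Longest tokens first so 'yyyy' is not read as two 'yy'.
TOKEN = re.compile("|".join(sorted(JAVA_TO_STRPTIME, key=len, reverse=True)))

def to_strptime(pattern):
    """Translate a Java-style date pattern into a strptime format string."""
    # Literal '%' characters in the pattern are not escaped in this sketch.
    return TOKEN.sub(lambda m: JAVA_TO_STRPTIME[m.group()], pattern)

fmt = to_strptime("yyyy-MM-dd HH:mm:ss,SSS")
print(fmt)
print(datetime.strptime("2015-11-14 17:10:26,589", fmt))
```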

mostolog avatar Oct 31 '16 16:10 mostolog

It still won't handle all cases: Microsoft logs are CSV with the time in one field and the date in another (not always adjacent) field.

We need some way to create a structure and tell rsyslog "take this stuff and turn it into a timestamp"

That way we can extract the various parts of the date from wherever they are in the message and assign them to the appropriate sub-fields and then call a function to convert them to a 'real' time

I was originally thinking we could get away with something like you are saying (but using the well-known abbreviations for the fields as shown in Lewis' message [1]), and it may still be a good thing to have this in liblognorm, but rsyslog is going to still need the ability to combine data that arrives as separate fields.

This is also needed when trying to extract timestamps from logs in a file, you may need to combine the data in the file with some otherwise 'known' data (say year and timezone) to get a real timestamp.

That said, doing this in liblognorm will probably handle 90%+ of cases.
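As an illustration of combining parsed data with otherwise known data, the sketch below supplies the year and timezone from outside the message. It is Python rather than liblognorm, `parse_yearless` is a made-up name, and `%b` assumes English month abbreviations (the default C locale).

```python
from datetime import datetime, timezone

def parse_yearless(text, year, tz=timezone.utc):
    """Parse 'Mon DD HH:MM:SS' and attach an externally known year and timezone."""
    # Prepend the known year so the parser never has to guess it.
    naive = datetime.strptime(f"{year} {text}", "%Y %b %d %H:%M:%S")
    return naive.replace(tzinfo=tz)

print(parse_yearless("Jan 09 18:43:28", 2016).isoformat())
```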

Thinking about this for liblognorm...

With the v2 syntax, we have a lot of the pieces in place with the ability to define custom types. It would be a mess, but you could write a parser for each of the examples in [1] that extracted them into a {date: {}} structure.

Things that I think should be added to liblognorm to make this work:

  1. manually enumerating all the timezones and months would be ugly, they should be enumerated inside liblognorm (along with the conversion to numeric values)
  2. we then would need a way to say "evaluate as date" that would look at all the parts that have been filled in and create a date from them. If it's too ugly to do this in liblognorm, we may need to punt to doing it in the caller. I thought I saw something like this in liblognorm in the past (for single variables), but I don't find it now.

(we now have the string class that seems like it satisfies the need to be able to grab digits without them needing to be followed by a space, so we don't need that any longer)

David Lang

[1]
FORMAT                   EXAMPLE
%Y-%m-%dT%H:%M:%S.%L%Z   2016-01-09T18:43:28.942GMT
%y-%m-%d %H:%M:%S        16-01-09 18:43:28
%Y-%m-%d %H:%M:%S,%L     2016-01-09 18:43:28,942
%Y/%m/%d %H:%M:%S %Z     2016/01/09 18:43:28 GMT
%Z %b %d %H:%M:%S        GMT Jan 09 18:43:28

davidelang avatar Oct 31 '16 20:10 davidelang

I now understand what you mean.

Would something like:

%my-2-digit-year-field-name:word:YY%, %year4:word:YYYY%

and then using them to compose a date work?

%day%-%month%-%year4:word%

PS: I'll go for:

Z = +0100 (plain text format)
ZZ = +02:00 (colon format)
ZZZ = GMT, CET, CEST (Timezone)
ZZZZ = America/Los_Angeles (Specific TZ identity)
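For the two numeric forms, existing strptime implementations already cope with both spellings. The sketch below assumes Python 3.7 or later, where `%z` accepts offsets with or without a colon. Named zones such as America/Los_Angeles need a timezone database and are not covered here. `parse_with_offset` is a hypothetical helper.

```python
from datetime import datetime, timedelta

def parse_with_offset(text):
    """Parse a timestamp followed by a numeric UTC offset (+0100 or +02:00)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S %z")

for text in ("2016-11-02 07:11:00 +0100", "2016-11-02 07:11:00 +02:00"):
    print(text, "->", parse_with_offset(text).utcoffset())
```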

mostolog avatar Nov 02 '16 07:11 mostolog

figuring out how to deal with the % overload is a big problem :-/

I was thinking more like (using rsyslog terms rather than trying to figure a working ruleset syntax to get the idea across)

    $!time!dayofmonth = '17'
    $!time!month = 'Jan' or 'January' or '1'
    $!time!year = '2016'
    $!time!hour = '13'
    $!time!min = '03'

    set $!timestamp = mktime($!time)
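A rough model of what such a `mktime($!time)` call would have to do, written in Python purely as a sketch: `make_timestamp` and `month_number` are invented names, the month table is English only, and the sub-field names follow the ones used above.

```python
from datetime import datetime

# English abbreviations only; other languages would need their own table.
MONTH_ABBR = ["jan", "feb", "mar", "apr", "may", "jun",
              "jul", "aug", "sep", "oct", "nov", "dec"]

def month_number(value):
    """Accept '1', 'Jan' or 'January' (any capitalisation) and return 1..12."""
    value = value.strip().lower()
    if value.isdigit():
        return int(value)
    return MONTH_ABBR.index(value[:3]) + 1

def make_timestamp(parts):
    """Build a datetime from separately extracted string sub-fields."""
    return datetime(
        year=int(parts["year"]),
        month=month_number(parts["month"]),
        day=int(parts["dayofmonth"]),
        hour=int(parts.get("hour", 0)),    # missing time fields default to 0
        minute=int(parts.get("min", 0)),
        second=int(parts.get("sec", 0)),
    )

parts = {"dayofmonth": "17", "month": "Jan", "year": "2016", "hour": "13", "min": "03"}
print(make_timestamp(parts).isoformat())
```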

trying to put this in ruleset format (to parse a traditional timestamp)

    type=:mytime:%month:date:m% %day:date:e% %hour:date:H%:%min:date:M%:%sec:date:S%
    rule=%time:mytime% %hostname:word% ...
    append=:%time!year%=2016,%mytimestamp%=%time:evaluate:timestamp%

where the things after date: in the type def are the things after % in the man page I quote below

> PS: I'll go for:
>
> Z = +0100 (plain text format)
> ZZ = +02:00 (colon format)
> ZZZ = GMT, CET, CEST (Timezone)
> ZZZZ = America/Los_Angeles (Specific TZ identity)

you are looking at it from the point of view of a human reading it with your HHMMSSYYYY type of syntax.

I'm looking at it from the point of view of a programmer, just about every date output library uses something very similar to what you find in 'man date', and frequently the exact format string for the date is available (either in documentation or in the source). So I like the idea of being able to directly copy the format. It's also stood the test of time in being able to handle all the different ways people want to format dates. I believe that there are also things that parse based on these standards.

It seems overly complex to start with, but if you are going to end up doing much of anything with date formatting, you will end up seeing/learning this.

now, a number of these can be eliminated because we can auto-adapt to capitalization, and we have other ways of dealing with literals, but a lot of these are very useful (%e where there can be a space for example)

    FORMAT controls the output.  Interpreted sequences are:
    %%     a literal %
    %a     locale's abbreviated weekday name (e.g., Sun)
    %A     locale's full weekday name (e.g., Sunday)
    %b     locale's abbreviated month name (e.g., Jan)
    %B     locale's full month name (e.g., January)
    %c     locale's date and time (e.g., Thu Mar  3 23:05:25 2005)
    %C     century; like %Y, except omit last two digits (e.g., 20)
    %d     day of month (e.g, 01)
    %D     date; same as %m/%d/%y
    %e     day of month, space padded; same as %_d
    %F     full date; same as %Y-%m-%d
    %g     last two digits of year of ISO week number (see %G)
    %G     year of ISO week number (see %V); normally useful only with %V
    %h     same as %b
    %H     hour (00..23)
    %I     hour (01..12)
    %j     day of year (001..366)
    %k     hour ( 0..23)
    %l     hour ( 1..12)
    %m     month (01..12)
    %M     minute (00..59)
    %n     a newline
    %N     nanoseconds (000000000..999999999)
    %p     locale's equivalent of either AM or PM; blank if not known
    %P     like %p, but lower case
    %r     locale's 12-hour clock time (e.g., 11:11:04 PM)
    %R     24-hour hour and minute; same as %H:%M
    %s     seconds since 1970-01-01 00:00:00 UTC
    %S     second (00..60)
    %t     a tab
    %T     time; same as %H:%M:%S
    %u     day of week (1..7); 1 is Monday
    %U     week number of year, with Sunday as first day of week (00..53)
    %V     ISO week number, with Monday as first day of week (01..53)
    %w     day of week (0..6); 0 is Sunday
    %W     week number of year, with Monday as first day of week (00..53)
    %x     locale's date representation (e.g., 12/31/99)
    %X     locale's time representation (e.g., 23:13:48)
    %y     last two digits of year (00..99)
    %Y     year
    %z     +hhmm numeric timezone (e.g., -0400)
    %:z    +hh:mm numeric timezone (e.g., -04:00)
    %::z   +hh:mm:ss numeric time zone (e.g., -04:00:00)
    %:::z  numeric time zone with : to necessary precision (e.g., -04, +05:30)
    %Z     alphabetic time zone abbreviation (e.g., EDT)

davidelang avatar Nov 02 '16 08:11 davidelang

To sum up: I like your proposal. Please consider that days/months in Spanish, for example, aren't capitalized.

mostolog avatar Nov 02 '16 08:11 mostolog

hmm, thinking about this a bit more, the format data identifies what the data is, so instead of doing %time!year:date:Y%, we could do just %time:date:Y%, %time:date:m% etc (the date type fills in sub-elements of time instead of creating specific variables)

Then we can also auto-define time!timestamp and have it get filled in at the end of the rule evaluation, using all the elements that have been filled in for the time structure.
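The idea can be modelled outside liblognorm as follows. This is a sketch under stated assumptions, not a proposal for the real rule syntax: the `CODES` table, `compile_rule` and `evaluate` are all hypothetical, only a handful of date codes are handled, and each code simply fills a sub-field of a `time` structure whose `timestamp` member is computed once matching is done.

```python
import re
from datetime import datetime

# Hypothetical mapping from date(1)-style codes to (sub-field, regex).
CODES = {
    "Y": ("year", r"\d{4}"),
    "m": ("month", r"\d{2}"),
    "d": ("day", r"\d{2}"),
    "e": ("day", r"[ \d]\d"),   # space-padded day of month
    "H": ("hour", r"\d{2}"),
    "M": ("minute", r"\d{2}"),
    "S": ("second", r"\d{2}"),
}

def compile_rule(rule):
    """Turn 'text %Y text %m ...' into a regex with one named group per code."""
    out, pos = [], 0
    for m in re.finditer(r"%([A-Za-z])", rule):
        out.append(re.escape(rule[pos:m.start()]))   # literal text between codes
        field, pattern = CODES[m.group(1)]
        out.append(f"(?P<{field}>{pattern})")
        pos = m.end()
    out.append(re.escape(rule[pos:]))
    return re.compile("".join(out))

def evaluate(rule, message, defaults=None):
    """Fill the time sub-fields, then derive time['timestamp'] at the end."""
    match = compile_rule(rule).fullmatch(message)
    if match is None:
        return None
    fields = dict(defaults or {})   # e.g. a year supplied by the rule author
    fields.update({k: int(v) for k, v in match.groupdict().items()})
    fields["timestamp"] = datetime(
        fields["year"], fields["month"], fields["day"],
        fields.get("hour", 0), fields.get("minute", 0), fields.get("second", 0),
    ).isoformat()
    return {"time": fields}

print(evaluate("%m %e %H:%M:%S", "01 17 13:03:00", {"year": 2016}))
```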

David Lang

davidelang avatar Nov 02 '16 09:11 davidelang

Hi,

A bit more than 6 weeks later, but I stumbled on this while trying to find a way to parse the datestamp from RabbitMQ, which is almost rfc5424 but with no T between date and time. Has any work been done here, or could I get any hints on how to parse the log file and forward it to the journal with the proper fields set?

Regards, Marcin.

Garagoth avatar Jul 04 '18 10:07 Garagoth