Configuration for @timestamp field format
Feature Description
Add configuration (via a filter, logstash.yml, or pipeline configuration) to set the @timestamp field format.
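For illustration only, a sketch of what such a knob might look like in logstash.yml (no such setting exists today; the name is purely hypothetical):

# Hypothetical setting, shown only to illustrate the kind of configuration
# this feature request is asking for: forcing @timestamp to be serialized
# with millisecond precision.
pipeline.timestamp.format: "yyyy-MM-dd'T'HH:mm:ss.SSSX"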
Why is it Required
Logstash 7.x and below had millisecond precision for the @timestamp field (e.g. 2022-07-31T06:46:06.200Z). Since Logstash 8, the @timestamp field precision was changed to microsecond precision (e.g. 2022-07-31T06:46:06.200000Z). This change in format may cause external outputs to be unable to process the message due to a bad format for the @timestamp field. For example, in Elasticsearch I have the following mapping for many indices:
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date",
        "format": "yyyy-MM-dd'T'HH:mm:ss.SSSX"
      }
    }
  }
}
And when trying to insert data with the new format I get the following:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "failed to parse field [@timestamp] of type [date] in document with id 'XXX'. Preview of field's value: '2022-07-31T06:46:06.200000Z'"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "failed to parse field [@timestamp] of type [date] in document with id 'XXX'. Preview of field's value: '2022-07-31T06:46:06.200000Z'",
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "failed to parse date field [2022-07-31T06:46:06.200000Z] with format [yyyy-MM-dd'T'HH:mm:ss.SSSX]",
      "caused_by" : {
        "type" : "date_time_parse_exception",
        "reason" : "date_time_parse_exception: Text '2022-07-31T06:46:06.200000Z' could not be parsed at index 23"
      }
    }
  },
  "status" : 400
}
Also, I can't change the format, as updating the mapping in Elasticsearch is not possible (and changing index templates and reindexing all the data is not a viable solution).
How to Overcome This
At the moment, what I'm able to do on the Logstash side (in order to keep my Elasticsearch indices unchanged) is to use the following, which copies the @timestamp field to a temporary field, manipulates it into millisecond format, and overrides the @timestamp field:
# Copy @timestamp into a temporary field as a string
mutate {
  add_field => {
    "tmptimestamp" => "%{@timestamp}"
  }
}
# Strip the extra sub-millisecond digits (e.g. ".200000Z" -> ".200Z")
mutate {
  gsub => [
    "tmptimestamp", "\d{3}Z$", "Z"
  ]
}
# Parse the truncated value back into @timestamp (the date filter's default target)
date {
  match => ["tmptimestamp", "yyyy-MM-dd'T'HH:mm:ss.SSSZ"]
}
The above seems quite inefficient and requires changing many pipelines.
Thanks, Ofir
There are several things going on here.
First, as of Logstash 8 we have nano-precision timestamps (not microsecond, as posited here), which should be serialized using 0, 3, 6, or 9 decimals. As of 8.4.0, this will have a minimum of 3 decimals (https://github.com/elastic/logstash/pull/14299), in order to promote compatibility with Elasticsearch's date_time or strict_date_time formatters, which require decimal precision.
I am unsure how you are getting 2022-07-31T06:46:06.200000Z as an output, since the serialization is supposed to use the minimum number of three-digit groupings to unambiguously represent the timestamp (I would expect 2022-07-31T06:46:06.200Z), and this behaviour is defined by the java.time.format.DateTimeFormatter.ISO_INSTANT implementation of the JDK you are running on. Can you share details of which JDK you have configured, or whether you are using the bundled JDK? We include information about this during the startup sequence.
Second, while the mapping of the field's type in Elasticsearch cannot be changed, the formatters used to parse inbound values into the field's type can be changed. Since Elasticsearch will use the first usable formatter for either the parse or print operation, specifying yyyy-MM-dd'T'HH:mm:ss.SSSX||date_time||date_time_no_millis would ensure that the print operation remains the same, but the parse operation could "fall through" to a working format handler. After upgrading to Logstash 8.4, which includes the above-mentioned minimum precision, this could be simplified to yyyy-MM-dd'T'HH:mm:ss.SSSX||date_time.
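As a sketch of where that fallback chain would live (the index name below is just a placeholder; the format string is the one suggested above), the mapping for a new index could declare it like this:

PUT my-new-index
{
  "mappings": {
    "properties": {
      "@timestamp": {
        "type": "date",
        "format": "yyyy-MM-dd'T'HH:mm:ss.SSSX||date_time||date_time_no_millis"
      }
    }
  }
}

Elasticsearch prints dates using the first format in the chain and, when parsing, tries each format in order until one matches.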
Logstash version: 8.3.2
OS Java version (java -version):
openjdk version "1.8.0_312"
OpenJDK Runtime Environment (build 1.8.0_312-8u312-b07-0ubuntu1~18.04-b07)
OpenJDK 64-Bit Server VM (build 25.312-b07, mixed mode)
Logstash JDK during startup:
[2022-08-02T06:23:18,617][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"8.3.2", "jruby.version"=>"jruby 9.2.20.1 (2.5.8) 2021-11-30 2a2962fbd1 OpenJDK 64-Bit Server VM 11.0.15+10 on 11.0.15+10 +indy +jit [linux-x86_64]"}
JVM flags:
[2022-08-02T06:23:18,621][INFO ][logstash.runner ] JVM bootstrap flags: [-Xms1g, -Xmx1g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djruby.compile.invokedynamic=true, -Djruby.jit.threshold=0, -XX:+HeapDumpOnOutOfMemoryError, -Djava.security.egd=file:/dev/urandom, -Dlog4j2.isThreadContextMapInheritable=true, -Djruby.regexp.interruptible=true, -Djdk.io.File.enableADS=true, --add-opens=java.base/java.security=ALL-UNNAMED, --add-opens=java.base/java.io=ALL-UNNAMED, --add-opens=java.base/java.nio.channels=ALL-UNNAMED, --add-opens=java.base/sun.nio.ch=ALL-UNNAMED, --add-opens=java.management/sun.management=ALL-UNNAMED]
- So I guess the bundled JDK is being used.
- Regarding updating the format on an existing index (not changing the type): I do get an error when trying to update it to
{
  "properties": {
    "@timestamp": {
      "format": "yyyy-MM-dd'T'HH:mm:ss.SSSX || yyyy-MM-dd'T'HH:mm:ss.SSSSSSX",
      "type": "date"
    }
  }
}
I get the following:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Mapper for [@timestamp] conflicts with existing mapper:\n\tCannot update parameter [format] from [yyyy-MM-dd'T'HH:mm:ss.SSSX] to [yyyy-MM-dd'T'HH:mm:ss.SSSX || yyyy-MM-dd'T'HH:mm:ss.SSSSSSX]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Mapper for [@timestamp] conflicts with existing mapper:\n\tCannot update parameter [format] from [yyyy-MM-dd'T'HH:mm:ss.SSSX] to [yyyy-MM-dd'T'HH:mm:ss.SSSX || yyyy-MM-dd'T'HH:mm:ss.SSSSSSX]"
  },
  "status" : 400
}
The in-flight Logstash 8.4.0 will print a minimum of 3 digits of precision (even if that is 000), which will cause it to work with Elasticsearch's built-in date_time named format and other formats like yours that require sub-second precision.
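To make the effect concrete (the values below are illustrative, not taken from a real event), serialized timestamps would then always carry at least millisecond precision:

2022-07-31T06:46:06.000Z         (whole-second value, padded to 3 decimals)
2022-07-31T06:46:06.200Z         (millisecond precision)
2022-07-31T06:46:06.200123456Z   (nanosecond precision uses 9 decimals)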
Updating your date field's formatting is out-of-scope for this Logstash issue (I'm not an Elasticsearch expert), but:
- I believe that the spaces surrounding the or-clause (||) may be problematic, and
- the fallback clause needs to be less precise than the existing clause; the instruction should read:
  - parse with sub-second, and if that fails, parse without sub-second
  - print with sub-second (and if that fails, which it can't, print without sub-second)
{
  "properties": {
    "@timestamp": {
-     "format": "yyyy-MM-dd'T'HH:mm:ss.SSSX || yyyy-MM-dd'T'HH:mm:ss.SSSSSSX",
+     "format": "yyyy-MM-dd'T'HH:mm:ss.SSSX||yyyy-MM-dd'T'HH:mm:ssX",
      "type": "date"
    }
  }
}
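Since the error above shows that the format parameter of an existing index cannot be changed in place, one sketch of how the corrected fallback chain could be rolled out for indices created from now on (the template name and index pattern are placeholders) is via an index template:

PUT _index_template/timestamp-millis-fallback
{
  "index_patterns": ["my-logs-*"],
  "template": {
    "mappings": {
      "properties": {
        "@timestamp": {
          "type": "date",
          "format": "yyyy-MM-dd'T'HH:mm:ss.SSSX||yyyy-MM-dd'T'HH:mm:ssX"
        }
      }
    }
  }
}

Existing indices would keep their current mapping; only indices created after the template is in place would pick up the new format chain.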
Hi,
I've just tested an upgrade from Logstash 7.x to 8.7 and I have this issue. After the upgrade, the "@timestamp" field is now formatted like "2023-05-05T09:00:27.719610409Z". It's an issue because I have many tools in my processing pipeline which expect this field to be formatted like before: "2023-05-05T09:00:27.719Z". So after the upgrade, everything is broken. Is there any workaround to force the format of @timestamp to keep the original 7.x format? I would like to avoid using a date filter to transform all the logs going through my pipeline on the fly, since it's pretty CPU intensive.