vector
vector copied to clipboard
Gelf codec encoder send timestamp as integer but graylog expects float
Vector Version
vector 0.26.0 (x86_64-unknown-linux-musl 5d610b6 2022-11-25)
Example Data
For example, my test message:
{
"host": "host",
"short_message": "short message",
"timestamp": "2022-11-25T13:10:56.962200925Z",
"version": "1.1"
}
Problem
As part of https://github.com/vectordotdev/vector/issues/4868 , PR https://github.com/vectordotdev/vector/pull/13333 submitted GELF encoding support.
I want to send logs contains timestamp field to graylog using encondig.gelf.
To send this timestamp in graylog i need to do one of the following:
- Configure encoder parameter in socket sink
encoding.timestamp_format: "unix"
- Parse timestamp field with vrl:
transforms:
gelf:
type: remap
inputs: ["kubernetes_logs"]
source: |-
.timestamp = to_unix_timestamp!(.timestamp)
In this cases logs sends correctly, but without milliseconds:
- Stdout:
"timestamp": 1669381856,
- Graylog shows this filed:
timestamp: 2022-11-25T13:10:56.000Z
I don't want to lose milliseconds and i tried vrl parsings:
-
.timestamp = to_unix_timestamp!(.timestamp, unit: "milliseconds")
In this case graylog can't recognize and skipping logs. -
.timestamp = to_float(to_timestamp!(.timestamp))
- it must be true way, but it fails. In this case i get vector ERRORs:2022-11-28T10:57:11.468111Z ERROR sink{component_kind="sink" component_id=gelf_out component_type=socket component_name=gelf_out}: vector::internal_events::codecs: Failed serializing frame. error=LogEvent contains a value with an invalid type. field = "timestamp" type = "float" expected type = "timestamp or integer" error_type="encoder_failed" stage="sending" internal_log_rate_limit=true
What's Happening
Gelf docs: https://docs.graylog.org/docs/gelf Quote gelf docs : "timestamp - number with optional decimal places for milliseconds" - I.e float.
Sending to graylog "timestamp": 1669381856.123,
- works correctly
And graylog shows this field timestamp: 2022-11-25T13:10:56.123Z
But encoder check this field as int: https://github.com/vectordotdev/vector/blob/d2f2ff3a667f7e027754d156fe4b1e369038fdf5/lib/codecs/src/encoding/format/gelf.rs#L178-L181
I think it's necessary to chage is_integer() to is_float() or just add is_float()
Expected Behavior
When i parse timestamp as float encoder.gelf not fails.
Hi @stavcev !
Thanks for this report. As a workaround you can convert the timestamp to Vector's internal timestamp representation via to_timestamp
in your remap
transform. This should correctly encode, including the milliseconds.
Hello @jszwedko! This workaround not working for me. My conf:
transforms:
gelf:
type: remap
inputs: ["kubernetes_logs"]
source: |-
.timestamp_from_log = to_string!(.timestamp)
.timestamp=to_timestamp!(.timestamp)
sinks:
gelf_out:
type: "socket"
inputs: ["gelf"]
address: "ip:port"
mode: "tcp"
framing:
method: "character_delimited"
character_delimited:
delimiter: "\0"
encoding:
codec: 'gelf'
In this case we have different timestamps in graylog:
timestamp: 2023-01-10T13:15:30.092Z timestamp_from_log: 2023-01-10T13:15:30.056055461Z
Graylog show WARNS:
WARN [GelfCodec] GELF message <14d1837d-90ec-11ed-98b7-001a4a172605> (received from <*.*.*.*:33470>) has invalid "timestamp": 2023-01-10T13:15:30.056055461Z (type: STRING)
It happening beacause graylog expects timestamp as "number with optional decimal places for milliseconds". If field not number graylog show WARN message and set new timestamp.
Ah, interesting, thanks @stavcev . It seems like we should be truncating the timestamp to milliseconds then, before sending it to Greylog. As an additional workaround I think you could truncate the string:
to_timestamp!(truncate("2023-01-10T13:15:30.056055461Z", 23) + "Z")
Which would return a timestamp with only millisecond resolution which Greylog should be able to accept.
Hello @jszwedko thank's for fast response. My new config:
transforms:
gelf:
type: remap
inputs: ["kubernetes_logs"]
source: |-
.timestamp_from_log = to_string!(.timestamp)
.timestamp = to_timestamp!(truncate(to_string!(.timestamp), 23) + "Z")
But result is the same: timestamp: 2023-01-11T08:59:59.417Z timestamp_from_log: 2023-01-11T08:59:53.956031192Z Graylog WARN:
WARN [GelfCodec] GELF message
(received from ip:54956) has invalid "timestamp": 2023-01-11T08:59:59.417Z (type: STRING)
Hrm, that's weird. Does that mean it cannot receive timestamps with milliseconds? Can you give an example of a valid timestamp that is accepted by Graylog?
From my first message: To send this timestamp in graylog i need to do one of the following:
- Configure encoder parameter in socket sink
encoding.timestamp_format: "unix"
- Parse timestamp field with vrl:
transforms:
gelf:
type: remap
inputs: ["kubernetes_logs"]
source: |-
.timestamp = to_unix_timestamp!(.timestamp)
In this cases logs sends correctly, but without milliseconds:
- Stdout:
"timestamp": 1669381856,
- Graylog shows this filed:
timestamp: 2022-11-25T13:10:56.000Z
Gotcha, I see, thanks for clarifying. I see Graylog is expecting a numeric timestamp. I'd originally thought it wanted a RFC3339 formatted string. As another workaround I think you could do:
.timestamp = to_unix_timestamp!(.timestamp, unit: "milliseconds") / 1000
Which should give you a unix timestamp with fractional seconds up to millisecond precision.
This wo not working too, in this case vector return error:
2023-01-23T10:30:57.741520Z ERROR sink{component_kind="sink" component_id=gelf_out component_type=socket component_name=gelf_out}: vector::internal_events::codecs: Failed serializing frame. error=LogEvent contains a value with an invalid type. field = "timestamp" type = "float" expected type = "timestamp or integer" error_type="encoder_failed" stage="sending" internal_log_rate_limit=true
This behavior was the main reason for creating issue
I am having the exact same issue and spent some time digging into the code tracking the behavior down to this line: https://github.com/vectordotdev/vector/blob/0fb8d7dd8ec61e53eba4687a487bd544aedb922f/src/codecs/encoding/transformer.rs#L153-L155
I would like to work around that by introducing a new configurable timestamp format of unix_millis
and could try to prepare a pull request. Would that be a viable option?
As a temporary workaround one could add an additional field in Graylog and deleting the original timestamp field by using the following VRL code in a remap transformer:
.timestamp_nano = format_timestamp!(.timestamp, format: "%F %T.%f")
del(.timestamp)
which looks like 2023-02-26 14:09:22.703589853
as a string for the timestamp_nano in Graylog and adds the common timestamp field on the server side in millisecond precession.
Anyway I still would highly appreciate a dedicated handling of source timestamp information for the Graylog encoding and I would invest some time in writing code if so desired? if someone could point me to where the config parsing is handled in the code base that would be awesome.
@jszwedko what are your thoughts on this one?
@jszwedko what are your thoughts on this one?
Hi @scMarkus ! I'll take a look at this today and get a response to you. Thanks for taking the time to dive into it~
I would like to work around that by introducing a new configurable timestamp format of unix_millis and could try to prepare a pull request. Would that be a viable option?
This seems reasonable to me. Adding a new enum variant UnixMillis
to TimestampFormat
.
if someone could point me to where the config parsing is handled in the code base that would be awesome.
I think you would just need to edit the transformer.rs
file you already found.
The config parsing is done automatigically more or less.
So just adding the new enum member, and handling it in the places where the enum is handled, and adding unit tests for it should be sufficient.
The other aspect of this is that in the gelf
encoder , we will also need to adapt that to hande / output a float.
Thanks for your response @neuronull. After tinkering around a bit a actually would like to suggest another approach to the issue.
The gelf documentation explicitly states that timestamp
field always is a number. Therefore it makes sense to me to enforce this within the gelf encoder itself for all messages. Using Rfc3339
for gelf does not work anyway. We could do this because the implementation of the Rfc3339
in the discussed transformer (which is the default) does not change the timestamp field at all (which may be coerced by gelf encoder beforehand).
https://github.com/vectordotdev/vector/blob/0fb8d7dd8ec61e53eba4687a487bd544aedb922f/src/codecs/encoding/transformer.rs#L172-L173
This would not be a breaking change for people having configured the unix encoding right now but would enable new users to omit the sinks.*.encoding.timestamp_format
config and get the desired millisecond behavior when using the gelf encoder.
I have validated this working for me already.
What do you think?
I took a closer look at the current implementation in Vector, and I do see we are not currently enforcing the timestamp to be a number. This is contrary to the original objective of that encoder:
https://github.com/vectordotdev/vector/blob/2f6b6d2a0f703525cd7d97e29f3b009c11abd50f/lib/codecs/src/encoding/format/gelf.rs#L14-L20
, which was supposed to be more strictly adhering to the GELF spec than Gaylog itself. Hence, I suspect but I haven't tested against graylog (also observing that there aren't any unit tests for encoding with timestamp) , that the Rfc3339 string timestamp that is potentially output from the vector gelf encoder is not rejected by the graylog server, being that I did notice their adherence to their spec was not strict, so that would not surprise me.
So, I do agree with your new suggestion and think it's more consistent with the rest of this encoder.
Therefore it makes sense to me to enforce this within the gelf encoder itself for all messages
just to add clarity here- I think it makes sense for us to convert the DateTime timestamp format of log events into numeric, and that would essentially be the reverse of what we do in the Gelf decoder
https://github.com/vectordotdev/vector/blob/2f6b6d2a0f703525cd7d97e29f3b009c11abd50f/lib/codecs/src/decoding/format/gelf.rs#L101-L110)
, we output a float if there is millisecond granularity to the timestamp, but otherwise output an integer (as opposed to always outputting a float). And not to enforce (error out) if the input timestamp is not in integer/float form.