vector icon indicating copy to clipboard operation
vector copied to clipboard

Gelf codec encoder send timestamp as integer but graylog expects float

Open stavcev opened this issue 1 year ago • 7 comments

Vector Version

vector 0.26.0 (x86_64-unknown-linux-musl 5d610b6 2022-11-25)

Example Data

For example, my test message:

{
  "host": "host",
  "short_message": "short message",
  "timestamp": "2022-11-25T13:10:56.962200925Z",
  "version": "1.1"
}

Problem

As part of https://github.com/vectordotdev/vector/issues/4868 , PR https://github.com/vectordotdev/vector/pull/13333 submitted GELF encoding support.

I want to send logs contains timestamp field to graylog using encondig.gelf.

To send this timestamp in graylog i need to do one of the following:

  1. Configure encoder parameter in socket sink encoding.timestamp_format: "unix"
  2. Parse timestamp field with vrl:
  transforms:
    gelf:
      type: remap
      inputs: ["kubernetes_logs"]
      source: |-
        .timestamp = to_unix_timestamp!(.timestamp)

In this cases logs sends correctly, but without milliseconds:

  • Stdout: "timestamp": 1669381856,
  • Graylog shows this filed: timestamp: 2022-11-25T13:10:56.000Z

I don't want to lose milliseconds and i tried vrl parsings:

  1. .timestamp = to_unix_timestamp!(.timestamp, unit: "milliseconds") In this case graylog can't recognize and skipping logs.
  2. .timestamp = to_float(to_timestamp!(.timestamp)) - it must be true way, but it fails. In this case i get vector ERRORs: 2022-11-28T10:57:11.468111Z ERROR sink{component_kind="sink" component_id=gelf_out component_type=socket component_name=gelf_out}: vector::internal_events::codecs: Failed serializing frame. error=LogEvent contains a value with an invalid type. field = "timestamp" type = "float" expected type = "timestamp or integer" error_type="encoder_failed" stage="sending" internal_log_rate_limit=true

What's Happening

Gelf docs: https://docs.graylog.org/docs/gelf Quote gelf docs : "timestamp - number with optional decimal places for milliseconds" - I.e float.

Sending to graylog "timestamp": 1669381856.123, - works correctly And graylog shows this field timestamp: 2022-11-25T13:10:56.123Z

But encoder check this field as int: https://github.com/vectordotdev/vector/blob/d2f2ff3a667f7e027754d156fe4b1e369038fdf5/lib/codecs/src/encoding/format/gelf.rs#L178-L181

I think it's necessary to chage is_integer() to is_float() or just add is_float()

Expected Behavior

When i parse timestamp as float encoder.gelf not fails.

stavcev avatar Nov 28 '22 13:11 stavcev

Hi @stavcev !

Thanks for this report. As a workaround you can convert the timestamp to Vector's internal timestamp representation via to_timestamp in your remap transform. This should correctly encode, including the milliseconds.

jszwedko avatar Dec 14 '22 21:12 jszwedko

Hello @jszwedko! This workaround not working for me. My conf:

  transforms:
    gelf:
      type: remap
      inputs: ["kubernetes_logs"]
      source: |-
        .timestamp_from_log = to_string!(.timestamp)
        .timestamp=to_timestamp!(.timestamp)
  sinks:
    gelf_out:
      type: "socket"
      inputs: ["gelf"]
      address: "ip:port"
      mode: "tcp"
      framing:
        method: "character_delimited"
        character_delimited:
          delimiter: "\0"
      encoding:
        codec: 'gelf'

In this case we have different timestamps in graylog:

timestamp: 2023-01-10T13:15:30.092Z timestamp_from_log: 2023-01-10T13:15:30.056055461Z

Graylog show WARNS: WARN [GelfCodec] GELF message <14d1837d-90ec-11ed-98b7-001a4a172605> (received from <*.*.*.*:33470>) has invalid "timestamp": 2023-01-10T13:15:30.056055461Z (type: STRING)

It happening beacause graylog expects timestamp as "number with optional decimal places for milliseconds". If field not number graylog show WARN message and set new timestamp.

stavcev avatar Jan 10 '23 13:01 stavcev

Ah, interesting, thanks @stavcev . It seems like we should be truncating the timestamp to milliseconds then, before sending it to Greylog. As an additional workaround I think you could truncate the string:

to_timestamp!(truncate("2023-01-10T13:15:30.056055461Z", 23) + "Z")

Which would return a timestamp with only millisecond resolution which Greylog should be able to accept.

jszwedko avatar Jan 10 '23 18:01 jszwedko

Hello @jszwedko thank's for fast response. My new config:

  transforms:
    gelf:
      type: remap
      inputs: ["kubernetes_logs"]
      source: |-
        .timestamp_from_log = to_string!(.timestamp)
        .timestamp = to_timestamp!(truncate(to_string!(.timestamp), 23) + "Z")

But result is the same: timestamp: 2023-01-11T08:59:59.417Z timestamp_from_log: 2023-01-11T08:59:53.956031192Z Graylog WARN:

WARN [GelfCodec] GELF message (received from ip:54956) has invalid "timestamp": 2023-01-11T08:59:59.417Z (type: STRING)

stavcev avatar Jan 11 '23 09:01 stavcev

Hrm, that's weird. Does that mean it cannot receive timestamps with milliseconds? Can you give an example of a valid timestamp that is accepted by Graylog?

jszwedko avatar Jan 11 '23 15:01 jszwedko

From my first message: To send this timestamp in graylog i need to do one of the following:

  1. Configure encoder parameter in socket sink encoding.timestamp_format: "unix"
  2. Parse timestamp field with vrl:
  transforms:
    gelf:
      type: remap
      inputs: ["kubernetes_logs"]
      source: |-
        .timestamp = to_unix_timestamp!(.timestamp)

In this cases logs sends correctly, but without milliseconds:

  • Stdout: "timestamp": 1669381856,
  • Graylog shows this filed: timestamp: 2022-11-25T13:10:56.000Z

stavcev avatar Jan 12 '23 09:01 stavcev

Gotcha, I see, thanks for clarifying. I see Graylog is expecting a numeric timestamp. I'd originally thought it wanted a RFC3339 formatted string. As another workaround I think you could do:

.timestamp = to_unix_timestamp!(.timestamp, unit: "milliseconds") / 1000

Which should give you a unix timestamp with fractional seconds up to millisecond precision.

jszwedko avatar Jan 12 '23 14:01 jszwedko

This wo not working too, in this case vector return error: 2023-01-23T10:30:57.741520Z ERROR sink{component_kind="sink" component_id=gelf_out component_type=socket component_name=gelf_out}: vector::internal_events::codecs: Failed serializing frame. error=LogEvent contains a value with an invalid type. field = "timestamp" type = "float" expected type = "timestamp or integer" error_type="encoder_failed" stage="sending" internal_log_rate_limit=true

This behavior was the main reason for creating issue

stavcev avatar Jan 23 '23 10:01 stavcev

I am having the exact same issue and spent some time digging into the code tracking the behavior down to this line: https://github.com/vectordotdev/vector/blob/0fb8d7dd8ec61e53eba4687a487bd544aedb922f/src/codecs/encoding/transformer.rs#L153-L155

I would like to work around that by introducing a new configurable timestamp format of unix_millis and could try to prepare a pull request. Would that be a viable option?

scMarkus avatar Feb 24 '23 17:02 scMarkus

As a temporary workaround one could add an additional field in Graylog and deleting the original timestamp field by using the following VRL code in a remap transformer:

.timestamp_nano = format_timestamp!(.timestamp, format: "%F %T.%f")
del(.timestamp)

which looks like 2023-02-26 14:09:22.703589853 as a string for the timestamp_nano in Graylog and adds the common timestamp field on the server side in millisecond precession.

Anyway I still would highly appreciate a dedicated handling of source timestamp information for the Graylog encoding and I would invest some time in writing code if so desired? if someone could point me to where the config parsing is handled in the code base that would be awesome.

scMarkus avatar Feb 26 '23 14:02 scMarkus

@jszwedko what are your thoughts on this one?

scMarkus avatar Mar 01 '23 18:03 scMarkus

@jszwedko what are your thoughts on this one?

Hi @scMarkus ! I'll take a look at this today and get a response to you. Thanks for taking the time to dive into it~

neuronull avatar Mar 07 '23 16:03 neuronull

I would like to work around that by introducing a new configurable timestamp format of unix_millis and could try to prepare a pull request. Would that be a viable option?

This seems reasonable to me. Adding a new enum variant UnixMillis to TimestampFormat.

if someone could point me to where the config parsing is handled in the code base that would be awesome.

I think you would just need to edit the transformer.rs file you already found. The config parsing is done automatigically more or less.

So just adding the new enum member, and handling it in the places where the enum is handled, and adding unit tests for it should be sufficient.

neuronull avatar Mar 07 '23 18:03 neuronull

The other aspect of this is that in the gelf encoder , we will also need to adapt that to hande / output a float.

neuronull avatar Mar 07 '23 18:03 neuronull

Thanks for your response @neuronull. After tinkering around a bit a actually would like to suggest another approach to the issue.

The gelf documentation explicitly states that timestamp field always is a number. Therefore it makes sense to me to enforce this within the gelf encoder itself for all messages. Using Rfc3339 for gelf does not work anyway. We could do this because the implementation of the Rfc3339 in the discussed transformer (which is the default) does not change the timestamp field at all (which may be coerced by gelf encoder beforehand). https://github.com/vectordotdev/vector/blob/0fb8d7dd8ec61e53eba4687a487bd544aedb922f/src/codecs/encoding/transformer.rs#L172-L173

This would not be a breaking change for people having configured the unix encoding right now but would enable new users to omit the sinks.*.encoding.timestamp_format config and get the desired millisecond behavior when using the gelf encoder.

I have validated this working for me already.

What do you think?

scMarkus avatar Mar 08 '23 17:03 scMarkus

I took a closer look at the current implementation in Vector, and I do see we are not currently enforcing the timestamp to be a number. This is contrary to the original objective of that encoder:

https://github.com/vectordotdev/vector/blob/2f6b6d2a0f703525cd7d97e29f3b009c11abd50f/lib/codecs/src/encoding/format/gelf.rs#L14-L20

, which was supposed to be more strictly adhering to the GELF spec than Gaylog itself. Hence, I suspect but I haven't tested against graylog (also observing that there aren't any unit tests for encoding with timestamp) , that the Rfc3339 string timestamp that is potentially output from the vector gelf encoder is not rejected by the graylog server, being that I did notice their adherence to their spec was not strict, so that would not surprise me.

So, I do agree with your new suggestion and think it's more consistent with the rest of this encoder.

Therefore it makes sense to me to enforce this within the gelf encoder itself for all messages

just to add clarity here- I think it makes sense for us to convert the DateTime timestamp format of log events into numeric, and that would essentially be the reverse of what we do in the Gelf decoder

https://github.com/vectordotdev/vector/blob/2f6b6d2a0f703525cd7d97e29f3b009c11abd50f/lib/codecs/src/decoding/format/gelf.rs#L101-L110)

, we output a float if there is millisecond granularity to the timestamp, but otherwise output an integer (as opposed to always outputting a float). And not to enforce (error out) if the input timestamp is not in integer/float form.

neuronull avatar Mar 09 '23 00:03 neuronull