Loki sink: out-of-order logs might cause loss of other in-order logs?

Open MartinEmrich opened this issue 1 month ago • 3 comments

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

I noticed a sudden drop in log volume. Looking at both the Vector and Loki logs, I noticed many "400 Bad Request" errors in the log of Vector's loki sink, and many "too late timestamp" messages in the Loki log.

So, for reasons unknown, Vector tries to deliver some logs with yesterday's timestamp (far beyond what I would expect from any reordering within Vector).

My impression is that Vector buffers logs and sends them in large batches. I assume Loki rejects such a batched request as a whole if a single log line violates its timestamp ordering requirements, and Vector then discards the entire request:

2024-06-21T09:05:52.140971Z ERROR sink{component_kind="sink" component_id=loki component_type=loki}:request{request_id=2592}: vector::sinks::util::retries: Non-retriable error; dropping the request. error=Server responded with an error: 400 Bad Request internal_log_rate_limit=true
2024-06-21T09:05:52.141070Z ERROR sink{component_kind="sink" component_id=loki component_type=loki}:request{request_id=2592}: vector_common::internal_event::service: Internal log [Service call failed. No retries or retries exhausted.] has been suppressed 1 times.
2024-06-21T09:05:52.141073Z ERROR sink{component_kind="sink" component_id=loki component_type=loki}:request{request_id=2592}: vector_common::internal_event::service: Service call failed. No retries or retries exhausted. error=Some(ServerError { code: 400 }) request_id=2592 error_type="request_failed" stage="sending" internal_log_rate_limit=true

Would that also drop all other log events contained in that request, not only those belonging to the streams Loki won't accept? Currently it looks that way.
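
To make the suspected failure mode concrete, here is a toy Python model of it. It only illustrates the assumption described above (all-or-nothing rejection followed by a non-retried drop) and is not Loki's or Vector's actual code. The names `Event`, `hypothetical_push`, `deliver` and the `max_age_secs` threshold are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    stream: str
    timestamp: float  # seconds since epoch
    line: str


def hypothetical_push(batch, now, max_age_secs):
    """ASSUMED server behavior: answer 400 if ANY entry is too old, else 204."""
    if any(now - e.timestamp > max_age_secs for e in batch):
        return 400
    return 204


def deliver(batch, now, max_age_secs):
    """ASSUMED sink behavior: a 400 is non-retriable, so the whole batch is dropped.

    Returns a (delivered, dropped) pair of event lists.
    """
    status = hypothetical_push(batch, now, max_age_secs)
    if status == 400:
        return [], list(batch)
    return list(batch), []


now = 1_000_000.0  # arbitrary reference time for the example
batch = [
    Event("app", now - 1, "fresh line 1"),
    Event("app", now - 2, "fresh line 2"),
    Event("old", now - 86_400, "line with yesterday's timestamp"),
]

delivered, dropped = deliver(batch, now, max_age_secs=3_600)

# Events that were acceptable on their own but were lost with the batch.
collateral = [e for e in dropped if now - e.timestamp <= 3_600]
print(len(delivered), len(dropped), len(collateral))
```

If the real behavior matches this model, a single stale event is enough to lose every other event that happened to share its request.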

Configuration

sinks:
      loki:
        type: loki
        inputs:
          - input1
          - input2
        encoding:
          codec: json
        out_of_order_action: accept
        labels:
          "*": "{{ loki_labels }}"
          "platform": "{{ platform }}"
        remove_label_fields: true
        endpoint: ...
        batch:
          max_bytes: 1048576
          max_events: 1000
          timeout_secs: 1
        buffer:
          when_full: block
          type: disk
          max_size: 1073741824
        healthcheck:
          enabled: true
          path: ...
          port: ...

Version

timberio/vector:0.39.0-distroless-libc

Debug Output

No response

Example Data

No response

Additional Context

No response

References

Related topic: https://github.com/vectordotdev/vector/issues/5024

MartinEmrich, Jun 21 '24 09:06