
kubernetes_logs source wrongly collects messages from the beginning of the file

Open · sinzui opened this issue 2 years ago · 10 comments

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

The kubernetes_logs source is configured to tail from the end of the log file so that it does not collect old messages (read_from: end). But when testing against an isolated service it has never seen before, it collects messages from the beginning of the file. To ensure Vector treated the log file as "new", we pointed data_dir at a new, empty directory.

After deploying Vector, we expected to see messages starting from the moment Vector started. Instead, we see messages going back more than a day.
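
For reference, this is roughly how we forced the clean start (a sketch; the fresh directory and the DaemonSet name are placeholders for our deployment):

# Point Vector at a fresh, empty data_dir so no prior checkpoints exist.
mkdir -p /var/lib/vector-agent-fresh
# Update data_dir in the config to the new directory, then restart the agent:
kubectl -n vector rollout restart daemonset/vector
# With read_from: end, nothing older than the restart should be collected.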

Configuration

data_dir: /var/lib/vector-agent
sources:
  container_logs_source:
    type: kubernetes_logs
    auto_partial_merge: true
    # We retain cached metadata for twice the default because the pod can
    # be recreated.
    delay_deletion_ms: 120000
    extra_field_selector: "metadata.namespace=cme-internal-apps"
    exclude_paths_glob_patterns:
      - "**/*.gz"
      - "**/*.tmp"
      - "**/*.log.*"
    # This is 25% of the default because we know that some tenants trigger
    # rapid rolls. Ignore files older than 5 minutes, but pull new events
    # when written.
    glob_minimum_cooldown_ms: 15000
    ignore_older_secs: 300
    # Read the default 32k per line (Vector skips longer messages) and read
    # twice the default bytes to get more of the message before roll.
    # We still truncate to 10k later.
    max_line_bytes: 32768
    max_read_bytes: 4096
    read_from: end
    self_node_name: "${VECTOR_SELF_NODE_NAME}"
    timezone: local
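
To rule out a stale offset, the state Vector persists under data_dir can be inspected. This is a sketch assuming file offsets land in a checkpoints.json somewhere under the configured data_dir; the exact layout may vary by version:

# List everything Vector persisted for this source and dump any checkpoints.
find /var/lib/vector-agent -type f
find /var/lib/vector-agent -name 'checkpoints.json' -exec cat {} \;
# An empty directory here means Vector has no prior offset for the file.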

Version

0.28.1

Debug Output

No response

Example Data

No response

Additional Context

Vector logs show that it discovered the log file:

2023-06-01T16:08:14.435585Z INFO source{component_kind="source" component_id=container_logs_source component_type=kubernetes_logs component_name=container_logs_source}:file_server: vector::internal_events::file::source: Found new file to watch. file=/var/log/pods/cme-internal-apps_echo-server-cccfc9489-c49mf_26e09362-3a73-4af5-9aae-a9aca984c14f/echo-server/0.log

We see the first message in the echo-server log file is from yesterday.

head -1 0.log
2023-05-30T12:42:55.189666057-04:00 stdout F 10.156.154.140:43268 | GET /ping

And we see that this message, and many hundreds more, were also collected:

2023-05-30T12:42:55.189 10.156.154.140:43268 | GET /ping
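
A rough way to count how far back collection went, assuming events are routed to a hypothetical file sink writing to out.log (our real sink differs; this is illustrative only):

# Count collected events stamped before Vector's start (2023-06-01T16:08).
# out.log is a hypothetical file-sink output used only for this example.
awk '$1 < "2023-06-01T16:08"' out.log | wc -l

The lexicographic comparison works because the timestamps are ISO 8601.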

The log file is a regular file on the filesystem:

stat 0.log
  File: ‘0.log’
  Size: 2725064     Blocks: 9344       IO Block: 4096   regular file
Device: 10300h/66304d Inode: 67108944    Links: 1
Access: (0640/-rw-r-----)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-06-01 13:18:05.442032816 -0400
Modify: 2023-06-01 13:18:33.892942790 -0400
Change: 2023-06-01 13:18:33.892942790 -0400
Birth: -

References

No response

sinzui · Jun 01 '23 18:06

We're experiencing the same issue, even on the latest version, 0.30.0.

jsokol89 · Jun 08 '23 03:06

I also experienced the same issue with 0.26.0.

seongjae-min · Nov 27 '23 07:11

I'm also experiencing the same issue with the latest version, 0.34.1.

knightprog · Dec 13 '23 16:12

Yes, I also experienced this issue in 0.34.1.

kyanite · Dec 25 '23 01:12

Still the same on 0.36.0.

sadpdtchr · Feb 15 '24 22:02

Running into this on 0.39.0. It makes swapping our old pipeline components out for Vector fairly treacherous in busy clusters, due to huge initial spikes in resource utilization.

davidpellcb · Jul 11 '24 16:07