
Filestream `include_message` does not correctly track the offset of a file

Open · belimawr opened this issue 9 months ago · 1 comment

For confirmed bugs, please report:

  • Version: main
  • Operating System: All

When the `include_message` parser is used, the offset is updated based only on the length of the message read. It ignores the data that was read from the file but not ingested.

How to reproduce

  1. Create a file (/tmp/foo.log) with the following content:
    TEST
    A
    
  2. Create a filebeat.yml with the following content:
    filebeat.inputs:
      - type: filestream
        parsers:
          - include_message.patterns:
              - ^A$
        id: my-filestream-id
        enabled: true
        paths:
          - /tmp/foo.log
    
    output:
      console:
        codec.json:
          pretty: true
    
  3. Run Filebeat and wait for the event to be printed in the console
  4. Stop Filebeat
  5. Look at the registry log file. The offset will be 2, which corresponds to the size of the message, not to the number of bytes advanced in the file.
    {"k":"filestream::my-filestream-id::native::26550-34","v":{"cursor":{"offset":2},"meta":{"source":"/tmp/foo.log","identifier_name":"native"},"ttl":1800000000000,"updated":[280445103831836,1716310115]}}
    
  6. Stop Filebeat
  7. Start Filebeat
  8. Wait until the same message gets published/printed to the console
  9. Look at the registry file once more. The offset has increased by 2, for a total of 4.
    {"k":"filestream::my-filestream-id::native::26550-34","v":{"cursor":{"offset":4},"meta":{"source":"/tmp/foo.log","identifier_name":"native"},"ttl":1800000000000,"updated":[280444802347591,1716310433]}}
    

The problem happens because the parser.FilterParser does not account for the size of the lines it discards. https://github.com/elastic/beats/blob/a6aa347eeeb9a00524044036157788ce9eb3d0f0/libbeat/reader/filter/filter.go#L58-L70

Then, when Filestream gets the message, it increases the file offset by the message's size instead of by the number of bytes advanced in the file. https://github.com/elastic/beats/blob/a6aa347eeeb9a00524044036157788ce9eb3d0f0/filebeat/input/filestream/input.go#L356-L372

belimawr · May 21 '24 20:05