Filestream `include_message` does not correctly track the offset of a file
For confirmed bugs, please report:
- Version: main
- Operating System: All
When the `include_message` field is used, the offset is only updated based on the length of the message read; the amount of data read but not ingested is ignored.
How to reproduce
- Create a file (`/tmp/foo.log`) with the following content:

```
TEST
A
```
- Create a `filebeat.yml` with the following content:

```yaml
filebeat.inputs:
  - type: filestream
    parsers:
      - include_message.patterns:
          - ^A$
    id: my-filestream-id
    enabled: true
    paths:
      - /tmp/foo.log

output:
  console:
    codec.json:
      pretty: true
```
- Run Filebeat and wait for the event to be printed in the console
- Stop Filebeat
- Look at the registry log file; the offset will be 2, corresponding to the size of the message, not the number of bytes advanced in the file.

```json
{"k":"filestream::my-filestream-id::native::26550-34","v":{"cursor":{"offset":2},"meta":{"source":"/tmp/foo.log","identifier_name":"native"},"ttl":1800000000000,"updated":[280445103831836,1716310115]}}
```
- Start Filebeat
- Wait until the same message gets published/printed to the console
- Look at the registry file once more; the offset has been increased by 2, for a total of 4.

```json
{"k":"filestream::my-filestream-id::native::26550-34","v":{"cursor":{"offset":4},"meta":{"source":"/tmp/foo.log","identifier_name":"native"},"ttl":1800000000000,"updated":[280444802347591,1716310433]}}
```
The problem happens because `parser.FilterParser` does not account for the size of the lines it discards.
https://github.com/elastic/beats/blob/a6aa347eeeb9a00524044036157788ce9eb3d0f0/libbeat/reader/filter/filter.go#L58-L70
Then, when Filestream gets the message, it increases the file offset by the message's size instead of the number of bytes advanced in the file.
https://github.com/elastic/beats/blob/a6aa347eeeb9a00524044036157788ce9eb3d0f0/filebeat/input/filestream/input.go#L356-L372
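The cursor-update side can be sketched as follows (a simplified illustration, not the actual `input.go` code; the `state` type and `advance` method are hypothetical). Because the offset is advanced by whatever byte count the message reports, the fix can live in either place: the parser must report consumed bytes, or the update must use them.

```go
package main

import "fmt"

// state is a stand-in for Filestream's stored cursor.
type state struct{ Offset int64 }

// advance mirrors the update path: the offset grows by whatever byte count
// the message carries, so that count must include discarded data.
func (s *state) advance(messageBytes int64) { s.Offset += messageBytes }

func main() {
	fileSize := int64(7) // "TEST\n" + "A\n"

	buggy := &state{}
	buggy.advance(2) // only len("A\n"); the skipped "TEST\n" is not counted
	fmt.Println(buggy.Offset, buggy.Offset == fileSize) // 2 false

	fixed := &state{}
	fixed.advance(2 + 5) // kept message plus discarded bytes
	fmt.Println(fixed.Offset, fixed.Offset == fileSize) // 7 true
}
```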