
Ability to include file inode and deviceid in log meta fields

Open hartfordfive opened this issue 1 year ago • 4 comments

I've encountered multiple occasions on random hosts where filebeat inexplicably started to re-consume a log file from the beginning. For reference, filebeat is configured to publish the collected logs to kafka. I was able to confirm that filebeat is in fact re-consuming the log file by observing the kafka metadata in our logs (includes kf timestamp, topic, and offset) as well as the byte offset included by filebeat. A consumer group offset reset can also be ruled out, as the duplicate entries had the identical byte offset and our topic retention period is only 12h. The duplicated log entries were often from logs which were generated weeks or even months in the past.

I realize that a change of inode or deviceid could cause a file to be re-consumed. Even though I am highly doubtful this is the case, I would like to be able to confirm this without a doubt. I realize this could be obtained by consuming the registry file, although I'm not convinced that method would be appropriate or work properly. Even if it did, the volume of changes to this file can be very large, especially on hosts with thousands of log files to tail. This would essentially result in a massive, unnecessary increase in logs being shipped and add substantial stress on elasticsearch. As a better option, I propose adding the file inode and deviceid as meta fields sent along with each log entry, which could be optionally enabled. I can't see this having a negative impact on performance, and it could allow for improved diagnostics for issues such as this one.
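To illustrate what an inode or deviceid change looks like from the host's side, here is a minimal Python sketch that reads both values with `os.stat`. It is not how filebeat tracks files internally; the helper names `file_identity` and `demo_rotation` and the file name `app.log` are made up for the example. It shows that appending to a file keeps its identity, while replacing the file with a new one under the same path changes the inode.

```python
import os
import tempfile


def file_identity(path):
    """Return the (device ID, inode) pair that identifies the file at `path`."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def demo_rotation():
    """Compare the identity of a file before and after append and replace."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "app.log")  # hypothetical log file
        with open(log_path, "w") as f:
            f.write("first line\n")
        before = file_identity(log_path)

        # Appending writes to the same file, so the identity is unchanged.
        with open(log_path, "a") as f:
            f.write("second line\n")
        after_append = file_identity(log_path)

        # Writing a new file and renaming it over the old path gives the
        # path a different inode, even though the content is the same.
        new_path = os.path.join(tmp, "app.log.new")
        with open(new_path, "w") as f:
            f.write("first line\nsecond line\n")
        os.replace(new_path, log_path)
        after_replace = file_identity(log_path)

    return before, after_append, after_replace
```

Running something like `file_identity` periodically against a suspect log file and comparing the results would be one way to check for an identity change without parsing the registry file.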

hartfordfive avatar Jun 28 '24 12:06 hartfordfive

I realize that a change of inode or deviceid could cause a file to be re-consumed.

Have you considered using the filestream input with the file_identity: fingerprint setting? It was built specifically to address the inode change situation; you can read more about it in this blog post.
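As a rough sketch of the idea behind fingerprint-based identity: the file is identified by a hash of a byte range of its content rather than by inode and deviceid. The Python below only illustrates the concept and is not filebeat's implementation; the function names, the use of SHA-256, the offset of 0, and the length of 1024 bytes are assumptions made for the example.

```python
import hashlib
import os
import shutil
import tempfile


def content_fingerprint(path, offset=0, length=1024):
    """Hash `length` bytes starting at `offset`.

    Returns None when the file does not yet contain enough bytes,
    since a partial range would not identify it reliably.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read(length)
    if len(chunk) < length:
        return None
    return hashlib.sha256(chunk).hexdigest()


def demo_fingerprint():
    """Show that a copy has a new inode but the same content fingerprint."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "app.log")  # hypothetical log file
        with open(original, "wb") as f:
            f.write(b"x" * 2048)

        # The copy exists alongside the original, so it has its own inode.
        copy = os.path.join(tmp, "app.log.copy")
        shutil.copyfile(original, copy)
        same_inode = os.stat(original).st_ino == os.stat(copy).st_ino
        same_fingerprint = content_fingerprint(original) == content_fingerprint(copy)

        # A file shorter than the hashed range has no fingerprint yet.
        short = os.path.join(tmp, "short.log")
        with open(short, "wb") as f:
            f.write(b"tiny")
        short_fingerprint = content_fingerprint(short)

    return same_inode, same_fingerprint, short_fingerprint
```

The trade-off in this sketch is that two files starting with identical bytes would collide, and very small files cannot be identified until they grow past the hashed range.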

ycombinator avatar Jun 28 '24 18:06 ycombinator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

elasticmachine avatar Jun 28 '24 18:06 elasticmachine

We are in the process of updating the filebeat configuration across all hosts from the legacy log input to the new filestream input. Once completed, we'll be able to specify the most appropriate fingerprint option depending on the use case. It would still be beneficial to have the ability to optionally add that information to the log event. In the worst case, having at least the ability to add the resulting fingerprint value should be ok.

hartfordfive avatar Jun 29 '24 13:06 hartfordfive

Wouldn't the work done in https://github.com/elastic/beats/pull/36065 solve your issue?

pierrehilbert avatar Jun 30 '24 17:06 pierrehilbert

Actually, it seems so. Does this mean that changing from log input type to filestream will automatically enable those fields given it's filebeat >= 8.10.0?

hartfordfive avatar Jul 16 '24 02:07 hartfordfive

Yes, you will get those fields starting with Filebeat 8.10.0, and only for filestream. I'm closing this issue as resolved.

pierrehilbert avatar Jul 16 '24 06:07 pierrehilbert