beats icon indicating copy to clipboard operation
beats copied to clipboard

Use fingerprint file identity by default and migrate all existing filestream inputs to it

Open belimawr opened this issue 1 year ago • 8 comments

We started seeing cases where the filestream-monitoring will log an entry stating the Elastic-Agent log file was truncated:

{
  "log.level": "info",
  "@timestamp": "2024-07-05T15:42:27.135Z",
  "message": "File was truncated. Reading file from offset 0. Path=/var/lib/elastic-agent/data/elastic-agent-8.14.1-1348b9/logs/elastic-agent-20240705-25.ndjson",
  "component": {
    "binary": "filebeat",
    "dataset": "elastic_agent.filebeat",
    "id": "filestream-monitoring",
    "type": "filestream"
  },
  "log": {
    "source": "filestream-monitoring"
  },
  "service.name": "filebeat",
  "id": "filestream-monitoring-agent",
  "path": "/var/lib/elastic-agent/data/elastic-agent-8.14.1-1348b9/logs/elastic-agent-20240705-25.ndjson",
  "ecs.version": "1.6.0",
  "log.logger": "input.filestream",
  "log.origin": {
    "file.line": 300,
    "file.name": "filestream/input.go",
    "function": "github.com/elastic/beats/v7/filebeat/input/filestream.(*filestream).openFile"
  },
  "source_file": "filestream::filestream-monitoring-agent::native::524773-66308",
  "state-id": "native::524423-66428"
}

We know the Elastic-Agent does not truncate files, the only reason for Filebeat to detect a truncation is if inodes are re-used.

The best way to prevent inode reuse from affecting the monitoring logs is to use a file identity that won't be re-used, the best option for that is the fingerprint file identity.

We should either start using the fingerprint file identity or at least make it configurable so users facing this issue can work around the inode reuse.

One very important thing to bear in mind is that if we change the file identity, all existing files will be considered new and re-ingested.

belimawr avatar Jul 08 '24 20:07 belimawr

Is there a reason we can't change the file identity starting from a specific point in the registry at a conceptual level?

Can the file identity used for a registry entry be included in the entry itself, so we can deal with a transition period? That is the configured file identity becomes used for new events, but we can still deal with old events?

This would let us switch to fingerprint everywhere without concern about data duplication.

CC @rdner

cmacknz avatar Jul 09 '24 17:07 cmacknz

Is there a reason we can't change the file identity starting from a specific point in the registry at a conceptual level?

At a conceptual level, no there isn't a reason.

The file identity is part of the key we use in the registry, we have some abstractions to generate the key from a "source", in simple terms it concatenates: input type, input ID, file identity and the actual identifier for the file, e.g: filestream::filestream-monitoring-agent::native::524042-42000. The parts of the code responsible for generating registry key and fetching data from the registry are not really connected. When fetching an entry from the registry we need to know the file identity that was used to generate the key.

Because the input type and the input ID works as a sort of namespace in the registry, an input can fetch all of its entries, check whether the file identity is different, if it is we open the file, calculate the new file identity and save it in the registry as a new entry. This will even work for the autodiscover case where a input instance is created for each file.

After the migration we can proceed with a normal execution.

Can the file identity used for a registry entry be included in the entry itself, so we can deal with a transition period?

Not without re-writing the whole registry and interactions with it, but I believe the migration I mentioned above will work well.

belimawr avatar Jul 09 '24 20:07 belimawr

Because the input type and the input ID works as a sort of namespace in the registry, an input can fetch all of its entries, check whether the file identity is different, if it is we open the file, calculate the new file identity and save it in the registry as a new entry. This will even work for the autodiscover case where a input instance is created for each file.

@belimawr Please correct me if I'm wrong but the above seems to be the key migration work that needs to be implemented in Beats to resolve this issue? If so, we can transfer this issue to the Beats repository.

ycombinator avatar Jul 10 '24 22:07 ycombinator

I transferred it, and also renamed the issue since I think the value of migrating everything to fingerprint by default is much higher than the original title specific to the filestream-monitoring instance.

cmacknz avatar Jul 11 '24 14:07 cmacknz

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

elasticmachine avatar Jul 11 '24 14:07 elasticmachine

I transferred it, and also renamed the issue since I think the value of migrating everything to fingerprint by default is much higher than the original title specific to the filestream-monitoring instance.

What would be "all filestream inputs" here? All started by the Elastic-Agent? By integrations? Or it becomes more focused on implementing the option to switch file identity without losing the state?

belimawr avatar Jul 11 '24 16:07 belimawr

It means all of them regardless of whether they run in Beats or Elastic Agent, unless there is a reason we think non-fingerprint file identities are superior in some cases.

This would perhaps be better titled, "Use fingerprint file identity by default and migrate all existing filestream inputs to it".

cmacknz avatar Jul 11 '24 17:07 cmacknz

@pierrehilbert I think you can plan this one as soon as possible in order to make it land in 9.0 and give it enough time to soak in test environment.

jlind23 avatar Sep 16 '24 14:09 jlind23

@belimawr looks like this is already under implementation, is there a PR we can link to this issue?

jlind23 avatar Nov 25 '24 16:11 jlind23

@belimawr looks like this is already under implementation, is there a PR we can link to this issue?

Yes, it's in draft because I just pushed code that works, but still needs work and tests. Honestly, I only pushed it to run the whole test suit and see where it breaks.

I'll update the title and add this issue in the related issue section today.

belimawr avatar Nov 25 '24 16:11 belimawr