beats
[Elastic-Agent] Investigate: Improve host.id fingerprinting for VDI
As a user, I would like to have a uniquely created host.id even when using VMs (VDI, virtual desktop infrastructure).
The problem:
The host.id isn't unique enough. We use elastic/go-sysinfo to generate the host.id from hardware information, but two VMs created from the same image can end up with the same host.id.

Pinging @elastic/ingest-management (Team:Ingest Management)
@ferullo any strategy you were discussing to solve that problem in previous version of endpoint?
The problem with generating a host id from a system property is that in some copied/cloned VM scenarios the VM has no idea it has been cloned. When that happens there isn't really a way from inside the VM to know the host id needs to be regenerated, and even if it is regenerated, the same id will be generated.
Ultimately, the implementation I recommend is to randomly generate a host id and save it on the machine. If the id should be the same across uninstall/reinstall, then the id should be left behind after uninstall.
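To make the "randomly generate and save" recommendation concrete, here is a minimal Go sketch. It is not the Agent's actual code: the function names `newRandomID` and `loadOrCreateHostID` and the file location are assumptions made up for illustration. It returns the saved id if one exists and otherwise generates a random version 4 UUID and writes it to disk.

```go
package main

import (
	"crypto/rand"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"strings"
)

// newRandomID returns a random RFC 4122 version 4 UUID string.
// (Hypothetical helper, not part of any Elastic library.)
func newRandomID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4
	b[8] = (b[8] & 0x3f) | 0x80 // set RFC 4122 variant
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

// loadOrCreateHostID returns the id stored at path. If the file is missing
// or empty, it generates a new random id, saves it, and returns it.
func loadOrCreateHostID(path string) (string, error) {
	data, err := os.ReadFile(path)
	if err == nil {
		if id := strings.TrimSpace(string(data)); id != "" {
			return id, nil
		}
	} else if !errors.Is(err, fs.ErrNotExist) {
		return "", err
	}

	id, err := newRandomID()
	if err != nil {
		return "", err
	}
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return "", err
	}
	if err := os.WriteFile(path, []byte(id+"\n"), 0o600); err != nil {
		return "", err
	}
	return id, nil
}

func main() {
	// Assumed demo location; a real agent would pick a path that
	// survives uninstall if the id should be stable across reinstalls.
	path := filepath.Join(os.TempDir(), "agent-host-id-demo", "host.id")
	id, err := loadOrCreateHostID(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, "host id:", err)
		os.Exit(1)
	}
	fmt.Println(id)
}
```

With this shape, deleting the file before taking a VDI snapshot is enough to force a fresh id on the next start.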
There are different ways to implement Agent knowing when it needs to regenerate the host id. One would be to just require the user to clear the saved host id before creating a VDI snapshot (triggering a regeneration when Agent starts on the next boot). Another is to let a user mark a host id in Kibana as a VDI id and have Agents automatically regenerate their id when they next connect. A third is to automate that, so if two Agents connect to Kibana at the same time with the same host id, Kibana automatically marks that id as a "vdi host id". I'm sure you can think of other ways too.
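The third option (automatic detection) could look roughly like the sketch below. Everything here is hypothetical: `collisionTracker`, `checkIn`, and the idea of a per-process "session" token that each Agent generates at startup are assumptions for illustration, not existing Kibana or Fleet APIs. The tracker marks a host id as shared once two different sessions check in with it inside the same time window.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// collisionTracker remembers which sessions recently used each host id.
// (Hypothetical server-side helper, not an existing Kibana/Fleet type.)
type collisionTracker struct {
	mu       sync.Mutex
	window   time.Duration
	lastSeen map[string]map[string]time.Time // host id -> session -> last check-in
	shared   map[string]bool                 // host ids marked as "vdi host id"
}

func newCollisionTracker(window time.Duration) *collisionTracker {
	return &collisionTracker{
		window:   window,
		lastSeen: make(map[string]map[string]time.Time),
		shared:   make(map[string]bool),
	}
}

// checkIn records a check-in and reports whether hostID is marked as shared,
// in which case the calling agent should regenerate its id.
func (t *collisionTracker) checkIn(hostID, session string, now time.Time) bool {
	t.mu.Lock()
	defer t.mu.Unlock()

	sessions := t.lastSeen[hostID]
	if sessions == nil {
		sessions = make(map[string]time.Time)
		t.lastSeen[hostID] = sessions
	}
	// Forget sessions that have not checked in within the window.
	for s, seen := range sessions {
		if now.Sub(seen) > t.window {
			delete(sessions, s)
		}
	}
	sessions[session] = now
	if len(sessions) > 1 {
		t.shared[hostID] = true
	}
	return t.shared[hostID]
}

// demo replays three check-ins: two clones sharing an id, then an unrelated host.
func demo() []bool {
	t := newCollisionTracker(time.Minute)
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	return []bool{
		t.checkIn("id-from-image", "session-a", start),
		t.checkIn("id-from-image", "session-b", start.Add(10*time.Second)),
		t.checkIn("unrelated-id", "session-c", start.Add(20*time.Second)),
	}
}

func main() {
	fmt.Println(demo()) // prints [false true false]
}
```

One caveat with this design: an Agent that restarts within the window shows up as a second session and would be flagged too, so a real implementation would need some way to tell a restart from a clone.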
TL;DR I don't think you can base the host id on system properties for VDI and I don't think an Agent can know it needs to generate a new ID without being "told" to do so somehow.
This is a bummer; it's sad that we cannot improve fingerprinting. But I guess you are correct: generating a new ID, saving it somewhere, and reusing it seems like the way to go.
++ on VDI-safe agents. I'm hearing this ask from a couple of groups.
I'm also seeing host.id collisions in a Linux container fleet running beats. I see the ratio of host.name to host.id ranging between 1042 and 2173 to one. Similar things probably can happen when virtual machines are cloned or started from a base image.
Sometimes this is sorted out with an agent "registration" process where a new agent asks, after authenticating, if its GUID, like host.id, is unique. If the answer is no, it generates a new GUID and asks again. The odds of collisions between randomly generated GUIDs are low, but in really large, complex deployments with distributed clusters using cross-cluster search there might need to be a central registration service, or a callback from an ingest node to some central index of host.id values, so we can show that the ability to detect collisions exists if the feature were audited someday. We probably should detect host.id collisions in any event once we have more uniqueness, as we increase capabilities to spot spoofed documents and/or data.
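The registration loop described above can be sketched in Go as follows. This is only an illustration under stated assumptions: the `registry` interface, `memoryRegistry`, `randomID`, and `register` are hypothetical names, and the in-memory map stands in for whatever central service or index would hold the known host.id values.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
)

// registry answers whether an id is still free and, if so, reserves it.
// (Hypothetical interface standing in for a central registration service.)
type registry interface {
	claim(id string) bool
}

// memoryRegistry is an in-memory stand-in for that service.
type memoryRegistry struct {
	mu    sync.Mutex
	taken map[string]struct{}
}

func newMemoryRegistry() *memoryRegistry {
	return &memoryRegistry{taken: make(map[string]struct{})}
}

func (r *memoryRegistry) claim(id string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, exists := r.taken[id]; exists {
		return false
	}
	r.taken[id] = struct{}{}
	return true
}

// randomID returns 16 random bytes as a 32-character hex string.
func randomID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// register tries the candidate id first and, while the registry reports a
// collision, generates a new id and asks again, up to maxAttempts times.
func register(r registry, candidate string, newID func() (string, error), maxAttempts int) (string, error) {
	id := candidate
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if r.claim(id) {
			return id, nil
		}
		var err error
		if id, err = newID(); err != nil {
			return "", err
		}
	}
	return "", errors.New("could not register a unique host id")
}

func main() {
	reg := newMemoryRegistry()
	// Two clones of the same image both start with the same id.
	for i := 0; i < 2; i++ {
		id, err := register(reg, "id-baked-into-image", randomID, 3)
		if err != nil {
			fmt.Println("registration failed:", err)
			return
		}
		fmt.Println("registered as", id)
	}
}
```

The first clone keeps the id from the image; the second is told it collides and ends up with a fresh random id.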
We'd like to use host.id as a primary key for risk score (a security feature) if we can make it more unique.
Pinging @elastic/agent (Team:Agent)
@MikePaquette - host.id uniqueness issues
Hi! We just realized that we haven't looked into this issue in a while. We're sorry!
We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!
Was this closed because it went stale, or was it actually done?
This is an issue that continues to be a concern when using Virtual Desktop Infrastructure (VDI). Can we get an updated review of this?
Any fixes? Still seeing it on version 8.12. Come on, Elastic...
👍
The host.id can also easily be spoofed, e.g. by changing the Windows registry value MachineGuid in HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Cryptography.
We are also impacted by the host.id not being unique. One approach could be that it should not be read from the client machine, but rather created by the managing server (in fleet environments).
It could be generated (pseudo-randomly) by taking into account the host's hostname and the MAC address of the machine. This would solve the case where VDIs have at least one of those two values changed for each instance generated.
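A rough Go sketch of that idea, assuming a SHA-256 hash over the lowercased hostname and the sorted MAC addresses (the hash choice and the names `deriveHostID` and `localMACs` are my assumptions, not anything Beats or Fleet does today):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net"
	"os"
	"sort"
	"strings"
)

// deriveHostID hashes the hostname and MAC addresses into a stable id.
// Inputs are lowercased and the MACs sorted so that casing and interface
// order do not change the result. (Hypothetical helper.)
func deriveHostID(hostname string, macs []string) string {
	sorted := make([]string, 0, len(macs))
	for _, m := range macs {
		sorted = append(sorted, strings.ToLower(m))
	}
	sort.Strings(sorted)

	h := sha256.New()
	h.Write([]byte(strings.ToLower(hostname)))
	for _, m := range sorted {
		h.Write([]byte{0}) // separator between fields
		h.Write([]byte(m))
	}
	return hex.EncodeToString(h.Sum(nil))
}

// localMACs lists the hardware addresses of non-loopback interfaces.
func localMACs() ([]string, error) {
	ifaces, err := net.Interfaces()
	if err != nil {
		return nil, err
	}
	var macs []string
	for _, ifc := range ifaces {
		if ifc.Flags&net.FlagLoopback != 0 || len(ifc.HardwareAddr) == 0 {
			continue
		}
		macs = append(macs, ifc.HardwareAddr.String())
	}
	return macs, nil
}

func main() {
	hostname, err := os.Hostname()
	if err != nil {
		fmt.Fprintln(os.Stderr, "hostname:", err)
		os.Exit(1)
	}
	macs, err := localMACs()
	if err != nil {
		fmt.Fprintln(os.Stderr, "interfaces:", err)
		os.Exit(1)
	}
	fmt.Println(deriveHostID(hostname, macs))
}
```

As noted earlier in the thread, this only helps when the hostname or a MAC address actually differs between instances; two clones that keep both values would still derive the same id.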