cylc-flow
Host identification
Description below pasted verbatim from Element chat room (@oliver-sanders please edit if desired).
See also comments on #3766 and #3595 which this issue supersedes.
See also:
- #4981
- #3766
- #3595
- #5411
- #4296
TLDR;
Hold fire on FQDN/localhost changes for the moment and wait until the dust has settled after the platforms work. Have a tidy up of the hostuserutil and other remote logic to see what's left and find out what niche requirements we might still have (hopefully not many), then work out how best to implement whatever checks we still require.
Full post:
Just had a chat with Dave about FQDN, localhost self-identification, etc. relating to #3766 and #3595:
At present we rely on the premise that every host has a unique, global identifier, its FQDN, and that this identifier can be obtained from anywhere on the network. This system is nice and universal, so we can use it for all purposes, e.g. comparing localhost to remote hosts, which saves us from maintaining separate logic for different purposes.
Unfortunately, the assumption that an FQDN is a unique global identifier for every host is flawed, no matter what method we use to retrieve the FQDN. FQDN and DNS issues have long been a source of pain (I have to apply a patch just to get Cylc to run on my box) and need a solid solution.
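A minimal sketch of the premise described above, in Python: two host identifiers are treated as the same machine if their FQDNs match, and "localhost" is whatever the local machine reports as its own FQDN. The names `is_same_host` and `is_localhost` are hypothetical and are not Cylc's `hostuserutil` API; the resolver is passed in (defaulting to `socket.getfqdn`) so the logic can be exercised without real DNS.

```python
import socket
from typing import Callable

# Hypothetical helpers illustrating FQDN-based host identification;
# not taken from Cylc's hostuserutil module.
Resolver = Callable[[str], str]


def is_same_host(host_a: str, host_b: str,
                 get_fqdn: Resolver = socket.getfqdn) -> bool:
    """Treat two identifiers as one host if their FQDNs are equal."""
    return get_fqdn(host_a) == get_fqdn(host_b)


def is_localhost(host: str, get_fqdn: Resolver = socket.getfqdn) -> bool:
    """Check whether `host` names the machine we are running on.

    socket.getfqdn("") returns the FQDN of the local machine.
    """
    return get_fqdn(host) == get_fqdn("")
```

The whole approach stands or falls on `get_fqdn` returning the same answer for the same machine everywhere on the network, which is exactly the assumption called into question below.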
Off the top of my head, we use this FQDN logic for things like:
- Filtering out duplicate hosts.
- Reducing the number of SSH connections by batching them by hostname.
- Determining whether X is an identifier for localhost (e.g. whether this host is in the list of condemned hosts, or whether we need to SSH at all).
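The first two uses in the list above (de-duplicating hosts and batching SSH calls) can be sketched as grouping jobs by the FQDN of their target host. This is an illustrative sketch only: `batch_by_host` and the `(job_id, hostname)` input shape are hypothetical, not Cylc's actual data structures.

```python
import socket
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def batch_by_host(
    jobs: Iterable[Tuple[str, str]],
    get_fqdn: Callable[[str], str] = socket.getfqdn,
) -> Dict[str, List[str]]:
    """Group (job_id, hostname) pairs by FQDN so each host needs one SSH.

    Aliases that resolve to the same FQDN collapse into a single batch,
    which also de-duplicates the list of hosts (the dict keys).
    """
    batches: Dict[str, List[str]] = defaultdict(list)
    for job_id, host in jobs:
        batches[get_fqdn(host)].append(job_id)
    return dict(batches)
```

If the resolver returns the short name for one alias and the full name for another, the same machine ends up in two batches (two SSH connections, and a "duplicate" host that is not filtered out), which is the kind of failure that unreliable FQDNs cause.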
Once the platforms work is merged we will have "configured away" the need to compare remote host FQDNs, hopefully completely. Filtering out duplicate hosts from a list is something we could drop entirely, since duplicates are a configuration error, not a Cylc problem.
I think (perhaps with a bit of fiddling) we might be left with only the problem of localhost self-identification (the third bullet point above), which may let us ditch FQDN logic completely in favour of a more reliable method.
TLDR; So, my suggestion would be to hold fire on FQDN/localhost changes for the moment and wait until the dust has settled after the platforms work. Have a tidy up of the hostuserutil and other remote logic to see what's left and find out what niche requirements we might still have (hopefully not many), then work out how best to implement whatever checks we still require.