pyinfra
host.data is shared across all hosts
Describe the bug
host.data seems to be shared across all hosts, even though it lives on the host object.
I ran this simple test because I was seeing strange behavior when running operations across multiple hosts: operations were randomly marked as skipped on some hosts, even though every host had exactly the same number of operations and commands to run.
To Reproduce
group_data/all.py
test = {}
inventories/mine.py
# your inventory, containing several hosts
tasks/share.py
import os
from pyinfra import host
host.data.test[f"{host}"] = os.getpid()
print(f"{host} {host.data.test}")
Then run
pyinfra inventories/mine.py tasks/share.py
Result:
--> Loading config...
--> Loading inventory...
--> Connecting to hosts...
No host key for host5 found in known_hosts
No host key for host36 found in known_hosts
No host key for host35 found in known_hosts
No host key for host3 found in known_hosts
No host key for host20 found in known_hosts
No host key for host18 found in known_hosts
No host key for host6 found in known_hosts
No host key for host27 found in known_hosts
No host key for host26 found in known_hosts
No host key for host25 found in known_hosts
No host key for host24 found in known_hosts
No host key for host23 found in known_hosts
No host key for host22 found in known_hosts
No host key for host21 found in known_hosts
No host key for host19 found in known_hosts
No host key for host17 found in known_hosts
No host key for host394 found in known_hosts
[host3] Connected
[host20] Connected
[host35] Connected
[host36] Connected
[host5] Connected
[host27] Connected
[host26] Connected
[host18] Connected
[host6] Connected
[host25] Connected
[host24] Connected
[host21] Connected
[host23] Connected
[host22] Connected
[host19] Connected
[host394] Connected
[host17] Connected
--> Preparing operations...
Loading: tasks/share.py
host22 {'host22': 10413}
[host22] Ready: tasks/share.py
host36 {'host22': 10413, 'host36': 10413}
[host36] Ready: tasks/share.py
host24 {'host22': 10413, 'host36': 10413, 'host24': 10413}
[host24] Ready: tasks/share.py
host394 {'host22': 10413, 'host36': 10413, 'host24': 10413, 'host394': 10413}
[host394] Ready: tasks/share.py
host5 {'host22': 10413, 'host36': 10413, 'host24': 10413, 'host394': 10413, 'host5': 10413}
[host5] Ready: tasks/share.py
host26 {'host22': 10413, 'host36': 10413, 'host24': 10413, 'host394': 10413, 'host5': 10413, 'host26': 10413}
[host26] Ready: tasks/share.py
host6 {'host22': 10413, 'host36': 10413, 'host24': 10413, 'host394': 10413, 'host5': 10413, 'host26': 10413, 'host6': 10413}
[host6] Ready: tasks/share.py
host3 {'host22': 10413, 'host36': 10413, 'host24': 10413, 'host394': 10413, 'host5': 10413, 'host26': 10413, 'host6': 10413, 'host3': 10413}
[host3] Ready: tasks/share.py
host19 {'host22': 10413, 'host36': 10413, 'host24': 10413, 'host394': 10413, 'host5': 10413, 'host26': 10413, 'host6': 10413, 'host3': 10413, 'host19': 10413}
[host19] Ready: tasks/share.py
host21 {'host22': 10413, 'host36': 10413, 'host24': 10413, 'host394': 10413, 'host5': 10413, 'host26': 10413, 'host6': 10413, 'host3': 10413, 'host19': 10413, 'host21': 10413}
[host21] Ready: tasks/share.py
host35 {'host22': 10413, 'host36': 10413, 'host24': 10413, 'host394': 10413, 'host5': 10413, 'host26': 10413, 'host6': 10413, 'host3': 10413, 'host19': 10413, 'host21': 10413, 'host35': 10413}
[host35] Ready: tasks/share.py
host23 {'host22': 10413, 'host36': 10413, 'host24': 10413, 'host394': 10413, 'host5': 10413, 'host26': 10413, 'host6': 10413, 'host3': 10413, 'host19': 10413, 'host21': 10413, 'host35': 10413, 'host23': 10413}
[host23] Ready: tasks/share.py
host17 {'host22': 10413, 'host36': 10413, 'host24': 10413, 'host394': 10413, 'host5': 10413, 'host26': 10413, 'host6': 10413, 'host3': 10413, 'host19': 10413, 'host21': 10413, 'host35': 10413, 'host23': 10413, 'host17': 10413}
[host17] Ready: tasks/share.py
host25 {'host22': 10413, 'host36': 10413, 'host24': 10413, 'host394': 10413, 'host5': 10413, 'host26': 10413, 'host6': 10413, 'host3': 10413, 'host19': 10413, 'host21': 10413, 'host35': 10413, 'host23': 10413, 'host17': 10413, 'host25': 10413}
[host25] Ready: tasks/share.py
host18 {'host22': 10413, 'host36': 10413, 'host24': 10413, 'host394': 10413, 'host5': 10413, 'host26': 10413, 'host6': 10413, 'host3': 10413, 'host19': 10413, 'host21': 10413, 'host35': 10413, 'host23': 10413, 'host17': 10413, 'host25': 10413, 'host18': 10413}
[host18] Ready: tasks/share.py
host27 {'host22': 10413, 'host36': 10413, 'host24': 10413, 'host394': 10413, 'host5': 10413, 'host26': 10413, 'host6': 10413, 'host3': 10413, 'host19': 10413, 'host21': 10413, 'host35': 10413, 'host23': 10413, 'host17': 10413, 'host25': 10413, 'host18': 10413, 'host27': 10413}
[host27] Ready: tasks/share.py
host20 {'host22': 10413, 'host36': 10413, 'host24': 10413, 'host394': 10413, 'host5': 10413, 'host26': 10413, 'host6': 10413, 'host3': 10413, 'host19': 10413, 'host21': 10413, 'host35': 10413, 'host23': 10413, 'host17': 10413, 'host25': 10413, 'host18': 10413, 'host27': 10413, 'host20': 10413}
[host20] Ready: tasks/share.py
--> Proposed changes:
Groups: ** / **
[host22] Operations: 0 Commands: 0
[host36] Operations: 0 Commands: 0
[host24] Operations: 0 Commands: 0
[host26] Operations: 0 Commands: 0
[host19] Operations: 0 Commands: 0
[host21] Operations: 0 Commands: 0
[host35] Operations: 0 Commands: 0
[host23] Operations: 0 Commands: 0
[host25] Operations: 0 Commands: 0
[host18] Operations: 0 Commands: 0
[host27] Operations: 0 Commands: 0
[host20] Operations: 0 Commands: 0
Groups: ** / **
[host394] Operations: 0 Commands: 0
[host17] Operations: 0 Commands: 0
Groups: ** / **
[host5] Operations: 0 Commands: 0
[host6] Operations: 0 Commands: 0
[host3] Operations: 0 Commands: 0
--> Beginning operation run...
--> Results:
Groups: ** / **
[host22] Successful: 0 Errors: 0 Commands: 0/0
[host36] Successful: 0 Errors: 0 Commands: 0/0
[host24] Successful: 0 Errors: 0 Commands: 0/0
[host26] Successful: 0 Errors: 0 Commands: 0/0
[host19] Successful: 0 Errors: 0 Commands: 0/0
[host21] Successful: 0 Errors: 0 Commands: 0/0
[host35] Successful: 0 Errors: 0 Commands: 0/0
[host23] Successful: 0 Errors: 0 Commands: 0/0
[host25] Successful: 0 Errors: 0 Commands: 0/0
[host18] Successful: 0 Errors: 0 Commands: 0/0
[host27] Successful: 0 Errors: 0 Commands: 0/0
[host20] Successful: 0 Errors: 0 Commands: 0/0
Groups: ** / **
[host394] Successful: 0 Errors: 0 Commands: 0/0
[host17] Successful: 0 Errors: 0 Commands: 0/0
Groups: ** / **
[host5] Successful: 0 Errors: 0 Commands: 0/0
[host6] Successful: 0 Errors: 0 Commands: 0/0
[host3] Successful: 0 Errors: 0 Commands: 0/0
Expected behavior
I would expect each host to have its own instance of the data, with nothing shared between hosts:
--> Preparing operations...
Loading: tasks/share.py
host22 {'host22': 10413}
[host22] Ready: tasks/share.py
host36 {'host36': 10413}
[host36] Ready: tasks/share.py
host24 {'host24': 10413}
[host24] Ready: tasks/share.py
host394 {'host394': 10413}
[host394] Ready: tasks/share.py
host5 {'host5': 10413}
[host5] Ready: tasks/share.py
host26 {'host26': 10413}
[host26] Ready: tasks/share.py
host6 {'host6': 10413}
[host6] Ready: tasks/share.py
host3 {'host3': 10413}
[host3] Ready: tasks/share.py
host19 {'host19': 10413}
[host19] Ready: tasks/share.py
host21 {'host21': 10413}
[host21] Ready: tasks/share.py
host35 {'host35': 10413}
[host35] Ready: tasks/share.py
host23 {'host23': 10413}
[host23] Ready: tasks/share.py
host17 {'host17': 10413}
[host17] Ready: tasks/share.py
host25 {'host25': 10413}
[host25] Ready: tasks/share.py
host18 {'host18': 10413}
[host18] Ready: tasks/share.py
host27 {'host27': 10413}
[host27] Ready: tasks/share.py
host20 {'host20': 10413}
[host20] Ready: tasks/share.py
Meta
- Include output of pyinfra --support.
--> Support information:
If you are having issues with pyinfra or wish to make feature requests, please
check out the GitHub issues at https://github.com/Fizzadar/pyinfra/issues .
When adding an issue, be sure to include the following:
System: Linux
Platform: Linux-3.10.0-1160.59.1.el7.x86_64-x86_64-with-glibc2.17
Release: 3.10.0-1160.59.1.el7.x86_64
Machine: x86_64
pyinfra: v2.0.2
Executable: .local/bin/pyinfra
Python: 3.9.4 (CPython, GCC 4.8.5 20150623 (Red Hat 4.8.5-44))
- How was pyinfra installed (source/pip)? pip
- Include pyinfra-debug.log (if one was created)
- Consider including output with -vv and --debug.
This is currently the expected behavior: data defined in the group files is shared (as in, the variable objects themselves) between all the hosts. Defining it as host data instead would keep it scoped to a single host.
Is this the right behavior? I think so, maybe? It definitely needs documenting either way!
There is no advantage in sharing it. If operations execute in parallel on all hosts and those variables can be written to, the result will be unpredictable. Python already provides native ways to share variables across all operations, so there is no need for pyinfra to add a mechanism for it.
What is missing, however, is a way to have variables scoped at the host level. It will be confusing to have to account for other hosts' executions when writing operations: every variable would have to be a dictionary indexed on the host, or it risks being shared.
Also, declaring a variable on a host to override a value from a group currently changes the variable's behavior, not only its value. It would be more natural if where a variable is declared did not affect how it behaves.
Quoting the above: "This is currently the expected behavior - data defined in the group files is shared (as in the variables themselves)"
Note that the result differs depending on whether the variable is immutable, like an integer or a string, or mutable, like a list or a dict. In particular, assigning a new value to an immutable variable on one host will not show the new value on other hosts, even if that variable is declared in the group file.
So it behaves as if the host data were passed as an argument to a function, not as if every host shared the same variable.
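To make the mutable/immutable distinction concrete, here is a minimal plain-Python sketch. It assumes, hypothetically, that each host's data object checks its own overrides first and otherwise falls back to one group-data dict that is loaded once and shared by all hosts. HostData and GROUP_DATA are illustrative names, not pyinfra's internals.

```python
# Hypothetical model: one group-data dict, loaded once and shared by every host.
GROUP_DATA = {"test": {}, "counter": 0}


class HostData:
    """Illustrative stand-in for host.data (not pyinfra's implementation)."""

    def __init__(self, name):
        # Bypass our own __setattr__ while setting up internal fields.
        object.__setattr__(self, "_name", name)
        object.__setattr__(self, "_overrides", {})

    def __getattr__(self, key):
        # Called only when normal lookup fails: per-host overrides first,
        # then the shared group data (the very same objects for all hosts).
        overrides = object.__getattribute__(self, "_overrides")
        if key in overrides:
            return overrides[key]
        try:
            return GROUP_DATA[key]
        except KeyError:
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        # Assignment creates a per-host override; GROUP_DATA is untouched.
        self._overrides[key] = value


hosts = [HostData(name) for name in ("host1", "host2")]

for h in hosts:
    # In-place mutation of the shared dict: every host sees every write.
    h.test[h._name] = True
    # Rebinding an int stores a per-host override: other hosts never see it.
    h.counter = h.counter + 1

print(hosts[0].test)          # {'host1': True, 'host2': True}
print(hosts[1].counter)       # 1
print(GROUP_DATA["counter"])  # 0
```

The dict accumulates entries from both hosts, exactly like the output in the bug report, while the integer "counter" stays independent per host.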
I agree that there is no sensible use case for this; I actually think group data should be immutable once loaded. If needed, it's possible to set override data, like host.data.X = "y", without touching the original values.
So I'd suggest just making all initially loaded inventory data (both host & group) immutable.
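One way the "immutable once loaded" idea could look, as a sketch only: the loaded group data is recursively frozen using the standard library's types.MappingProxyType (a read-only view of a dict) plus tuples and frozensets, and each host gets a collections.ChainMap whose first, writable map holds its overrides. deep_freeze and make_host_data are hypothetical helpers, and dict-style keys stand in for host.data attributes.

```python
from collections import ChainMap
from types import MappingProxyType


def deep_freeze(value):
    """Recursively make dicts read-only and turn lists/sets into tuples/frozensets."""
    if isinstance(value, dict):
        return MappingProxyType({k: deep_freeze(v) for k, v in value.items()})
    if isinstance(value, list):
        return tuple(deep_freeze(v) for v in value)
    if isinstance(value, set):
        return frozenset(deep_freeze(v) for v in value)
    return value


# Hypothetical contents of group_data/all.py, frozen right after loading.
GROUP_DATA = deep_freeze({"test": {}, "packages": ["nginx"]})


def make_host_data():
    """Writes go to the first (per-host) map; reads fall through to frozen group data."""
    return ChainMap({}, GROUP_DATA)


host1, host2 = make_host_data(), make_host_data()
host1["x"] = "y"  # override stored only on host1
try:
    host1["test"]["host1"] = 1  # in-place write into group data
except TypeError as exc:
    print("rejected:", exc)
print(host2.get("x"))  # None
```

With this layering an override such as x = "y" only ever lands in the host's own map, while an accidental in-place write into group data fails with a TypeError instead of silently leaking to other hosts.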
In my opinion, mutable per-host storage would cover all cases:
- If you do not need to change this data, it does not matter whether it is mutable or not; host.data is fine for that.
- If you do need mutable data common to all hosts, you can simply import a Python module that contains this data. No pyinfra support is needed here.
- That leaves the case where someone needs mutable data per host. Today's solution is like case 2, but with a dictionary indexed on host name. That is an OK workaround; it just cannot live in host.data, because it is totally unexpected that host.data is shared.
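A sketch of cases 2 and 3 using nothing but Python's own module caching. It assumes the state lives in a separate module (the name shared_state.py is hypothetical) that every task file imports; since Python imports a module only once per process, all hosts see the same objects. PER_HOST is the "dictionary indexed on host name" workaround.

```python
import os
from collections import defaultdict

# Hypothetical contents of a shared_state.py module imported by every task file.
SHARED = {"runs": 0}          # case 2: mutable data common to all hosts
PER_HOST = defaultdict(dict)  # case 3: one private, mutable dict per host name


def host_state(host_name):
    """Return the mutable dict for host_name, creating it on first access."""
    return PER_HOST[host_name]


# Simulated task run for two hosts; in a task file the name would be f"{host}".
for name in ("host22", "host36"):
    SHARED["runs"] += 1
    host_state(name)["test"] = os.getpid()
```

Unlike host.data in the report, each host's entry here holds only its own key, while SHARED is deliberately common to all hosts.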
In my case, some changes are not observable on the host (or are very difficult to observe), so I cannot call get_fact and rely on the host's current state to decide whether an operation needs to run.
I have two tasks, A and B, where B depends on A. I cannot easily observe whether a change made during task A affects task B, so I assume that any operation that changes something may have an effect, and that all of task B's operations must then run again. For that reason, I need to record somewhere that task A changed something, and take that into account when running task B's operations. This record is kept per host.
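A minimal sketch of that per-host bookkeeping, kept as plain module-level Python state. The function names are hypothetical, and where the changed flag comes from (for example, the result of task A's operations) is left as an assumption.

```python
from collections import defaultdict

# Hypothetical per-host record of which tasks changed something on that host.
CHANGED_TASKS = defaultdict(set)


def record_change(host_name, task, changed):
    """Remember that `task` changed something on `host_name`."""
    if changed:
        CHANGED_TASKS[host_name].add(task)


def must_rerun(host_name, depends_on):
    """True if any task in `depends_on` changed something on this host."""
    return bool(CHANGED_TASKS[host_name] & set(depends_on))


record_change("host1", "A", changed=True)
record_change("host2", "A", changed=False)
print(must_rerun("host1", ["A"]))  # True
print(must_rerun("host2", ["A"]))  # False
```

Because the record is keyed on host name, task B reruns only on hosts where task A actually changed something.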
Seems reasonable to me; host.data should definitely always be host-scoped, so let's start with that. I agree this is a bug, because I think this behaviour is unexpected and should not be relied upon.
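For illustration only, one way host-scoped data could be produced: give every host a private deep copy of the group data at load time, then layer its host-level values on top. load_host_data is a hypothetical loader, not pyinfra's actual fix; replaying the tasks/share.py write against it gives the "Expected behavior" output rather than the accumulating dict.

```python
import copy

# Stand-in for data loaded from group_data/all.py.
GROUP_DATA = {"test": {}}


def load_host_data(group_data, host_overrides):
    """Hypothetical loader: a private deep copy of group data per host,
    with that host's own values applied on top."""
    data = copy.deepcopy(group_data)
    data.update(host_overrides)
    return data


hosts = {name: load_host_data(GROUP_DATA, {}) for name in ("host22", "host36")}
for name, data in hosts.items():
    data["test"][name] = 10413  # same write as tasks/share.py

print(hosts["host36"]["test"])  # {'host36': 10413}
```

Compared with freezing, deep-copying keeps the data fully mutable per host, at the cost of one copy of the group data for every host.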