pyinfra performance repo questions
I recently wrote some benchmarks to compare the performance of sake and Ansible, came across the pyinfra-performance repo, and decided to include pyinfra as well.
The results I got differ somewhat from those in the pyinfra-performance repo:
- Ansible ran faster than pyinfra for fewer than 50 hosts when running a single shell command, and for fewer than 10 hosts when running multiple commands
Note that `forks` wasn't set for Ansible in the pyinfra-performance repo, so Ansible was limited to its default of 5 parallel processes, whereas pyinfra ran with many more. Perhaps I misconfigured pyinfra or Ansible somehow; my pyinfra implementation is here: https://github.com/alajmo/sake-performance/tree/main/pyinfra.
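To keep such a comparison fair, both tools should run with the same concurrency. Ansible's default of 5 forks can be raised with `ansible-playbook --forks N`. The sketch below builds matching command lines for a given parallelism level; the file names are taken from the commands used in this post, the helper name `build_commands` is hypothetical, and pyinfra's `--parallel` option is assumed to be the right knob for capping its concurrency (check `pyinfra --help` for your version).

```python
from typing import Dict, List


def build_commands(parallelism: int) -> Dict[str, List[str]]:
    """Return argv lists for Ansible and pyinfra using the same parallelism.

    Hypothetical helper: file names mirror the commands in this post, and
    pyinfra's --parallel flag is an assumption to verify for your version.
    """
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    n = str(parallelism)
    return {
        # Ansible defaults to 5 forks; --forks raises how many hosts run at once.
        "ansible": ["ansible-playbook", "ping-playbook.yaml", "-i", "inventory.py", "--forks", n],
        # Assumption: --parallel caps how many hosts pyinfra works on at once.
        "pyinfra": ["pyinfra", "--parallel", n, "inventory.py", "pyinfra-ping.py"],
    }


if __name__ == "__main__":
    for tool, argv in build_commands(50).items():
        print(tool, " ".join(argv))
```

Sweeping `parallelism` alongside the host count would show whether the crossover points above move once both tools are allowed the same number of concurrent connections.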
Anyway, I also spotted some differences between how sake/Ansible and pyinfra handle CLI output:
- It seems pyinfra writes its output to stderr, which made it a bit difficult to use GNU time for benchmarking. For instance:
  - sake: `sake run ping -s servers --limit=8 > /dev/null` prints nothing
  - ansible: `ansible-playbook ping-playbook.yaml -i inventory.py > /dev/null` prints nothing
  - pyinfra: `pyinfra inventory.py pyinfra-ping.py > /dev/null` prints:
    --> Loading config...
    --> Loading inventory...
    --> Connecting to hosts...
    /bin/zsh: can't open input file: echo 0.0.0.0 | grep -q '^dedi[0-9]\+$'
    [host_0] Could not connect ([Errno None] Unable to connect to port 10000 on 0.0.0.0)
    --> pyinfra error: No hosts remaining!
- I kept getting the error `/bin/zsh: can't open input file: echo 0.0.0.0 | grep -q '^dedi[0-9]\+$'`, even though the tasks were running just fine.
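Since pyinfra reports its progress on stderr, redirecting only stdout (`> /dev/null`) still leaves output on the terminal and mixes it into the timing report. A minimal Python sketch of one workaround: discard both streams while timing a command, and separately check which stream a command actually writes to. The function names `time_quietly` and `output_streams` are hypothetical, and the stand-in command only imitates a tool that logs to stderr.

```python
import subprocess
import sys
import time
from typing import List, Tuple


def time_quietly(argv: List[str]) -> Tuple[float, int]:
    """Run argv with stdout AND stderr discarded; return (seconds, exit code)."""
    start = time.perf_counter()
    proc = subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start, proc.returncode


def output_streams(argv: List[str]) -> Tuple[bool, bool]:
    """Report whether argv wrote anything to (stdout, stderr)."""
    proc = subprocess.run(argv, capture_output=True)
    return bool(proc.stdout), bool(proc.stderr)


if __name__ == "__main__":
    # Stand-in for a tool that logs progress to stderr, as pyinfra appears to.
    noisy = [sys.executable, "-c", "import sys; sys.stderr.write('--> Loading config...\\n')"]
    print(output_streams(noisy))  # (False, True)
    seconds, code = time_quietly(noisy)
    print(f"{seconds:.3f}s exit={code}")
```

From the shell, the equivalent is `pyinfra inventory.py pyinfra-ping.py > /dev/null 2>&1`. GNU time also writes its own report to stderr, so its `-o FILE` option keeps the report separate from whatever the benchmarked command prints.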
Also, I couldn't find any documentation on this, so is the following description of how pyinfra works correct?
- Parse config files
- Connect to servers serially and gather facts
- Work locally, using the facts to figure out which operations to perform on each host
- Perform operations on hosts in parallel by sending out shell commands
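The four steps above can be illustrated with a toy model. This is not pyinfra's code; it is a minimal sketch assuming the flow described in the question: facts are gathered host by host, the needed operations are decided locally from those facts, and only the resulting shell commands are executed on the hosts in parallel. `Host`, `gather_facts`, `plan`, `execute` and `deploy` are hypothetical names, and all "remote" calls are simulated.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Host:
    name: str
    installed: List[str]  # simulated remote state
    ran: List[str] = field(default_factory=list)  # commands "sent" to this host


def gather_facts(host: Host) -> Dict[str, List[str]]:
    # Step 2: in a real tool this would be a remote command per host.
    return {"packages": list(host.installed)}


def plan(facts: Dict[str, List[str]], wanted: List[str]) -> List[str]:
    # Step 3: decide locally, from the facts alone, which commands are needed.
    return [f"apt-get install -y {pkg}" for pkg in wanted if pkg not in facts["packages"]]


def execute(host: Host, commands: List[str]) -> None:
    # Step 4: send the planned shell commands to the host (simulated here).
    host.ran.extend(commands)


def deploy(hosts: List[Host], wanted: List[str]) -> Dict[str, List[str]]:
    # Step 1 (parsing config/inventory) is represented by the arguments.
    facts = {h.name: gather_facts(h) for h in hosts}  # serial, as described
    plans = {h.name: plan(facts[h.name], wanted) for h in hosts}
    with ThreadPoolExecutor(max_workers=len(hosts) or 1) as pool:
        # list() forces the map so any exception in execute() is raised here.
        list(pool.map(lambda h: execute(h, plans[h.name]), hosts))
    return plans


if __name__ == "__main__":
    hosts = [Host("host_0", ["nginx"]), Host("host_1", [])]
    print(deploy(hosts, ["nginx"]))
    # {'host_0': [], 'host_1': ['apt-get install -y nginx']}
```

Keeping fact gathering separate from execution is what allows the planning step to run entirely locally. In this model, making fact gathering parallel instead of serial would be a one-line change (`pool.map(gather_facts, hosts)`), which is exactly the kind of difference that would show up in the small-host-count benchmarks.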
I'm very keen to dig into this further, @alajmo! Life is very busy at the moment, so there may be a bit of a delay, but I will absolutely be investigating this.