Performance degradation in resource_api version
Describe the Bug
After testing the latest version of the firewall module, I've noticed that Puppet takes significantly longer to apply the catalog than it did with the previous version.
After a bit of strace, it looks like the new version makes several iptables-save calls every time it applies a rule.
Old version
~# grep -c "execve(\"/usr/sbin/iptables-save" strace1.out
60
New version
~# grep -c "execve(\"/usr/sbin/iptables-save" strace2.out
296
On some machines, catalog application time has more than doubled (from 50 to 120 seconds, for example).
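The counts above work out to roughly 4.9 times as many iptables-save invocations (296 versus 60). To see whether ip6tables-save grows the same way, the strace output can be broken down per binary. The following is a minimal sketch, assuming the strace lines have the usual `execve("path", ...)` shape; the helper name `count_execve` and the sample lines are made up for illustration.

```ruby
# Hypothetical helper: count execve calls per binary in strace output.
def count_execve(strace_text)
  counts = Hash.new(0)
  strace_text.each_line do |line|
    # Capture the path passed as the first execve argument.
    match = line.match(/execve\("([^"]+)"/)
    counts[match[1]] += 1 if match
  end
  counts
end

# Made-up sample lines, not taken from the real strace1.out or strace2.out.
sample = <<~STRACE
  1234 execve("/usr/sbin/iptables-save", ["iptables-save"], 0x7ffd... /* 12 vars */) = 0
  1235 execve("/usr/sbin/ip6tables-save", ["ip6tables-save"], 0x7ffd... /* 12 vars */) = 0
  1236 execve("/usr/sbin/iptables-save", ["iptables-save"], 0x7ffd... /* 12 vars */) = 0
STRACE

counts = count_execve(sample)
counts.each { |binary, n| puts "#{n}\t#{binary}" }
```

Against a real capture, `count_execve(File.read('strace2.out'))` would give the same breakdown for the new version.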
Expected Behavior
The firewall resource should not make multiple iptables-save calls for each rule.
Environment
- Version [3.1.0 and 8.0.1]
- Platform [Ubuntu 20.04]
Additional Context
It is related to #1100
Has anyone found a workaround or quick hack for this yet? We upgraded from v5 to v8.1.4, and the performance drop means runs now take longer than 1 hour, which makes the module unusable for us.
I've been having a go at this. On our servers that run OpenStack we have a bunch of rules that the firewall module doesn't manage. For a host with 1118 of these rules, a run takes ~7 minutes. If I change the module to add a grep -v to the iptables command, this drops to about 45-60 seconds, so it is a huge change.
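The grep -v idea can be sketched in Ruby as a filter over the saved rule text before it is parsed. This is only an illustration: the helper name `drop_unmanaged`, the `/neutron-/` pattern, and the sample rules are assumptions, since the comment above does not say which pattern was used or where in the module it was added.

```ruby
# Hypothetical helper: drop lines matching an "unmanaged" pattern,
# mimicking `iptables-save | grep -v PATTERN`.
def drop_unmanaged(save_output, unmanaged_pattern)
  save_output.each_line.reject { |line| line.match?(unmanaged_pattern) }.join
end

# Made-up iptables-save style output; the neutron chain is an assumed example
# of rules that Puppet does not manage.
sample = <<~RULES
  *filter
  :INPUT ACCEPT [0:0]
  :neutron-openvswi-INPUT - [0:0]
  -A INPUT -p tcp -m tcp --dport 8774 -m comment --comment "100 nova-api" -j ACCEPT
  -A neutron-openvswi-INPUT -j DROP
  COMMIT
RULES

filtered = drop_unmanaged(sample, /neutron-/)
puts filtered
```

Anything filtered out this way is invisible to the module, so it is only appropriate for rules Puppet should never touch.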
I have noticed this too while testing the newer version over the last few days. ~~The problem appears far worse on dual-stack systems, where runs are 5 times as slow; on our IPv6-only systems the difference seems negligible. This is only based on a small sample of test systems at present, but I thought it worth noting here.~~
Edit: apologies, I perhaps posted a bit hastily. I think this was just due to an additional set of IPv4 rules on those hosts (over and above the duplication you would expect).
No, I think it gets worse the more rules you have, whether they are IPv6 or IPv4. I think it comes down to iterating over and processing them all in Ruby, which may be really slow (I'm not a Ruby dev). I do notice it calls iptables-save a lot. At the start of a run it seems to call it over and over, and I can't tell why, so I think that is a bug. In the debug logs I see:
Debug: Creating default schedules
Info: Caching catalog for api1-1.mgmt.rc.nectar.org.au
Debug: Executing: '/etc/puppetlabs/puppet/etckeeper-commit-pre'
Debug: Loaded state in 0.21 seconds
Debug: Executing: 'iptables-save'
Debug: Executing: 'ip6tables-save'
Debug: Executing: 'iptables-save'
Debug: Executing: 'ip6tables-save'
Debug: Executing: 'iptables-save'
Debug: Executing: 'ip6tables-save'
...
Repeated about 50 times
Then later on I see it do the same for each defined rule, like:
Debug: Executing: 'iptables-save'
Debug: Executing: 'ip6tables-save'
Debug: Current State: {:ensure=>"present", :table=>"filter", :protocol=>"IPv4", :line=>"-A INPUT -p tcp -m tcp --dport 8774 -m comment --comment \"100 nova-api\" -j ACCEPT", :chain=>"INPUT", :physdev_is_bridged=>false, :physdev_is_in=>false, :physdev_is_out=>false, :proto=>"tcp", :isfragment=>false, :isfirstfrag=>false, :ishasmorefrags=>false, :islastfrag=>false, :dport=>"8774", :socket=>false, :reap=>false, :rttl=>false, :rsource=>false, :rdest=>false, :jump=>"ACCEPT", :clusterip_new=>false, :queue_bypass=>false, :clamp_mss_to_pmtu=>false, :checksum_fill=>false, :random_fully=>false, :random=>false, :log_uid=>false, :log_tcp_sequence=>false, :log_tcp_options=>false, :log_ip_options=>false, :time_contiguous=>false, :kernel_timezone=>false, :ipvs=>false, :name=>"100 nova-api", :notrack=>false}
Debug: firewall: Checking whether 'ensure' is out of sync
Debug: firewall: Checking whether 'ensure' is out of sync
Debug: firewall: Checking whether 'protocol' is out of sync
Debug: firewall: Checking whether 'table' is out of sync
Debug: firewall: Checking whether 'chain' is out of sync
Debug: firewall: Checking whether 'proto' is out of sync
Debug: firewall: Checking whether 'dport' is out of sync
Debug: firewall: Checking whether 'jump' is out of sync
iptables-save takes about 0.5 seconds to run, so this all adds up too.
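As a rough estimate, if each of the ~50 repetitions at the start costs about 0.5 seconds for iptables-save alone, that is around 25 seconds before any rule is evaluated. One common way to avoid this kind of repeated work is to cache the command output and refresh it only after a rule changes. The sketch below shows that general technique; it is not the module's actual code, and the class name `SaveOutputCache`, its methods, and the stand-in runner are all hypothetical.

```ruby
# Hypothetical cache: run each save command at most once until invalidated.
class SaveOutputCache
  attr_reader :calls

  # runner is any callable that takes a command name and returns its output.
  def initialize(runner)
    @runner = runner
    @cache = {}
    @calls = 0
  end

  # Return cached output, invoking the runner only on a cache miss.
  def fetch(command)
    @cache.fetch(command) do
      @calls += 1
      @cache[command] = @runner.call(command)
    end
  end

  # Call after any change to the rule set so the next read is fresh.
  def invalidate!
    @cache.clear
  end
end

# Stand-in runner so the sketch runs without iptables installed.
fake_runner = ->(command) { "# output of #{command}\n" }
cache = SaveOutputCache.new(fake_runner)

# Mirror the repeated pairs seen in the debug log.
50.times do
  cache.fetch('iptables-save')
  cache.fetch('ip6tables-save')
end
puts "runner invoked #{cache.calls} times"
```

The trade-off is correctness after writes: every rule insertion or deletion would need to call `invalidate!`, otherwise later rules would be compared against stale state.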