strongswan icon indicating copy to clipboard operation
strongswan copied to clipboard

Stale Block policy in Charon and Kernel SPD.

Open mr-vincy opened this issue 4 years ago • 10 comments
trafficstars

System (please complete the following information):**

  • OS: CentOS 7
  • Kernel version (if applicable): [3.10]
  • strongSwan version(s): [5.6.3]
  • Tested/confirmed with the latest version: [no]

Describe the bug Unable to install IPsec policies (SPD) in kernel, due to stale block policy in SPD.

I have N systems running Strongswan 5.6.3 (Host-A to Host-N and Security-GW), and establishes IPSec SA in tunnel mode, we recently encounter this issue and was seen only 3 times. where the Host-X is unable to install a SA unless the TSi is changed on the Host-X.

To Reproduce Steps to reproduce the behavior:

  1. Host-X establishes IPSec tunnel to Security-GW.
  2. Host-X loses connectivity.
  3. Host-X keeps retrying and reboots if not successfully in establishing the tunnel.
  4. Security-GW terminates the tunnel due to DPD (60 Secs) after 2 retransmissions of DPD request.
  5. Host-X get to successfully authenticates with Security-GW using pubkey, but fails to install out policy, due to presence of stale out block policy for the same TSi-TSr.

Expected behavior Host-X must be able to succefully install all the policy and no stale block policy must be present, across multiple make-break tunnels from any host on Security-GW.

Logs/Backtraces 2021-03-18T09:38:12.036432-07:00 Sec-GW ipsec-charon: 47[IKE-AUDIT(0)] 10.212.200.43 is initiating an IKE_SA 2021-03-18T09:38:12.036732-07:00 Sec-GW ipsec-charon: 47[IKE-CTRL(1)] IKE_SA 17IPsecTunnelServer_ipsec[19401] [952952558d60c5a2_i-cf6f41544586d606_r] between 10.212.202.4[500] to 10.212.200.43[500] state change: CREATED => CONNECTING 2021-03-18T09:38:12.052549-07:00 Sec-GW ipsec-charon: 47[IKE-CTRL(1)] remote host is behind NAT 2021-03-18T09:38:12.375652-07:00 Sec-GW ipsec-charon: 63[IKE-CTRL(1)] received end entity cert "C=US, ST=New Hampshire, L=Nashua, OU=Engineering, CN=40000870.host.net" 2021-03-18T09:38:12.375936-07:00 Sec-GW ipsec-charon: 63[CFG-CTRL(1)] looking for peer configs matching 10.212.202.4[%any]...10.212.200.43[C=US, ST=New Hampshire, L=Nashua, OU=Engineering, CN=40000870.host.net] 2021-03-18T09:38:12.377273-07:00 Sec-GW ipsec-charon: 63[CFG-CTRL(1)] using certificate "C=US, ST=New Hampshire, L=Nashua, OU=Engineering, CN=40000870.host.net" 2021-03-18T09:38:12.377541-07:00 Sec-GW ipsec-charon: 63[CFG-CTRL(1)] using trusted ca certificate "C=US, ST=New Hampshire, OU=Engineering, CN=ca.host.net, [email protected]" 2021-03-18T09:38:12.377805-07:00 Sec-GW ipsec-charon: 63[CFG-CTRL(1)] checking certificate status of "C=US, ST=New Hampshire, L=Nashua, OU=Engineering, CN=40000870.host.net" 2021-03-18T09:38:12.378079-07:00 Sec-GW ipsec-charon: 63[CFG-CTRL(1)] certificate status is not available 2021-03-18T09:38:12.378342-07:00 Sec-GW ipsec-charon: 63[CFG-CTRL(1)] reached self-signed root ca with a path length of 0 2021-03-18T09:38:12.378607-07:00 Sec-GW ipsec-charon: 63[IKE-CTRL(1)] authentication of 'C=US, ST=New Hampshire, L=Nashua, OU=Engineering, CN=40000870.host.net' with RSA_EMSA_PKCS1_SHA2_384 successful 2021-03-18T09:38:12.391341-07:00 Sec-GW ipsec-charon: 63[IKE-CTRL(1)] authentication of 'C=US, ST=New Hampshire, L=Nashua, OU=Engineering, CN=Security-GW-531.host.net' (myself) with RSA_EMSA_PKCS1_SHA2_384 successful 2021-03-18T09:38:12.391737-07:00 Sec-GW ipsec-charon: 63[CFG-CTRL(1)] build_r:: this->my_auth->build: status = 0
2021-03-18T09:38:12.392050-07:00 Sec-GW ipsec-charon: 63[IKE-AUDIT(0)] IKE_SA 17IPsecTunnelServer_ipsec[19401] established between 10.212.202.4[C=US, ST=New Hampshire, L=Nashua, OU=Engineering, CN=Security-GW-531.host.net]...10.212.200.43[C=US, ST=New Hampshire, L=Nashua, OU=Engineering, CN=40000870.host.net] 2021-03-18T09:38:12.392521-07:00 Sec-GW ipsec-charon: 63[IKE-CTRL(1)] IKE_SA 17IPsecTunnelServer_ipsec[19401] [952952558d60c5a2_i-cf6f41544586d606_r] between 10.212.202.4[4500] to 10.212.200.43[4500] state change: CONNECTING => ESTABLISHED 2021-03-18T09:38:12.392799-07:00 Sec-GW ipsec-charon: 63[IKE-CTRL(1)] IKE_SA 17IPsecTunnelServer_ipsec[19401] scheduling rekeying in 86277s 2021-03-18T09:38:12.393086-07:00 Sec-GW ipsec-charon: 63[IKE-CTRL(1)] IKE_SA 17IPsecTunnelServer_ipsec[19401] maximum IKE_SA lifetime 86397s 2021-03-18T09:38:12.393358-07:00 Sec-GW ipsec-charon: 63[IKE-CTRL(1)] sending end entity cert "C=US, ST=New Hampshire, L=Nashua, OU=Engineering, CN=Security-GW-531.host.net" 2021-03-18T09:38:12.393621-07:00 Sec-GW ipsec-charon: 63[CFG-CTRL(1)] unable to install policy 172.16.100.0/29 === 45.45.200.43/32 out for reqid 10805, the same policy for reqid 10646 exists 2021-03-18T09:38:12.393884-07:00 Sec-GW ipsec-charon: 63[IKE-CTRL(1)] unable to install IPsec policies (SPD) in kernel 2021-03-18T09:38:12.394157-07:00 Sec-GW ipsec-charon: 63[IKE-CTRL(1)] IKE_SA 17IPsecTunnelServer_ipsec[19401] closing IKE_SA due CHILD_SA setup failure 2021-03-18T09:38:12.395248-07:00 Sec-GW ipsec-charon: 63[IKE-DIAG(2)] process_message returned 0, setting return status to NEED_MORE 2021-03-18T09:38:12.395511-07:00 Sec-GW ipsec-charon: 63[IKE-DIAG(2)] process_message returned 11, destroying reassembled message 2021-03-18T09:38:12.492462-07:00 Sec-GW ipsec-charon: 21[IKE-DIAG(2)] queueing IKE_DELETE task 2021-03-18T09:38:12.492888-07:00 Sec-GW ipsec-charon: 21[IKE-DIAG(2)] activating new tasks for 17IPsecTunnelServer_ipsec[19401] 2021-03-18T09:38:12.493202-07:00 Sec-GW ipsec-charon: 21[IKE-DIAG(2)] activating IKE_DELETE task for 17IPsecTunnelServer_ipsec[19401] 2021-03-18T09:38:12.493487-07:00 Sec-GW ipsec-charon: 21[IKE-AUDIT(0)] deleting IKE_SA 17IPsecTunnelServer_ipsec[19401] between 10.212.202.4[C=US, ST=New Hampshire, L=Nashua, OU=Engineering, CN=Security-GW-531.host.net]...10.212.200.43[C=US, ST=New Hampshire, L=Nashua, OU=Engineering, CN=40000870.host.net] 2021-03-18T09:38:12.493769-07:00 Sec-GW ipsec-charon: 21[IKE-CTRL(1)] IKE_SA 17IPsecTunnelServer_ipsec[19401] [952952558d60c5a2_i-cf6f41544586d606_r] between 10.212.202.4[4500] to 10.212.200.43[4500] state change: ESTABLISHED => DELETING 2021-03-18T09:38:12.494048-07:00 Sec-GW ipsec-charon: 21[IKE-CTRL(1)] sending DELETE for IKE_SA 17IPsecTunnelServer_ipsec[19401] 2021-03-18T09:38:12.495329-07:00 Sec-GW ipsec-charon: 14[IKE-AUDIT(0)] IKE_SA deleted 2021-03-18T09:38:12.495611-07:00 Sec-GW ipsec-charon: 14[IKE-CTRL(1)] IKE_SA 17IPsecTunnelServer_ipsec[19401] [952952558d60c5a2_i-cf6f41544586d606_r] between 10.212.202.4[4500] to 10.212.200.43[4500] state change: DELETING => DESTROYING

Additional context Host are behind NAT, and we just want to know how can we address this issue, as only way we could recover from this is to change the TSi of the Host.

mr-vincy avatar Mar 23 '21 04:03 mr-vincy

It's not a bug. Host-X needs to send INIT_CONTACT notify so the other peer knows this is the same peer and it can delete the old IKE_SA, and you need to have "uniqueids=yes" (ipsec.conf)/"connections..unique=yes" (swanctl.conf) configured so the daemon heeds the INIT_CONTACT notifies. "yes" is the default.

Thermi avatar Mar 23 '21 04:03 Thermi

Agreed, also parameters are as you pointed out. And duplicate listener is also enabled. Please not that the Security-GW has terminated the tunnel due to DPD timeout too. We believe it is some race condition. Can you please help me understand other them NAT change detection where are will Strongswan would install a block/drop policy in SPD.

mr-vincy avatar Mar 23 '21 04:03 mr-vincy

Please provide at the very least your complete configurations and debug logs as shown in #196 (bottom part is expandable and shows the right file logger configuration snippets for debug logging).

Thermi avatar Mar 23 '21 04:03 Thermi

Hi Thermi,

We us ViCi to configure the tunnel on both Host and Security-GW, and this issue is seen where the log levels are almost silent, and the issue is only see rarely like 3 times max as of i remember in the span of couple of years. We have done a lot of testing w.r.t duplicate tunnel detection and this works fine and clears the old tunnel when new tunnel with same inital_contact is received. We are trying to only understand in what race-condition or corner case, this issue can be seen, like I said before the drop/block policy I see getting added when StrongSwan get NAT change trigger for kernel. I can still send you the config logs if you really need it, from few months back when the system was last rebooted. As this is live system am unable to enable debug level logs too, if we can understand how we can reproduce we can try to reproduce this in lap setup to come up with some fixes.

When this issue was seen last time, with failing to reproduce this issue, I commented "del_policies_outbound" call for deleting the DROP policy in child_sa.c update method.

mr-vincy avatar Mar 23 '21 14:03 mr-vincy

Hi,

I can't tell what the exact issue is without logs. Looks like the only way is to replicate a production system with traffic in the lab.

Kind regards Noel

Thermi avatar Mar 23 '21 16:03 Thermi

Hi Noel,

This issue is not dependent on the amount of traffic, i believe some issue with NAT change on initiator and retran-timeout on Secuirty-GW. Can you let least help me point any other case you can think of where StrongSwan can install a POLICY_DROP?

Thanks & Regards, Vinay Goutham

mr-vincy avatar Mar 23 '21 17:03 mr-vincy

A drop policy is a policy with the specific "block" mode set. (can be seen in ip -s x p in the line directly below the TS) E.g.:

# ip -s -d x p                                                                                                                          
src 1.1.1.1/32 dst 1.1.1.1/32 uid 0
        dir out action block index 2657 priority 567231 ptype main share any flag  (0x00000000)
        lifetime config:
          limit: soft (INF)(bytes), hard (INF)(bytes)
          limit: soft (INF)(packets), hard (INF)(packets)
          expire add: soft 0(sec), hard 0(sec)
          expire use: soft 0(sec), hard 0(sec)
        lifetime current:
          0(bytes), 0(packets)
          add 2021-03-26 21:31:23 use -

That's a trap policy without a matching state yet:

src 192.168.178.26/32 dst 185.112.147.121/32 uid 0
        dir out action allow index 2745 priority 367232 ptype main share any flag  (0x00000000)
        lifetime config:
          limit: soft (INF)(bytes), hard (INF)(bytes)
          limit: soft (INF)(packets), hard (INF)(packets)
          expire add: soft 0(sec), hard 0(sec)
          expire use: soft 0(sec), hard 0(sec)
        lifetime current:
          0(bytes), 0(packets)
          add 2021-03-26 21:34:44 use -
        tmpl src 192.168.178.26 dst 185.112.147.121
                proto esp spi 0x00000000(0) reqid 1(0x00000001) mode tunnel
                level required share any 
                enc-mask ffffffff auth-mask ffffffff comp-mask ffffffff

The same policies repeat for dir fwd and dir in (as with dir out).

The problem you probably have is just a policy without a state. strongSwan negotiates the policies though when it gets notified by the kernel through an acquire. Without logs, I can't help you much further. strongSwan cooperates with logrotate just fine (very broad hint).

Thermi avatar Mar 26 '21 20:03 Thermi

Are you sure it's a drop policy? (Please provide more logs and status output e.g. from ip xfrm.) If you didn't configure one, then they are only installed during address updates (i.e. after roaming to a new IP address while updating the SAs and policies). Do you use custom policy priorities? (There was a bug related to that and these drop policies that was fixed with 5.8.0.)

tobiasbrunner avatar May 04 '21 09:05 tobiasbrunner

Hi Tobias,

The logs rolled over, and what I have is there above, and for ip xfrm policy the team rebooted the system before taking that output. Can you point me to the patch with the fix your taking about, I will check if that make sense. I also believe this is due to NATT change, and some other collusion.

mr-vincy avatar May 04 '21 17:05 mr-vincy

This is the patch for the priority/CHILD_SA update issue: 8e31d657307d7e5d5bf70fa3dba869b5e014845f But this is only relevant if you actually configured custom priorities for CHILD_SAs.

tobiasbrunner avatar May 05 '21 08:05 tobiasbrunner