IKE_AUTH requests crossed
Please Tell Us About Your Operating System
- RHEL / Fedora
- libreswan-5.3
- kernel-6.12
Please Describe Your IPsec Configuration
LEFT
====
conn test
type=transport
authby=secret
auto=add
left=10.10.0.1
right=10.10.0.2
RIGHT
=====
conn test
type=transport
authby=secret
auto=add
left=10.10.0.2
right=10.10.0.1
Please Tell Us What Happened?
First of all, this is very similar to #2184.
When IKE_AUTH requests between two nodes cross we end up with IKE and Child SA on both sides but Child SAs do not match. Both IKE and Child SA are results of responding to IKE AUTH request from the other side while IKE SA there is deleteted because it's superseeded.
Here's more detailed breakdown:
- Left and right initiate connection simultaneously, IKE_SA_INIT requests cross:
LEFT
====
#1: initiating IKEv2 connection to 10.10.0.2 using UDP
#1: sent IKE_SA_INIT request to 10.10.0.2:UDP/500
RIGHT
=====
#2: initiating IKEv2 connection to 10.10.0.1 using UDP
#2: sent IKE_SA_INIT request to 10.10.0.1:UDP/500
- Both sides respond to initiator's IKE_SA_INIT request and responses again cross:
LEFT
====
#2: processing IKE_SA_INIT request from 10.10.0.2:UDP/500
#2: sent IKE_SA_INIT response to 10.10.0.2:UDP/500
#1: processed IKE_SA_INIT response from 10.10.0.2:UDP/500
RIGHT
=====
#1: processing IKE_SA_INIT request from 10.10.0.1:UDP/500
#1: sent IKE_SA_INIT response to 10.10.0.1:UDP/500
#2: processed IKE_SA_INIT response from 10.10.0.1:UDP/500
- Since they got their IKE_SA_INIT responses both left and right send IKE_AUTH requests and they again cross:
LEFT
====
#1: sent IKE_AUTH request to 10.10.0.2:UDP/500
RIGHT
=====
#1: sent IKE_AUTH request to 10.10.0.1:UDP/500
- As you may guess at this point, IKE_AUTH responses cross, too. But this poses a problem because both left and right creates IKE SA and Child SA as responders and they delete their (initiator) IKE SA since it is not needed anymore. However, it has IKE_AUTH request "outstanding" and hence no DELETE notification is sent:
LEFT
====
#2: received IKE_AUTH request, computing DH in the background
#2: responder established IKE SA; authenticated peer using authby=secret and IPV4_ADDR '10.10.0.2'
#4: responder established Child SA using #2; IPsec transport [10.10.0.1/32===10.10.0.2/32] {ESP/ESN=>0x151a97c1 <0x9103df50
#1: dropping negotiation as superseded by established IKE SA #2
#1: deleting IKE SA (sent IKE_AUTH request)
RIGHT
=====
#2: received IKE_AUTH request, computing DH in the background
#4: responder established Child SA using #2; IPsec transport [10.10.0.2/32===10.10.0.1/32] {ESP/ESN=>0xb23075f7 <0xee9d0358
#1: dropping negotiation as superseded by established IKE SA #2
#1: deleting IKE SA (sent IKE_AUTH request)
All in all, we end up with tunnels created but they don't match:
LEFT
====
#2: "test":500 ESTABLISHED_IKE_SA (established IKE SA); REKEY in 28430s; REPLACE in 28700s; newest; idle;
#4: "test":500 ESTABLISHED_CHILD_SA (established Child SA); REKEY in 28430s; REPLACE in 28700s; newest; eroute owner; IKE SA #2; idle;
#4: "test" [email protected] [email protected] Traffic: ESPin=0B ESPout=640B ESPmax=2^63B
RIGHT
=====
#2: "test":500 ESTABLISHED_IKE_SA (established IKE SA); REKEY in 28429s; REPLACE in 28699s; newest; idle;
#4: "test":500 ESTABLISHED_CHILD_SA (established Child SA); REKEY in 28429s; REPLACE in 28699s; newest; eroute owner; IKE SA #2; idle;
#4: "test" [email protected] [email protected] Traffic: ESPin=0B ESPout=640B ESPmax=2^63B
In #2184 we sort of came to the conclusion that to prevent this case of crossing streams could be sending TEMPORARY_ERROR message when there is some IKE exchange while there is already initiator IKE_AUTH request outstading. Perhaps this could even be non-default behaviour since issues like this will only happen in huge clusters where hundreds of nodes initiate connections at the same time.
This can be reproduced as follows:
LEFT
====
# ipsec whack --impair suppress_retransmits=yes
# ipsec whack --impair block_outbound=yes
# ipsec whack --initiate --asynchronous --name test
RIGHT
=====
# ipsec whack --impair suppress_retransmits=yes
# ipsec whack --impair block_outbound=yes
# ipsec whack --initiate --asynchronous --name test
LEFT
====
# ipsec whack --impair drip_outbound=1 && sleep 1
RIGHT
=====
# ipsec whack --impair drip_outbound=1 && sleep 1
LEFT
====
# ipsec whack --impair drip_outbound=2 && sleep 1
RIGHT
=====
# ipsec whack --impair drip_outbound=2 && sleep 1
LEFT
====
# ipsec whack --impair drip_outbound=3 && sleep 1
RIGHT
=====
ipsec whack --impair drip_outbound=3 && sleep 1
Relevant log output
LEFT
====
Sep 19 06:12:26.728526: impair: suppress_retransmits: no -> yes
Sep 19 06:12:26.776061: IMPAIR: recording all outbound messages
Sep 19 06:12:26.776085: IMPAIR: block all outbound messages: no -> yes
Sep 19 06:12:26.879349: "test" #1: initiating IKEv2 connection to 10.10.0.2 using UDP
Sep 19 06:12:26.880587: "test" #1: IMPAIR: suppressing retransmits; scheduling timeout in 60 seconds
Sep 19 06:12:26.880639: "test" #1: IMPAIR: blocking outbound message 1
Sep 19 06:12:26.880644: "test" #1: sent IKE_SA_INIT request to 10.10.0.2:UDP/500
Sep 19 06:12:26.950859: IMPAIR: start processing outbound drip packet 1
Sep 19 06:12:26.951194: IMPAIR: stop processing outbound drip packet 1
Sep 19 06:12:27.990620: "test" #2: processing IKE_SA_INIT request from 10.10.0.2:UDP/500 containing SA,KE,Ni,N(IKEV2_FRAGMENTATION_SUPPORTED),N(NAT_DETECTION_SOURCE_IP),N(NAT_DETECTION_DESTINATION_IP)
Sep 19 06:12:27.990650: "test" #2: proposal 1:IKE=AES_GCM_16_256-HMAC_SHA2_512-ECP_256 chosen from remote proposals 1:IKE:ENCR=AES_GCM_16_256;PRF=HMAC_SHA2_512;PRF=HMAC_SHA2_256;DH=ECP_256;DH=ECP_384;DH=ECP_521;DH=CURVE25519;DH=MODP4096;DH=MODP3072;DH=MODP2048;DH=MODP8192[first-match] 2:IKE:ENCR=AES_GCM_16_128;PRF=HMAC_SHA2_512;PRF=HMAC_SHA2_256;DH=ECP_256;DH=ECP_384;DH=ECP_521;DH=CURVE25519;DH=MODP4096;DH=MODP3072;DH=MODP2048;DH=MODP8192 3:IKE:ENCR=CHACHA20_POLY1305;PRF=HMAC_SHA2_512;PRF=HMAC_SHA2_256;DH=ECP_256;DH=ECP_384;DH=ECP_521;DH=CURVE25519;DH=MODP4096;DH=MODP3072;DH=MODP2048;DH=MODP8192 4:IKE:ENCR=AES_CBC_256;PRF=HMAC_SHA2_512;PRF=HMAC_SHA2_256;INTEG=HMAC_SHA2_512_256;INTEG=HMAC_SHA2_256_128;DH=ECP_256;DH=ECP_384;DH=ECP_521;DH=CURVE25519;DH=MODP4096;DH=MODP3072;DH=MODP2048;DH=MODP8192 5:IKE:ENCR=AES_CBC_128;PRF=HMAC_SHA2_512;PRF=HMAC_SHA2_256;INTEG=HMAC_SHA2_512_256;INTEG=HMAC_SHA2_256_128;DH=ECP_256;DH=ECP_384;DH=ECP_521;DH=CURVE25519;DH=MODP4096;DH=MODP3072;DH=MODP2048;DH=MODP8192
Sep 19 06:12:27.992650: "test" #2: IMPAIR: blocking outbound message 2
Sep 19 06:12:27.992667: "test" #2: sent IKE_SA_INIT response to 10.10.0.2:UDP/500 {cipher=AES_GCM_16_256 integ=n/a prf=HMAC_SHA2_512 group=DH19}
Sep 19 06:12:29.028185: IMPAIR: start processing outbound drip packet 2
Sep 19 06:12:29.028345: IMPAIR: stop processing outbound drip packet 2
Sep 19 06:12:30.069214: "test" #1: processed IKE_SA_INIT response from 10.10.0.2:UDP/500 {cipher=AES_GCM_16_256 integ=n/a prf=HMAC_SHA2_512 group=DH19}, initiating IKE_AUTH
Sep 19 06:12:30.069281: "test" #1: WARNING: '10.10.0.1' PSK length of 9 bytes is too short for PRF HMAC_SHA2_512 in FIPS mode (32 bytes required)
Sep 19 06:12:30.069419: "test" #1: IMPAIR: suppressing retransmits; scheduling timeout in 60 seconds
Sep 19 06:12:30.069432: "test" #1: IMPAIR: blocking outbound message 3
Sep 19 06:12:30.069442: "test" #1: sent IKE_AUTH request to 10.10.0.2:UDP/500 with shared-key-mac and IPV4_ADDR '10.10.0.1'; Child SA #3 {ESP <0xfc2fd52d}
Sep 19 06:12:31.105917: IMPAIR: start processing outbound drip packet 3
Sep 19 06:12:31.106480: IMPAIR: stop processing outbound drip packet 3
Sep 19 06:12:32.144534: "test" #2: received IKE_AUTH request, computing DH in the background
Sep 19 06:12:32.145349: "test" #2: processing decrypted IKE_AUTH request from 10.10.0.2:UDP/500 containing SK{IDi,AUTH,SA,TSi,TSr,N(USE_TRANSPORT_MODE)}
Sep 19 06:12:32.145389: "test" #2: WARNING: '10.10.0.1' PSK length of 9 bytes is too short for PRF HMAC_SHA2_512 in FIPS mode (32 bytes required)
Sep 19 06:12:32.145436: "test" #2: responder established IKE SA; authenticated peer using authby=secret and IPV4_ADDR '10.10.0.2'
Sep 19 06:12:32.145451: "test" #2: WARNING: '10.10.0.1' PSK length of 9 bytes is too short for PRF HMAC_SHA2_512 in FIPS mode (32 bytes required)
Sep 19 06:12:32.146759: "test" #4: proposal 1:ESP=AES_GCM_16_256-ESN:YES SPI=ef70befb chosen from remote proposals 1:ESP:ENCR=AES_GCM_16_256;ESN=YES;ESN=NO[first-match] 2:ESP:ENCR=AES_GCM_16_128;ESN=YES;ESN=NO 3:ESP:ENCR=CHACHA20_POLY1305;ESN=YES;ESN=NO 4:ESP:ENCR=AES_CBC_256;INTEG=HMAC_SHA2_512_256;INTEG=HMAC_SHA2_256_128;ESN=YES;ESN=NO 5:ESP:ENCR=AES_CBC_128;INTEG=HMAC_SHA2_512_256;INTEG=HMAC_SHA2_256_128;ESN=YES;ESN=NO
Sep 19 06:12:32.190766: "test" #4: responder established Child SA using #2; IPsec transport [10.10.0.1/32===10.10.0.2/32] {ESP/ESN=>0xef70befb <0xfe1810dd xfrm=AES_GCM_16_256-NONE DPD=passive}
Sep 19 06:12:32.190815: "test" #2: IMPAIR: blocking outbound message 4
Sep 19 06:12:33.184528: impair: suppress_retransmits: yes -> no
Sep 19 06:12:33.254252: IMPAIR: block all outbound messages: yes -> no
Sep 19 06:13:30.068529: "test" #1: dropping negotiation as superseded by established IKE SA #2
Sep 19 06:13:30.068759: ERROR: "test" #3: netlink response for Del SA [email protected]: No such process (errno 3)
Sep 19 06:13:30.068775: "test" #1: deleting IKE SA (sent IKE_AUTH request)
RIGHT
=====
Sep 19 06:12:26.809854: impair: suppress_retransmits: no -> yes
Sep 19 06:12:26.843226: IMPAIR: recording all outbound messages
Sep 19 06:12:26.843250: IMPAIR: block all outbound messages: no -> yes
Sep 19 06:12:26.915385: "test" #1: initiating IKEv2 connection to 10.10.0.1 using UDP
Sep 19 06:12:26.916428: "test" #1: IMPAIR: suppressing retransmits; scheduling timeout in 60 seconds
Sep 19 06:12:26.916486: "test" #1: IMPAIR: blocking outbound message 1
Sep 19 06:12:26.916494: "test" #1: sent IKE_SA_INIT request to 10.10.0.1:UDP/500
Sep 19 06:12:26.951051: "test" #2: processing IKE_SA_INIT request from 10.10.0.1:UDP/500 containing SA,KE,Ni,N(IKEV2_FRAGMENTATION_SUPPORTED),N(NAT_DETECTION_SOURCE_IP),N(NAT_DETECTION_DESTINATION_IP)
Sep 19 06:12:26.951090: "test" #2: proposal 1:IKE=AES_GCM_16_256-HMAC_SHA2_512-ECP_256 chosen from remote proposals 1:IKE:ENCR=AES_GCM_16_256;PRF=HMAC_SHA2_512;PRF=HMAC_SHA2_256;DH=ECP_256;DH=ECP_384;DH=ECP_521;DH=CURVE25519;DH=MODP4096;DH=MODP3072;DH=MODP2048;DH=MODP8192[first-match] 2:IKE:ENCR=AES_GCM_16_128;PRF=HMAC_SHA2_512;PRF=HMAC_SHA2_256;DH=ECP_256;DH=ECP_384;DH=ECP_521;DH=CURVE25519;DH=MODP4096;DH=MODP3072;DH=MODP2048;DH=MODP8192 3:IKE:ENCR=CHACHA20_POLY1305;PRF=HMAC_SHA2_512;PRF=HMAC_SHA2_256;DH=ECP_256;DH=ECP_384;DH=ECP_521;DH=CURVE25519;DH=MODP4096;DH=MODP3072;DH=MODP2048;DH=MODP8192 4:IKE:ENCR=AES_CBC_256;PRF=HMAC_SHA2_512;PRF=HMAC_SHA2_256;INTEG=HMAC_SHA2_512_256;INTEG=HMAC_SHA2_256_128;DH=ECP_256;DH=ECP_384;DH=ECP_521;DH=CURVE25519;DH=MODP4096;DH=MODP3072;DH=MODP2048;DH=MODP8192 5:IKE:ENCR=AES_CBC_128;PRF=HMAC_SHA2_512;PRF=HMAC_SHA2_256;INTEG=HMAC_SHA2_512_256;INTEG=HMAC_SHA2_256_128;DH=ECP_256;DH=ECP_384;DH=ECP_521;DH=CURVE25519;DH=MODP4096;DH=MODP3072;DH=MODP2048;DH=MODP8192
Sep 19 06:12:26.953092: "test" #2: IMPAIR: blocking outbound message 2
Sep 19 06:12:26.953108: "test" #2: sent IKE_SA_INIT response to 10.10.0.1:UDP/500 {cipher=AES_GCM_16_256 integ=n/a prf=HMAC_SHA2_512 group=DH19}
Sep 19 06:12:27.990397: IMPAIR: start processing outbound drip packet 1
Sep 19 06:12:27.990486: IMPAIR: stop processing outbound drip packet 1
Sep 19 06:12:29.029456: "test" #1: processed IKE_SA_INIT response from 10.10.0.1:UDP/500 {cipher=AES_GCM_16_256 integ=n/a prf=HMAC_SHA2_512 group=DH19}, initiating IKE_AUTH
Sep 19 06:12:29.030583: "test" #1: WARNING: '10.10.0.2' PSK length of 9 bytes is too short for PRF HMAC_SHA2_512 in FIPS mode (32 bytes required)
Sep 19 06:12:29.030693: "test" #1: IMPAIR: suppressing retransmits; scheduling timeout in 60 seconds
Sep 19 06:12:29.030702: "test" #1: IMPAIR: blocking outbound message 3
Sep 19 06:12:29.030709: "test" #1: sent IKE_AUTH request to 10.10.0.1:UDP/500 with shared-key-mac and IPV4_ADDR '10.10.0.2'; Child SA #3 {ESP <0xef70befb}
Sep 19 06:12:30.066363: IMPAIR: start processing outbound drip packet 2
Sep 19 06:12:30.066439: IMPAIR: stop processing outbound drip packet 2
Sep 19 06:12:31.106150: "test" #2: received IKE_AUTH request, computing DH in the background
Sep 19 06:12:31.108003: "test" #2: processing decrypted IKE_AUTH request from 10.10.0.1:UDP/500 containing SK{IDi,AUTH,SA,TSi,TSr,N(USE_TRANSPORT_MODE)}
Sep 19 06:12:31.108053: "test" #2: WARNING: '10.10.0.2' PSK length of 9 bytes is too short for PRF HMAC_SHA2_512 in FIPS mode (32 bytes required)
Sep 19 06:12:31.108089: "test" #2: responder established IKE SA; authenticated peer using authby=secret and IPV4_ADDR '10.10.0.1'
Sep 19 06:12:31.108104: "test" #2: WARNING: '10.10.0.2' PSK length of 9 bytes is too short for PRF HMAC_SHA2_512 in FIPS mode (32 bytes required)
Sep 19 06:12:31.108724: "test" #4: proposal 1:ESP=AES_GCM_16_256-ESN:YES SPI=fc2fd52d chosen from remote proposals 1:ESP:ENCR=AES_GCM_16_256;ESN=YES;ESN=NO[first-match] 2:ESP:ENCR=AES_GCM_16_128;ESN=YES;ESN=NO 3:ESP:ENCR=CHACHA20_POLY1305;ESN=YES;ESN=NO 4:ESP:ENCR=AES_CBC_256;INTEG=HMAC_SHA2_512_256;INTEG=HMAC_SHA2_256_128;ESN=YES;ESN=NO 5:ESP:ENCR=AES_CBC_128;INTEG=HMAC_SHA2_512_256;INTEG=HMAC_SHA2_256_128;ESN=YES;ESN=NO
Sep 19 06:12:31.152207: "test" #4: responder established Child SA using #2; IPsec transport [10.10.0.2/32===10.10.0.1/32] {ESP/ESN=>0xfc2fd52d <0xf92060b5 xfrm=AES_GCM_16_256-NONE DPD=passive}
Sep 19 06:12:31.152246: "test" #2: IMPAIR: blocking outbound message 4
Sep 19 06:12:32.144293: IMPAIR: start processing outbound drip packet 3
Sep 19 06:12:32.145161: IMPAIR: stop processing outbound drip packet 3
Sep 19 06:12:33.219076: impair: suppress_retransmits: yes -> no
Sep 19 06:12:33.289032: IMPAIR: block all outbound messages: yes -> no
Sep 19 06:13:29.032893: "test" #1: dropping negotiation as superseded by established IKE SA #2
Sep 19 06:13:29.033127: ERROR: "test" #3: netlink response for Del SA [email protected]: No such process (errno 3)
Sep 19 06:13:29.033146: "test" #1: deleting IKE SA (sent IKE_AUTH request)
Just want to add I have run into same IKE _AUTH issue in my setup (I use OE) but ran into the issue where both ends have mismatched SPI.
cat /etc/ipsec.d/stre0.conf
conn private-or-clear-3 type=transport auto=route ikev2=insist nic-offload=packet negotiationshunt=passthrough failureshunt=passthrough authby=null rightid=%null leftid=%null right=%opportunisticgroup left=192.168.85.13
Machine B Logs Sep 22 15:09:24 scaqaw18celadm05 pluto[92548]: "private-or-clear-3#192.168.85.0/24"[3] ...192.168.85.13 #8: sent IKE_AUTH request to 192.168.85.13:UDP/500 with null and NULL 'ID_NULL'; Child SA #32 {ESP <0x7efe15aa} Sep 22 15:09:24 scaqaw18celadm05 pluto[92548]: "private-or-clear-3#192.168.85.0/24"[8] ...192.168.85.13 #23: received IKE_AUTH request, computing DH in the background Sep 22 15:09:24 scaqaw18celadm05 pluto[92548]: "private-or-clear-3#192.168.85.0/24"[8] ...192.168.85.13 #23: processing decrypted IKE_AUTH request from 192.168.85.13:UDP/500 containing SK{IDi,IDr,AUTH,SA,TSi,TSr,N(USE_TRANSPORT_MODE)} Sep 22 15:09:24 scaqaw18celadm05 pluto[92548]: "private-or-clear-3#192.168.85.0/24"[8] ...192.168.85.13 #23: responder established IKE SA; authenticated peer using authby=null and NULL 'ID_NULL' Sep 22 15:09:24 scaqaw18celadm05 pluto[92548]: "private-or-clear-3#192.168.85.0/24"[8] ...192.168.85.13 #23: NULL auth ID for different IP's cannot replace each other Sep 22 15:09:24 scaqaw18celadm05 pluto[92548]: "private-or-clear-3#192.168.85.0/24"[8] ...192.168.85.13 #23: NULL auth ID for different IP's cannot replace each other Sep 22 15:09:24 scaqaw18celadm05 pluto[92548]: "private-or-clear-3#192.168.85.0/24"[3] ...192.168.85.13: terminating SAs using this connection Sep 22 15:09:24 scaqaw18celadm05 pluto[92548]: "private-or-clear-3#192.168.85.0/24"[3] ...192.168.85.13 #8: deleting IKE SA (sent IKE_AUTH request) Sep 22 15:09:24 scaqaw18celadm05 pluto[92548]: "private-or-clear-3#192.168.85.0/24"[8] ...192.168.85.13 #23: NULL auth ID for different IP's cannot replace each other Sep 22 15:09:24 scaqaw18celadm05 pluto[92548]: "private-or-clear-3#192.168.85.0/24"[8] ...192.168.85.13 #23: NULL auth ID for different IP's cannot replace each other Sep 22 15:09:24 scaqaw18celadm05 pluto[92548]: "private-or-clear-3#192.168.85.0/24"[8] ...192.168.85.13 #34: responder established Child SA using #23; IPsec transport [192.168.85.5/32===192.168.85.13/32] {ESP/ESN=>0x53db9edd <0x90daf8f3 xfrm=AES_GCM_16_256-NONE nic-offload=packet DPD=passive} Sep 22 15:09:24 scaqaw18celadm05 pluto[92548]: "private-or-clear-4#192.168.85.0/24"[3] ...192.168.85.13 #3: initiator established IKE SA; authenticated peer using authby=null and NULL 'ID_NULL'
Machine B spi is 0x90daf8f3 whereas Machine A See line 1 above in logs SPI sent to Machine A was 0x7efe15aa
ip -s x s | grep -A 20 'src 192.168.85.13 dst 192.168.85.5'
src 192.168.85.13 dst 192.168.85.5 proto esp spi 0x90daf8f3(2430269683) reqid 16461(0x0000404d) mode transport replay-window 0 seq 0x00000000 flag esn (0x10000000) aead rfc4106(gcm(aes)) 0xd92d0295b4fa0e33e8c99e09e9fe190734d9d0e589466a1264b082721839f2d9bda288b0 (288 bits) 128 anti-replay esn context: seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0 replay_window 128, bitmap-length 4 00000000 00000000 00000000 00000000 crypto offload parameters: dev stre0 dir in mode packet sel src 192.168.85.13/32 dst 192.168.85.5/32 uid 0 lifetime config: limit: soft 4142862594814785709(bytes), hard 9223372036854775808(bytes) limit: soft 4319636942556533329(packets), hard 9223372036854775808(packets) expire add: soft 0(sec), hard 0(sec) expire use: soft 0(sec), hard 0(sec) lifetime current: 0(bytes), 0(packets) add 2025-09-22 15:09:24 use 2025-09-22 15:09:25 stats: replay-window 0 replay 0 failed 0
If you do tcpdump you will see Machine B You will see spi for any packets from Machine A
tcpdump -i stre0 -n -v ip src 192.168.85.13 and dst 192.168.85.5
dropped privs to tcpdump tcpdump: listening on stre0, link-type EN10MB (Ethernet), capture size 262144 bytes 23:01:49.614297 IP (tos 0x22,ECT(0), ttl 64, id 42681, offset 0, flags [DF], proto ESP (50), length 344) 192.168.85.13 > 192.168.85.5: ESP(spi=0x7efe15aa,seq=0x9ee0), length 324 23:01:50.581442 IP (tos 0x0, ttl 64, id 42187, offset 0, flags [DF], proto ESP (50), length 576) 192.168.85.13 > 192.168.85.5: ESP(spi=0x7efe15aa,seq=0x9ee1), length 556 23:01:50.581471 IP (tos 0x0, ttl 64, id 42188, offset 0, flags [DF], proto ESP (50), length 576) 192.168.85.13 > 192.168.85.5: ESP(spi=0x7efe15aa,seq=0x9ee2), length 556
(Look at SPI on Machine A is in one prior to that was deleted)
Machine A spi is 0x7efe15aa
ip -s x s | grep -A 20 'src 192.168.85.13 dst 192.168.85.5'
src 192.168.85.13 dst 192.168.85.5 proto esp spi 0x7efe15aa(2130580906) reqid 16485(0x00004065) mode transport replay-window 0 seq 0x00000000 flag esnextra_flag dont-encap-dscp (0x10000000) aead rfc4106(gcm(aes)) 0xb56b0343b441a050fe2f87dff17a6cae96e65eeeb353fe6ace61d4b11eea86ac48e46875 (288 bits) 128 anti-replay esn context: seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0 replay_window 128, bitmap-length 4 00000000 00000000 00000000 00000000 crypto offload parameters: dev stre0 dir out mode packet sel src 192.168.85.13/32 dst 192.168.85.5/32 uid 0 lifetime config: limit: soft 4164573591687143329(bytes), hard 9223372036854775808(bytes) limit: soft 4373313835894194722(packets), hard 9223372036854775808(packets) expire add: soft 0(sec), hard 0(sec) expire use: soft 0(sec), hard 0(sec) lifetime current: 15544464(bytes), 35579(packets) add 2025-09-22 15:09:25 use 2025-09-22 15:09:26 stats: replay-window 0 replay 0 failed 0
Same issue mismatched tunnels and hence no communication between 192.168.85.13 and 192.168.85.5.
@mamtagambhir IMO the current approach (targeting permanent connections only) would not fix your issue since in your case connection are not permanent (but IMO it can be extend to cover tham).
Thank you @The-Mule understood, I am using opportunistic and auto=route which explains the behaviour, but in my scenario of a multi-node cluster, I hit this very frequently. If you plan to extend the fix, I will be happy to run/debug in my setup. Else Do you have any ideas to mitigate this issue on startup ?
@The-Mule for opportunistic connections on demand will be temporary. I see this issue ver frequently on SA expiry. NODE 1 SPI MISMATCH [root@adm08vm02 ~]# ip -s x s | grep -A 12 'src 192.201.83.8 dst 192.201.83.1' src 192.201.83.8 dst 192.201.83.1 proto esp spi 0x4922a4fb(1227007227) reqid 16681(0x00004129) mode transport replay-window 0 seq 0x00000000 flag esnextra_flag dont-encap-dscp (0x10000000) aead rfc4106(gcm(aes)) 0x6ea5a8cd702881e111a18084153b27617717450dce26c9fc1d9c0a86a7b7b5c791706615 (288 bits) 128 anti-replay esn context: seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0 replay_window 128, bitmap-length 4 00000000 00000000 00000000 00000000 crypto offload parameters: dev clre1 dir out mode packet sel src 192.201.83.8/32 dst 192.201.83.1/32 uid 0 lifetime config: limit: soft 4201755230314888954(bytes), hard 9223372036854775808(bytes) limit: soft 4428122436614876826(packets), hard 9223372036854775808(packets) [root@ad8vm02 ~]m0# ip -s x s | grep -A 12 'src 192.201.83.1 dst 192.201.83.8' src 192.201.83.1 dst 192.201.83.8 proto esp spi 0xebc19bf4(3955334132) reqid 16681(0x00004129) mode transport replay-window 0 seq 0x00000000 flag esn (0x10000000) aead rfc4106(gcm(aes)) 0x0d9ea64c9bb7e98d12aa62d9426ed39b2cc626169368bddf00df45140b1997cb90bb0bed (288 bits) 128 anti-replay esn context: seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0 replay_window 128, bitmap-length 4 00000000 00000000 00000000 00000000 crypto offload parameters: dev clre1 dir in mode packet sel src 192.201.83.1/32 dst 192.201.83.8/32 uid 0 lifetime config: limit: soft 4137110770220585264(bytes), hard 9223372036854775808(bytes) limit: soft 4144803493724083939(packets), hard 9223372036854775808(packets)
NODE 2 SPI MISTMATCH
root@adm05vm02 ~]# ip -s x s | grep -A 12 'src 192.201.83.8 dst 192.201.83.1' src 192.201.83.8 dst 192.201.83.1 proto esp spi 0x2b397c20(725187616) reqid 17213(0x0000433d) mode transport replay-window 0 seq 0x00000000 flag esn (0x10000000) aead rfc4106(gcm(aes)) 0x70d1ec3ad0a7541a4a71ff7ed5ba7cccff97ea46c1f7f27936ae1b14b877da4b439f409d (288 bits) 128 anti-replay esn context: seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0 replay_window 128, bitmap-length 4 00000000 00000000 00000000 00000000 crypto offload parameters: dev clre0 dir in mode packet sel src 192.201.83.8/32 dst 192.201.83.1/32 uid 0 lifetime config: limit: soft 4257214771904928838(bytes), hard 9223372036854775808(bytes) limit: soft 4401287183595121426(packets), hard 9223372036854775808(packets) [root@adm05vm02 ~]# ip -s x s | grep -A 12 'src 192.201.83.1 dst 192.201.83.8' src 192.201.83.1 dst 192.201.83.8 proto esp spi 0x14d88d8b(349736331) reqid 17213(0x0000433d) mode transport replay-window 0 seq 0x00000000 flag esnextra_flag dont-encap-dscp (0x10000000) aead rfc4106(gcm(aes)) 0xa40be3671ab011689b8c9d2f455f848f1627bb225aa2d2acd654beaf4b4148c81a5cd794 (288 bits) 128 anti-replay esn context: seq-hi 0x0, seq 0x0, oseq-hi 0x0, oseq 0x0 replay_window 128, bitmap-length 4 00000000 00000000 00000000 00000000 crypto offload parameters: dev clre0 dir out mode packet sel src 192.201.83.1/32 dst 192.201.83.8/32 uid 0 lifetime config: limit: soft 4336544413424785179(bytes), hard 9223372036854775808(bytes) limit: soft 4599301960237150979(packets), hard 9223372036854775808(packets)
This happens at 4:17 logs show teh SA expiry when this happend Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[3] ...192.201.83.1 #147: IKE SA expired (LATEST!) Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[3] ...192.201.83.1 #147: deleting IKE SA (ESTABLISHED_IKE_SA) aged 183.996716s and sending notification Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[3] ...192.201.83.1 #142: ESP traffic information: in=809MiB out=645MiB Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[3] ...192.201.83.1 #47: ESP traffic information: in=40KiB out=41KiB Oct 15 04:17:49 adm08vm02 pluto[151140]: packet from 192.201.83.1:500: INFORMATIONAL response has no corresponding IKE SA; message dropped Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[17] ...192.201.83.1: initiate on-demand for packet 192.201.83.8:59799-UDP->192.201.83.1:31202 Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[17] ...192.201.83.1 #166: initiating IKEv2 connection to 192.201.83.1 using UDP Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[17] ...192.201.83.1 #166: sent IKE_SA_INIT request to 192.201.83.1:UDP/500 Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[18] ...192.201.83.1 #167: processing IKE_SA_INIT request from 192.201.83.1:UDP/500 containing SA,KE,Ni,N(IKEV2_FRAGMENTATION_SUPPORTED),N(NAT_DETECTION_SOURCE_IP),N(NAT_DETECTION_DESTINATION_IP),V Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[18] ...192.201.83.1 #167: sent IKE_SA_INIT response to 192.201.83.1:UDP/500 {cipher=AES_GCM_16_256 integ=n/a prf=HMAC_SHA2_512 group=DH19} Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[17] ...192.201.83.1 #166: processed IKE_SA_INIT response from 192.201.83.1:UDP/500 {cipher=AES_GCM_16_256 integ=n/a prf=HMAC_SHA2_512 group=DH19}, initiating IKE_AUTH Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[17] ...192.201.83.1 #166: sent IKE_AUTH request to 192.201.83.1:UDP/500 with null and NULL 'ID_NULL'; Child SA #168 {ESP <0x14d88d8b} Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[18] ...192.201.83.1 #167: received IKE_AUTH request, computing DH in the background Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[18] ...192.201.83.1 #167: processing decrypted IKE_AUTH request from 192.201.83.1:UDP/500 containing SK{IDi,IDr,AUTH,SA,TSi,TSr,N(USE_TRANSPORT_MODE)} Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[18] ...192.201.83.1 #167: responder established IKE SA; authenticated peer using authby=null and NULL 'ID_NULL' Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[17] ...192.201.83.1: terminating SAs using this connection Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[17] ...192.201.83.1 #166: deleting IKE SA (sent IKE_AUTH request) Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[18] ...192.201.83.1 #167: NULL auth ID for different IP's cannot replace each other Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[18] ...192.201.83.1 #167: NULL auth ID for different IP's cannot replace each other Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[18] ...192.201.83.1 #167: NULL auth ID for different IP's cannot replace each other Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[18] ...192.201.83.1 #167: NULL auth ID for different IP's cannot replace each other Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[18] ...192.201.83.1 #167: NULL auth ID for different IP's cannot replace each other Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[18] ...192.201.83.1 #167: NULL auth ID for different IP's cannot replace each other Oct 15 04:17:49 adm08vm02 pluto[151140]: "private-or-clear-2#192.201.83.0/24"[18] ...192.201.83.1 #169: responder established Child SA using #167; IPsec transport [192.201.83.8/32===192.201.83.1/32] {ESP/ESN=>0x4922a4fb <0xebc19bf4 xfrm=AES_GCM_16_256-NONE nic-offload=packet DPD=passive} Oct 15 04:17:49 adm08vm02 pluto[151140]: packet from 192.201.83.1:500: IKE_AUTH response has no corresponding IKE SA; message dropped
Where as on NODE 2 Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[56] ...192.201.83.8 #613: ESP traffic information: in=649MiB out=807MiB Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[56] ...192.201.83.8 #536: deleting lingering IKE SA Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[56] ...192.201.83.8 #536: ESP traffic information: in=41KiB out=40KiB Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[56] ...192.201.83.8 #616: deleting IKE SA (established IKE SA) Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[64] ...192.201.83.8: initiate on-demand for packet 192.201.83.1:31202-UDP->192.201.83.8:59799 Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[64] ...192.201.83.8 #629: initiating IKEv2 connection to 192.201.83.8 using UDP Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[64] ...192.201.83.8 #629: sent IKE_SA_INIT request to 192.201.83.8:UDP/500 Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[65] ...192.201.83.8 #630: processing IKE_SA_INIT request from 192.201.83.8:UDP/500 containing SA,KE,Ni,N(IKEV2_FRAGMENTATION_SUPPORTED),N(NAT_DETECTION_SOURCE_IP),N(NAT_DETECTION_DESTINATION_IP),V Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[65] ...192.201.83.8 #630: sent IKE_SA_INIT response to 192.201.83.8:UDP/500 {cipher=AES_GCM_16_256 integ=n/a prf=HMAC_SHA2_512 group=DH19} Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[64] ...192.201.83.8 #629: processed IKE_SA_INIT response from 192.201.83.8:UDP/500 {cipher=AES_GCM_16_256 integ=n/a prf=HMAC_SHA2_512 group=DH19}, initiating IKE_AUTH Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[65] ...192.201.83.8 #630: received IKE_AUTH request, computing DH in the background Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[64] ...192.201.83.8 #629: sent IKE_AUTH request to 192.201.83.8:UDP/500 with null and NULL 'ID_NULL'; Child SA #631 {ESP <0x4922a4fb} Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[65] ...192.201.83.8 #630: processing decrypted IKE_AUTH request from 192.201.83.8:UDP/500 containing SK{IDi,IDr,AUTH,SA,TSi,TSr,N(USE_TRANSPORT_MODE)} Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[65] ...192.201.83.8 #630: responder established IKE SA; authenticated peer using authby=null and NULL 'ID_NULL' Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[64] ...192.201.83.8: terminating SAs using this connection Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[64] ...192.201.83.8 #629: deleting IKE SA (sent IKE_AUTH request) Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[65] ...192.201.83.8 #630: NULL auth ID for different IP's cannot replace each other Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[65] ...192.201.83.8 #630: NULL auth ID for different IP's cannot replace each other Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[65] ...192.201.83.8 #630: NULL auth ID for different IP's cannot replace each other Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[65] ...192.201.83.8 #630: NULL auth ID for different IP's cannot replace each other Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[65] ...192.201.83.8 #630: NULL auth ID for different IP's cannot replace each other Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[65] ...192.201.83.8 #630: NULL auth ID for different IP's cannot replace each other Oct 15 04:17:49 pluto[63372]: "private-or-clear#192.201.83.0/24"[65] ...192.201.83.8 #632: responder established Child SA using #630; IPsec transport [192.201.83.1/32===192.201.83.8/32] {ESP/ESN=>0x14d88d8b <0x2b397c20 xfrm=AES_GCM_16_256-NONE nic-offload=packet DPD=passive} Oct 15 04:17:49 pluto[63372]: packet from 192.201.83.8:500: IKE_AUTH response has no corresponding IKE SA; message dropped
You will see 0x2b397c20 is not seen on Node 1 I am wondering Is it just the orphaned inbound SAs case . thanks for any insight ..
@mamtagambhir Sorry for the delay, is the connection actually broken when you encounter this problem?
Thank you @The-Mule for the response. Yes, you're right its broken connecrion—it's due to the mismatched SPI between the two nodes (srcip -> dstip) pair, as I show above . I’m using hardware offload, and after analyzing the packet capture and debugging, it became clear that both nodes had different SPI values, which is why no responses were sent. I was able to mitigate this by using 20% rekey_fuzz that adds a bit of randomness to the rekey timing(I use default 540s)