frr
frr copied to clipboard
ISIS Neighbor Stuck in Expiring
- [X] Did you check if this is a duplicate issue?
- [ ] Did you test it on the latest FRRouting/frr master branch?
Problem:
On certain links (those that are active for a period of time but not permanently) we have seen ISIS enter a 'Down/Expiring' state and never recover. We are able to clear this state by restarting the FRR process, but not if we use 'clear isis neighbor'. We have seen the issue on both 'broadcast' and 'point-to-point' network types. We have also seen the issue on version 8.3.1 as well as 8.5.2. We added BFD to try help force the neighbor down but this does not resolve the problem.
Some item values below are masked with dummy values.
#Shell Output (show isis neighbor)
Area ABC:
System Id Interface L State Holdtime SNPA
abc eno4 2 Down Expiring 0000.1111.2222
Configuration Snippet:
interface eno4
ip router isis ABC
isis bfd
isis bfd profile STANDARD
isis circuit-type level-2-only
isis metric 25
isis password md5 ***
isis priority 127
no isis hello padding
!
router isis ABC
is-type level-2-only
net 49.0000.wwwx.xxyy.yzzz.00
lsp-mtu 1400
lsp-timers gen-interval 1 refresh-interval 300 max-lifetime 1000
log-adjacency-changes
segment-routing on
segment-routing node-msd 8
segment-routing prefix w.x.y.z/32 index z
!
bfd
profile STANDARD
detect-multiplier 10
transmit-interval 1000
receive-interval 1000
echo receive-interval disabled
To Reproduce: This is not a consistent issue, but when the ISIS neighbor flaps in quick succession we typically see this stuck in expiring issue.
Expected behavior: ISIS neighbor is cleared after timeout expires.
Versions: FRR version 8.5.2 Linux Info: Ubuntu 18.04.4 LTS; kernel 4.15.0-197-generic
This is a message from the log before we see the stuck in expiring state.
ISIS: [XXX] %ADJCHANGE: Adjacency to abc (eno4) for level-2 changed from Unknown to Initializing, unspecified
ISIS: [XXX] %ADJCHANGE: Adjacency to abc (eno4) for level-2 changed from Initializing to Down, holding time expired
This sounds like a valid issue, but it's going to be hard to track down without being able to consistently reproduce it, I think ... Can y'all try to narrow down when this happens more?
Do you know of a specific trace or debug that might help us pinpoint the issue? This seems to happen when the neighbor does not fully establish (intermittent up/down link) which makes it hard to reproduce in a lab.
I suggest to activate ISIS debug message and look to the log to see if you could find more useful info. Please, post them to this issue. Useful debug should be:
debug isis adj-packets
debug isis events
debug isis packet-dump
debug isis snp-packets
@odd22 Debugs with timestamps from expiring event
2023/11/20 21:16:46 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 65 . 2023/11/20 21:16:47 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH on eno4, cirType L2, cirID 5 2023/11/20 21:16:47 ISIS: [XXXX] ISIS-Adj (ABC): Adjacency state change 0->1: unspecified 2023/11/20 21:16:47 ISIS: [XXXX] %ADJCHANGE: Adjacency to abc (eno4) for level-2 changed from Unknown to Initializing, unspecified 2023/11/20 21:16:47 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH from 0000.1111.2222 on eno4, cirType L2, cirID 5, length 61 2023/11/20 21:16:47 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:16:50 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:16:50 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH on eno4, cirType L2, cirID 5 2023/11/20 21:16:50 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH from 0000.1111.2222 on eno4, cirType L2, cirID 5, length 61 2023/11/20 21:16:53 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:16:53 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH on eno4, cirType L2, cirID 5 2023/11/20 21:16:53 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH from 0000.1111.2222 on eno4, cirType L2, cirID 5, length 61 2023/11/20 21:16:56 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:16:56 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH on eno4, cirType L2, cirID 5 2023/11/20 21:16:56 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH from 0000.1111.2222 on eno4, cirType L2, cirID 5, length 61 2023/11/20 21:16:59 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:16:59 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH on eno4, cirType L2, cirID 5 2023/11/20 21:16:59 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH from 0000.1111.2222 on eno4, cirType L2, cirID 5, length 61 2023/11/20 21:17:02 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:17:02 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH on eno4, cirType L2, cirID 5 2023/11/20 21:17:02 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH from 0000.1111.2222 on eno4, cirType L2, cirID 5, length 61 2023/11/20 21:17:04 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:17:05 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH on eno4, cirType L2, cirID 5 2023/11/20 21:17:05 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH from 0000.1111.2222 on eno4, cirType L2, cirID 5, length 61 2023/11/20 21:17:07 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:17:08 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH on eno4, cirType L2, cirID 5 2023/11/20 21:17:08 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH from 0000.1111.2222 on eno4, cirType L2, cirID 5, length 61 2023/11/20 21:17:10 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:17:10 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH on eno4, cirType L2, cirID 5 2023/11/20 21:17:10 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH from 0000.1111.2222 on eno4, cirType L2, cirID 5, length 61 2023/11/20 21:17:13 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:17:13 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH on eno4, cirType L2, cirID 5 2023/11/20 21:17:13 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH from 0000.1111.2222 on eno4, cirType L2, cirID 5, length 61 2023/11/20 21:17:16 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:17:16 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH on eno4, cirType L2, cirID 5 2023/11/20 21:17:16 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH from 0000.1111.2222 on eno4, cirType L2, cirID 5, length 61 2023/11/20 21:17:19 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:17:19 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH on eno4, cirType L2, cirID 5 2023/11/20 21:17:19 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH from 0000.1111.2222 on eno4, cirType L2, cirID 5, length 61 2023/11/20 21:17:21 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:17:22 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH on eno4, cirType L2, cirID 5 2023/11/20 21:17:22 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH from 0000.1111.2222 on eno4, cirType L2, cirID 5, length 61 2023/11/20 21:17:24 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:17:25 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH on eno4, cirType L2, cirID 5 2023/11/20 21:17:25 ISIS: [XXXX] ISIS-Adj (ABC): Rcvd L2 LAN IIH from 0000.1111.2222 on eno4, cirType L2, cirID 5, length 61 2023/11/20 21:17:27 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:17:30 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:17:33 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:17:36 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:17:39 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:17:42 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:17:45 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:17:48 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:17:50 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:17:53 ISIS: [XXXX] ISIS-Adj (ABC): Sending L2 LAN IIH on eno4, length 73 2023/11/20 21:17:55 ISIS: [XXXX] ISIS-Adj (ABC): Adjacency state change 1->3: holding time expired 2023/11/20 21:17:55 ISIS: [XXXX] %ADJCHANGE: Adjacency to abc (eno4) for level-2 changed from Initializing to Down, holding time expired
I've managed to consistently reproduce this problem in a lab:
!LAB Topology R1 <-L2 Switch-> R2
!Reproduction Steps On R2 restart FRR service multiple times in short period
!Outcome
#Up State
R1# show isis ne de Area LAB: R2
Interface: ens4, Level: 2, State: Up, Expires in 29s Adjacency flaps: 1, Last: 3m20s ago Circuit type: L2, Speaks: IPv4 SNPA: 5001.0002.0002, LAN id: 0010.0100.1001.03
!!!! R2 FRR service restarts initiated !!!!
#Initializing State
R1# sh isis ne de Area LAB: R2
Interface: ens4, Level: 2, State: Initializing, Expires in 20s Adjacency flaps: 1, Last: 3m49s ago Circuit type: L2, Speaks: IPv4 SNPA: 5001.0002.0002, LAN id: 0000.0000.0000.00
#Down/Expiring State
R1# sh isis ne de Area LAB: R2
Interface: ens4, Level: 2, State: Down Expiring Adjacency flaps: 1, Last: 4m33s ago Circuit type: L2, Speaks: IPv4 SNPA: 5001.0002.0002, LAN id: 0000.0000.0000.00
@odd22 Looking further into how I reproduced this error - It seems that when I stop the FRR service on one side of an ISIS peer (depending on how the peers are configured) I experience the ISIS Expiring problem:
ISIS P2P with BFD: When FRR service is stopped on Device-A, the active node (Device-B) receives a BFD down message followed by an ISIS hello - this triggers the state of Initializing > Down Expiring
ISIS P2P with no BFD: When FRR service is stopped on Device-A, the active node (Device-B) seems to act as expected but relies on ISIS timers.
ISIS B/Cast with BFD: When FRR service is stopped on Device-A, the active node (Device-B) receives a BFD down message followed by an ISIS hello - this triggers the state of Initializing > Down Expiring
ISIS B/Cast with no BFD: When FRR service is stopped on Device-A, the active node (Device-B) records the following messages in logs - this triggers the state of Initializing > Down Expiring:
2024/05/26 15:02:04 ISIS: [ABC] ISIS-Adj (ABC): Starting fast-reroute on state change 2->1: own SNPA not found in LAN Neighbours TLV 2024/05/26 15:02:04 ISIS: [ABC] ISIS-Adj (ABC): Adjacency state change 2->1: own SNPA not found in LAN Neighbours TLV 2024/05/26 15:02:04 ISIS: [ABC] ISIS-Adj (ABC): Rcvd L2 LAN IIH from 5001.0001.0001 on ens5, cirType L2, cirID 4, length 40 2024/05/26 15:02:04 ISIS: [ABC] ISIS-Evt (ABC) DIS status change 2024/05/26 15:02:04 ISIS: [ABC] ISIS-Adj (ABC): Sending L2 LAN IIH on ens5, length 48 2024/05/26 15:02:07 ISIS: [ABC] ISIS-Adj (ABC): Sending L2 LAN IIH on ens5, length 48 2024/05/26 15:02:10 ISIS: [ABC] ISIS-Adj (ABC): Sending L2 LAN IIH on ens5, length 48 2024/05/26 15:02:13 ISIS: [ABC] ISIS-Adj (ABC): Sending L2 LAN IIH on ens5, length 48 2024/05/26 15:02:14 ISIS: [ABC] ISIS-Adj (ABC): Adjacency state change 1->3: holding time expired
This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.
This issue will be automatically closed in the specified period unless there is further activity.