
[v2] Site CR status.network not updated for remote links when they are removed.

Open lynnemorrison opened this issue 8 months ago • 2 comments

After fixing issue #2006, which added remote link status to the CLI link status command, I saw that remote links were not removed, and their status was not updated, when I deleted the link on the remote site.

I had a three-router setup. I deleted one of the links on private1. I expected the link to be removed from the site resource on public1, but when I look at the site resource the links are never removed. This missing update will cause the user to think the link is still active.

skupper link status -n private1
Outgoing link from this site:
NAME                                          STATUS  COST  MESSAGE
public1-d72cbb23-d98d-4cf7-a943-a56d0d447498  Ready   1     OK
public2-d00e1a17-cee5-43f3-9872-edbabc3b7b3c  Ready   1     OK

Incoming Links from remote sites:
There are no incoming link resources in the namespace

kubectl delete link public1-d72cbb23-d98d-4cf7-a943-a56d0d447498
link.skupper.io "public1-d72cbb23-d98d-4cf7-a943-a56d0d447498" deleted

lynne@lynne-VirtualBox:~/skupper-example-prom$ skupper link status -n private1
Outgoing link from this site:
NAME                                          STATUS  COST  MESSAGE
public2-d00e1a17-cee5-43f3-9872-edbabc3b7b3c  Ready   1     OK

Incoming Links from remote sites:
There are no incoming link resources in the namespace

I waited 10 minutes and I still see the link from private1 to public1

skupper link status -n public1
There are no outgoing link resources in the namespace

Incoming Links from remote sites:
NAME                                          STATUS  REMOTE SITE
public1-d72cbb23-d98d-4cf7-a943-a56d0d447498  Ready   public2
public1-d72cbb23-d98d-4cf7-a943-a56d0d447498  Ready   private1  <---- I deleted this link on private1

If I delete the site private1 the links do go away.

skupper site delete --all
Waiting for deletion to complete...
Site "private1" is deleted

It took about 5 minutes, but the link from private1 to public1 is removed

skupper link status -n public1
There are no outgoing link resources in the namespace

Incoming Links from remote sites:
NAME                                          STATUS  REMOTE SITE
public1-d72cbb23-d98d-4cf7-a943-a56d0d447498  Ready   public2

lynnemorrison, Apr 10 '25 18:04

@lynnemorrison can confirm that I was able to reproduce this! My initial impression is that this is a challenge we faced with v1 pretty extensively, and that we aimed to improve a little in v2 with some of the vanflow link semantics changes without addressing the root of the issue.

If I recall, the gist of it is this: if you count on guaranteed delivery of vanflow events across a network that you are actively reshaping (like we are here) you tend to have a bad time.

EDIT: To illustrate this point, here's a demo: https://gist.github.com/c-kruse/44f451da5c20152b7e636b1f4c576ea7
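
A toy Python model of that failure mode, for illustration only. It assumes a collector that removes a link record either when it receives an explicit end event or when the reporting site stops sending heartbeats for `source_ttl` seconds. The class, the method names, and the 300-second timeout are all hypothetical and are not taken from the Skupper or vanflow code.

```python
class Collector:
    """Toy event collector; NOT Skupper's implementation."""

    def __init__(self, source_ttl):
        self.source_ttl = source_ttl  # hypothetical heartbeat timeout, seconds
        self.links = {}               # link id -> site that reported it
        self.last_seen = {}           # site -> time of its last heartbeat

    def heartbeat(self, source, now):
        self.last_seen[source] = now

    def link_up(self, source, link_id, now):
        self.heartbeat(source, now)
        self.links[link_id] = source

    def link_down(self, link_id):
        # Explicit end event: the only fast path for removing a record.
        self.links.pop(link_id, None)

    def expire(self, now):
        # Slow path: drop every record whose reporting site has gone silent.
        dead = {s for s, t in self.last_seen.items() if now - t > self.source_ttl}
        self.links = {l: s for l, s in self.links.items() if s not in dead}
        for s in dead:
            del self.last_seen[s]


def simulate(end_event_delivered, source_keeps_running):
    """Return the link ids the collector still holds 600 seconds later."""
    c = Collector(source_ttl=300)
    c.link_up("private1", "link-to-public1", now=0)
    if end_event_delivered:
        c.link_down("link-to-public1")
    if source_keeps_running:
        c.heartbeat("private1", now=600)
    c.expire(now=600)
    return sorted(c.links)


print(simulate(end_event_delivered=True, source_keeps_running=True))
print(simulate(end_event_delivered=False, source_keeps_running=True))
print(simulate(end_event_delivered=False, source_keeps_running=False))
```

Under these assumptions a lost end event leaves the record in place for as long as the reporting site stays up, and the record only disappears once the site itself goes away. That would be consistent with what was reported above: stale after 10 minutes, gone some minutes after the site is deleted.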

c-kruse, Apr 11 '25 15:04

@c-kruse, @ted-ross, @ajssmith After the meeting last week, and following what Christian and Ted said, I ran a three-node setup using skupper-example-ftp to test traffic and collect logs.

public ------private --------private1
   |____________________________|

ftp-client is on public and ftp-server is on private, so traffic initially goes directly between public and private.

You can see that the "Hello!" message is sent from public to private:

$ echo "Hello!" | kubectl run ftp-client --stdin --rm --image=docker.io/curlimages/curl --restart=Never -- -s -T - ftp://example:example@ftp-server/greeting
If you don't see a command prompt, try pressing enter.
pod "ftp-client" deleted

$ kubectl run ftp-client --attach --rm --image=docker.io/curlimages/curl --restart=Never -- -s ftp://example:example@ftp-server/greeting
Hello!
pod "ftp-client" deleted

On public you can see the two links:

$ skupper link status
There are no outgoing link resources in the namespace

Incoming Links from remote sites: 
NAME						STATUS	REMOTE SITE
public-7336aae6-fb90-4c13-b230-8072e7f504c4	Ready	private1
public-7336aae6-fb90-4c13-b230-8072e7f504c4	Ready	private

On private I remove the link between public and private:

kubectl delete link public-7336aae6-fb90-4c13-b230-8072e7f504c4

The skupper-router logs on public show that the link has gone down and traffic is rerouted:

2025-04-25 13:56:26.200972 +0000 ROUTER_LS (info) Link to Neighbor Router Lost - link_tag=0
2025-04-25 13:56:26.201609 +0000 ROUTER_CORE (info) [C34] Connection Closed
2025-04-25 13:56:26.203206 +0000 ROUTER_CORE (info) [C35] Connection Closed
2025-04-25 13:56:26.203240 +0000 ROUTER_CORE (info) [C33] Connection Closed
2025-04-25 13:56:27.199217 +0000 ROUTER_LS (info) Computed next hops: {'private1-skupper-router-777d5f8b66-wldjk': 'private1-skupper-router-777d5f8b66-wldjk', 'private-skupper-router-5fd64b7555-fpkjc': 'private1-skupper-router-777d5f8b66-wldjk'}
2025-04-25 13:56:27.199434 +0000 ROUTER_LS (info) Computed costs: {'private1-skupper-router-777d5f8b66-wldjk': 1, 'private-skupper-router-5fd64b7555-fpkjc': 2}
2025-04-25 13:56:27.199485 +0000 ROUTER_LS (info) Computed valid origins: {'private1-skupper-router-777d5f8b66-wldjk': [], 'private-skupper-router-5fd64b7555-fpkjc': []}
2025-04-25 13:56:27.199506 +0000 ROUTER_LS (info) Computed radius: 2
2025-04-25 13:56:31.210424 +0000 ROUTER_LS (info) Computed next hops: {'private1-skupper-router-777d5f8b66-wldjk': 'private1-skupper-router-777d5f8b66-wldjk', 'private-skupper-router-5fd64b7555-fpkjc': 'private1-skupper-router-777d5f8b66-wldjk'}
2025-04-25 13:56:31.210852 +0000 ROUTER_LS (info) Computed costs: {'private1-skupper-router-777d5f8b66-wldjk': 1, 'private-skupper-router-5fd64b7555-fpkjc': 2}
2025-04-25 13:56:31.210901 +0000 ROUTER_LS (info) Computed valid origins: {'private1-skupper-router-777d5f8b66-wldjk': [], 'private-skupper-router-5fd64b7555-fpkjc': []}
2025-04-25 13:56:31.210927 +0000 ROUTER_LS (info) Computed radius: 2
2025-04-25 13:56:46.234359 +0000 FLOW_LOG (info) BIFLOW_TPORT [bpn9x:29] BEGIN END parent=bpn9x:4 sourceHost=10.244.0.90 sourcePort=34742 octets=7 trace=0/public-skupper-router-cbd664ccc-bpn9x|0/private1-skupper-router-777d5f8b66-wldjk octetRate=0 octetsReverse=0 octetRateReverse=0 connector=fpkjc:1 proxyHost=10.244.0.83 proxyPort=41956
2025-04-25 13:56:46.234392 +0000 FLOW_LOG (info) BIFLOW_TPORT [bpn9x:28] BEGIN END parent=bpn9x:3 sourceHost=10.244.0.90 sourcePort=43792 octets=68 trace=0/public-skupper-router-cbd664ccc-bpn9x|0/private1-skupper-router-777d5f8b66-wldjk octetRate=6 octetsReverse=226 octetRateReverse=18 connector=fpkjc:2 proxyHost=10.244.0.83 proxyPort=44082
2025-04-25 13:56:55.707579 +0000 FLOW_LOG (info) BIFLOW_TPORT [bpn9x:31] BEGIN END parent=bpn9x:4 sourceHost=10.244.0.91 sourcePort=45494 octets=0 trace=0/public-skupper-router-cbd664ccc-bpn9x|0/private1-skupper-router-777d5f8b66-wldjk octetsReverse=7 connector=fpkjc:1 proxyHost=10.244.0.83 proxyPort=43646
2025-04-25 13:56:55.707642 +0000 FLOW_LOG (info) BIFLOW_TPORT [bpn9x:30] BEGIN END parent=bpn9x:3 sourceHost=10.244.0.91 sourcePort=44360 octets=83 trace=0/public-skupper-router-cbd664ccc-bpn9x|0/private1-skupper-router-777d5f8b66-wldjk octetsReverse=276 connector=fpkjc:2 proxyHost=10.244.0.83 proxyPort=45182

The skupper-router logs on private show that the link is removed:

2025-04-25 13:56:26.193510 +0000 AGENT (info) Deleted sslProfile public-7336aae6-fb90-4c13-b230-8072e7f504c4-profile
2025-04-25 13:56:26.200404 +0000 CONN_MGR (info) Deleted  Connector: 10.107.205.248:55671 proto=any, role=inter-router, sslProfile=public-7336aae6-fb90-4c13-b230-8072e7f504c4-profile
2025-04-25 13:56:26.201446 +0000 ROUTER_LS (info) Link to Neighbor Router Lost - link_tag=0
2025-04-25 13:56:26.201517 +0000 ROUTER_CORE (info) [C19] Connection Closed
2025-04-25 13:56:26.202084 +0000 ROUTER_CORE (info) [C18] Connection Closed
2025-04-25 13:56:26.202113 +0000 ROUTER_CORE (info) [C20] Connection Closed
2025-04-25 13:56:26.573695 +0000 ROUTER_LS (info) Computed next hops: {'private1-skupper-router-777d5f8b66-wldjk': 'private1-skupper-router-777d5f8b66-wldjk', 'public-skupper-router-cbd664ccc-bpn9x': 'private1-skupper-router-777d5f8b66-wldjk'}
2025-04-25 13:56:26.573784 +0000 ROUTER_LS (info) Computed costs: {'private1-skupper-router-777d5f8b66-wldjk': 1, 'public-skupper-router-cbd664ccc-bpn9x': 2}
2025-04-25 13:56:26.573816 +0000 ROUTER_LS (info) Computed valid origins: {'private1-skupper-router-777d5f8b66-wldjk': [], 'public-skupper-router-cbd664ccc-bpn9x': []}
2025-04-25 13:56:26.573876 +0000 ROUTER_LS (info) Computed radius: 2
2025-04-25 13:56:27.037298 +0000 FLOW_LOG (info) LINK [fpkjc:9] BEGIN END parent=fpkjc:0 peer=bpn9x:1 destHost=10.107.205.248 protocol=amqp destPort=55671 octets=75183 result=unknown reason= name=public-7336aae6-fb90-4c13-b230-8072e7f504c4 linkCost=1 octetRate=418 operStatus=down role=inter-router upTimeStamp=1745589259590005 downTimeStamp=1745589386202792 downCount=1 octetsReverse=117347 octetRateReverse=566
2025-04-25 13:56:31.583929 +0000 ROUTER_LS (info) Computed next hops: {'private1-skupper-router-777d5f8b66-wldjk': 'private1-skupper-router-777d5f8b66-wldjk', 'public-skupper-router-cbd664ccc-bpn9x': 'private1-skupper-router-777d5f8b66-wldjk'}
2025-04-25 13:56:31.584595 +0000 ROUTER_LS (info) Computed costs: {'private1-skupper-router-777d5f8b66-wldjk': 1, 'public-skupper-router-cbd664ccc-bpn9x': 2}
2025-04-25 13:56:31.584633 +0000 ROUTER_LS (info) Computed valid origins: {'private1-skupper-router-777d5f8b66-wldjk': [], 'public-skupper-router-cbd664ccc-bpn9x': []}
2025-04-25 13:56:31.584673 +0000 ROUTER_LS (info) Computed radius: 2
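
For comparing what the router reports against what `skupper link status` shows, the key=value fields of a FLOW_LOG record can be pulled out with a few lines of Python. This is only a sketch written against the log excerpt above: the sample string is a shortened copy of the LINK record, and `parse_flow_log` is a made-up helper, not a Skupper tool.

```python
import re

# Shortened copy of the LINK record from the private router log above.
RECORD = (
    "2025-04-25 13:56:27.037298 +0000 FLOW_LOG (info) LINK [fpkjc:9] BEGIN END "
    "parent=fpkjc:0 peer=bpn9x:1 destHost=10.107.205.248 protocol=amqp "
    "destPort=55671 result=unknown reason= "
    "name=public-7336aae6-fb90-4c13-b230-8072e7f504c4 linkCost=1 "
    "operStatus=down role=inter-router downCount=1"
)


def parse_flow_log(line):
    """Return (record type, record id, attribute dict) for one FLOW_LOG line."""
    header = re.search(r"FLOW_LOG \(\w+\) (\w+) \[([^\]]+)\]", line)
    if header is None:
        raise ValueError("not a FLOW_LOG line")
    # Everything after the header is key=value pairs; a value may be empty,
    # as in "reason=" above.
    attrs = dict(re.findall(r"(\w+)=(\S*)", line[header.end():]))
    return header.group(1), header.group(2), attrs


kind, record_id, attrs = parse_flow_log(RECORD)
print(kind, record_id, attrs["name"], attrs["operStatus"])
```

For the LINK record above this gives `operStatus` as `down`, so the router on private has already recorded the link as down at 13:56:27.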

Even after the link between public and private has gone down, the link status on public still shows it as Ready:

skupper link status -n public
There are no outgoing link resources in the namespace

Incoming Links from remote sites: 
NAME						STATUS	REMOTE SITE
public-7336aae6-fb90-4c13-b230-8072e7f504c4	Ready	private1
public-7336aae6-fb90-4c13-b230-8072e7f504c4	Ready	private

The new link status command gets the link state from the network portion of the site resource, but this is never updated when the link changes.
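
To check what the command is reading, the Site resource can be dumped (for example with `kubectl get site -o json`) and the links walked by hand. A rough Python sketch follows. The field names under `status.network` (`name`, `links`, `remoteSiteName`) are an assumption about the layout and should be checked against the actual resource. The sample dict is hand-written to mirror the output above, not captured from the cluster.

```python
LINK = "public-7336aae6-fb90-4c13-b230-8072e7f504c4"

# Hand-written sample of a stale Site status on public (assumed layout).
STALE_SITE = {
    "status": {
        "network": [
            {"name": "public", "links": []},
            {"name": "private1",
             "links": [{"name": LINK, "remoteSiteName": "public"}]},
            # This entry should have disappeared once the link was deleted.
            {"name": "private",
             "links": [{"name": LINK, "remoteSiteName": "public"}]},
        ]
    }
}


def incoming_links(site, local_site_name):
    """List (link name, remote site) pairs for links that point at this site."""
    found = []
    for record in site.get("status", {}).get("network", []):
        for link in record.get("links", []):
            if link.get("remoteSiteName") == local_site_name:
                found.append((link.get("name"), record.get("name")))
    return found


for name, remote in incoming_links(STALE_SITE, "public"):
    print(name, remote)
```

With the stale status this still returns the entry for private, which matches the two Ready rows shown above.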

The skupper-controller logs for namespace private show the connector being disconnected, but there is no notification about the link for the public namespace:

time=2025-04-25T13:54:59.768Z level=INFO msg="Redemption of access token private/private-3baf8a81-bf8d-4fe6-8222-1e8fe47a2457 succeeded"
time=2025-04-25T13:54:59.780Z level=INFO msg="Connecting site using token" component=kube.site.site namespace=private1 token=private-3baf8a81-bf8d-4fe6-8222-1e8fe47a2457
time=2025-04-25T13:56:26.173Z level=INFO msg="Disconnecting connector from site" component=kube.site.site name=public-7336aae6-fb90-4c13-b230-8072e7f504c4 namespace=private
time=2025-04-25T13:57:51.388Z level=INFO msg="Grant server tls credentials updated"

FTP still works because traffic is rerouted around the down link, over the public-private1-private path:

$ echo "Hello!" | kubectl run ftp-client --stdin --rm --image=docker.io/curlimages/curl --restart=Never -- -s -T - ftp://example:example@ftp-server/greeting
If you don't see a command prompt, try pressing enter.
pod "ftp-client" deleted

$ kubectl run ftp-client --attach --rm --image=docker.io/curlimages/curl --restart=Never -- -s ftp://example:example@ftp-server/greeting
Hello!
pod "ftp-client" deleted

So how do we want to get the correct status here? Adding the incoming links to the link status command makes the problem very obvious.

lynnemorrison, Apr 25 '25 14:04