network-orchestration-for-aws-transit-gateway
STNO Portal Shows only 1 CIDR
Feature request?
We have been using STNO for some time now and it's awesome, but we have only now noticed this behaviour.
STNO does all the glue from Spoke to Hub Accounts, most importantly:
- Create an Attachment between VPC-Spoke <> TGW-Hub
- Associate All Tagged SubNets to this Attachment
- HUB-Routing Table - add 1x Association
- HUB-Routing Table - add 1+ Propagations
- Add 1x CIDR Route on Spoke Account Routing Tables per SubNet, pointing to TGW-Hub
Behind the scenes, all CIDRs configured on the VPC get propagated to the routing tables that the Attachment is set to propagate into. *1
Within the STNO Portal we only see one CIDR to be approved (probably only the first VPC CIDR).
In our LandingZone environment we have been experiencing these symptoms:
1. An Approver on the STNO Portal does not see all the CIDRs that will get propagated into the Routing Tables as soon as they approve. (It is important to validate the CIDRs in a LandingZone, because they need to be unique within the LZ, and we have an IPAM to ensure this, as most companies do.)
2. If a Customer manually adds CIDRs to its VPC, the new CIDR gets pushed into the LZ without going through STNO. *1
3. Given 1 and 2, it has become obvious that the STNO db is not a single source of truth for all CIDRs.
4. If TAGs are deleted from SubNets / VPCs by mistake, no Approval process controls this, and all "glue" objects are deleted without Hub approval or any rollback.
5. A simple Change TAGs procedure causes a downtime of 5 min + 5 min. We need to delete the TAGs first, wait 5 min for STNO to finish, then add the TAGs again and wait another 5 min for STNO to finish. We found a way to manually configure the desired state on the Hub and then retag; this would reduce the downtime to 5 sec + 5 sec, simply to reassociate the attachment to a new routing table.
6. We detected issues when approving only a VPC on the STNO Portal without the SubNets tagged. No Attachment is created if we first approve a VPC without tagging the SubNets. Even if the SubNets are tagged afterwards, the attachment is still not created, and we need to restart the process.
7. In a Spoke-Hub environment, Spoke and Hub are usually different accounts owned by different teams. Our Hub Team is unable to see VPC configurations related to SGs/NACLs and Routing Tables, which limits us and makes troubleshooting difficult.
*1: This is in line with the public documentation at https://docs.aws.amazon.com/vpc/latest/tgw/how-transit-gateways-work.html#tgw-route-propagation-overview , which states: “For a VPC attachment, the CIDR blocks of the VPC are propagated to the transit gateway route table.” (Note the “s” in “CIDR blocks”.)
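To illustrate why an Approver needs to see every CIDR (item 1), here is a minimal sketch of the kind of uniqueness check we have in mind. It is not STNO code: the function name `find_overlaps` and the input shape (a mapping of VPC id to its list of CIDRs) are assumptions made purely for illustration, and it only uses the Python standard library.

```python
import ipaddress
from itertools import combinations


def find_overlaps(cidrs_by_vpc):
    """Return pairs of (vpc_id, cidr) entries whose address ranges overlap.

    cidrs_by_vpc is a hypothetical mapping such as
    {"vpc-a": ["10.0.0.0/16", "10.1.0.0/16"], "vpc-b": ["10.2.0.0/16"]}.
    """
    # Flatten the mapping into (vpc_id, network) entries.
    entries = [
        (vpc_id, ipaddress.ip_network(cidr))
        for vpc_id, cidrs in cidrs_by_vpc.items()
        for cidr in cidrs
    ]
    overlaps = []
    for (vpc_a, net_a), (vpc_b, net_b) in combinations(entries, 2):
        # Only compare IPv4 with IPv4 and IPv6 with IPv6.
        if net_a.version == net_b.version and net_a.overlaps(net_b):
            overlaps.append(((vpc_a, str(net_a)), (vpc_b, str(net_b))))
    return overlaps
```

If only the first CIDR of each VPC is shown for approval, a clash on a secondary CIDR (for example `10.1.0.0/16` on one VPC and `10.1.128.0/17` on another) would not be visible to the Approver.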
Suggestions:
- To address items 1, 2 and 3, it would probably be best to monitor the VPC CIDRs on Spoke accounts and add/remove CIDRs in the STNO db accordingly.
- To address item 4, we wouldn't go as far as preventing a Spoke from deleting the attachment; instead, it would make sense for the attachment to be owned by the Hub.
- To address item 5, a change-tags procedure could exist, or even a rollback.
- To address item 6, group the VPC and its SubNets in the STNO Portal, including all VPC CIDRs, allowing a one-time first approval. Approving more SubNets for this group should be possible after the group is first created.
- To address item 7, it would be a good idea to allow the Hub to share ownership of the VPCs, SubNets, Security Groups and NACLs, and maybe other related objects like prefix lists. The Hub Team would probably need to choose between read-only and read-write for each object, and the Spoke Team would need to approve the chosen option.
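As a rough sketch of the first suggestion (monitoring VPC CIDRs), the comparison could look something like the code below. It assumes the VPC record has the shape of an entry returned by the EC2 DescribeVpcs API (`CidrBlockAssociationSet` with a `CidrBlockState`), and that the set of already approved CIDRs has been read from the STNO db by some other means. The names `associated_cidrs`, `diff_cidrs` and `approved` are hypothetical; we do not know how STNO stores this internally.

```python
def associated_cidrs(vpc):
    """Collect every IPv4 CIDR in the 'associated' state from one VPC record.

    vpc is a dict shaped like one entry of an EC2 DescribeVpcs response.
    """
    cidrs = set()
    for assoc in vpc.get("CidrBlockAssociationSet", []):
        if assoc.get("CidrBlockState", {}).get("State") == "associated":
            cidrs.add(assoc["CidrBlock"])
    return cidrs


def diff_cidrs(vpc, approved):
    """Compare the CIDRs currently on the VPC with the approved set.

    approved is a hypothetical set of CIDR strings already approved
    for this VPC (for example, loaded from the STNO db).
    """
    current = associated_cidrs(vpc)
    return {
        # On the VPC but never approved: should trigger a new approval.
        "unapproved": sorted(current - approved),
        # Approved earlier but no longer on the VPC: candidates for removal.
        "stale": sorted(approved - current),
    }
```

Anything listed under `unapproved` could be sent to the Portal for approval, and anything under `stale` could be removed from the db, so that the db stays in step with what actually gets propagated.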
Thank you for your thoughts; any feedback or help would be very much appreciated.
Thanks, and keep up the good work!