[Epic] Networking review
This epic is for tracking any overhaul we should do to simplify cluster networking. Add suggestions freely to the Proposed list (in plain text unless an issue already exists). Items that make it through architectural review will be turned into issues and moved to Approved.
Approved
- [ ] Nothing yet
Proposed
- StatCan/aaw-argocd-manifests#56
- Create documentation with best practices and examples for implementing networking for new services
I was also going to propose changes to ClusterRbacConfig, but I understand this would be replaced with an Istio upgrade and can be reevaluated then if necessary...?
Just some loose thoughts:
| DNS (and matching certificate) | Internet reachable | Gateway | Scope | Example Service |
|---|---|---|---|---|
| *.aaw.cloud.statcan.ca | Yes | public gateway | accessible to all | Kubeflow |
| *.internal.aaw.cloud.statcan.ca | No | internal gateway | accessible within statcan networks; NetB & Cloud VMs | ArgoCD |
| *.cluster.aaw.cloud.statcan.ca | No | cluster gateway | accessible from cluster network (nodes & pods) | monitoring-elasticsearch |
| *.protected-b.aaw.cloud.statcan.ca | No | protected-b gateway | CAE & Protected-B Notebooks | vetting application |
Each gateway has an Istio ingress gateway and a corresponding service with an internal load balancer. Only the public gateway is publicly routed; all the others have private Azure DNS entries only. The Scope of each gateway is enforced by Azure firewall rules on the source IP address, not by network policies.
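To make the table easier to reason about, here is a rough sketch of the hostname-to-gateway mapping it implies. The suffixes and gateway names come from the table; the function name `gateway_for` and the example hostnames are made up for illustration, and it assumes each wildcard covers exactly one extra DNS label (the way a wildcard certificate does).

```python
from typing import Optional

# Wildcard DNS suffix -> gateway, copied from the table above.
GATEWAYS = {
    "aaw.cloud.statcan.ca": "public gateway",
    "internal.aaw.cloud.statcan.ca": "internal gateway",
    "cluster.aaw.cloud.statcan.ca": "cluster gateway",
    "protected-b.aaw.cloud.statcan.ca": "protected-b gateway",
}


def gateway_for(hostname: str) -> Optional[str]:
    """Return the gateway whose wildcard covers hostname, or None (hypothetical helper)."""
    # A wildcard such as *.aaw.cloud.statcan.ca covers exactly one extra label,
    # so drop the first label and look up what remains.
    _, _, parent = hostname.lower().rstrip(".").partition(".")
    return GATEWAYS.get(parent)


if __name__ == "__main__":
    # Example hostnames are illustrative only, not real endpoints.
    for host in ("kubeflow.aaw.cloud.statcan.ca", "argocd.internal.aaw.cloud.statcan.ca"):
        print(host, "->", gateway_for(host))
```

One consequence of the single-label assumption: a name two levels below a suffix (say `a.b.aaw.cloud.statcan.ca`) matches no gateway, so a new service has to pick exactly one of the four suffixes.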
NOTE: I don't understand the authenticated gateway well, so that's another one to factor in here.
TODO: Network Policies?
We'd want to define, especially for the internal and cluster gateways, whether or not pods in the cluster can talk to those endpoints.
Exercises: Where should KServe & Seldon go?
It would be beneficial to consider making a Seldon deployment accessible to all users on NetB, including AAW, for specific use cases. Related to #1025
@brendangadd @Collinbrown95 Chapter 9 in this book explains why we have this trafficPolicy: local business. Might be worth a skim
https://www.tigera.io/lp/kubernetes-security-and-observability-ebook/