Spike - could we do a bastion per VPC?
User Story
1 Bastion per Business Unit VPC - so 1 for LAA, 1 for HMPPS, etc.
Currently each account creates its own bastion, which potentially means a bastion per application environment.
- Number of environments: development, test, preproduction, production = 4
- Number of applications: potentially hundreds, let's say 100
- Number of bastions: 400+
This spike is to look at whether we could securely switch to using a bastion per VPC, which would massively reduce the number of bastions needed. For example, how do we allow NOMIS folks to access only NOMIS things on the bastion?
- Number of environments: 4
- Number of business units: 8
- Number of bastions: 32 max
Value
- Reduced costs
- Reduced energy consumption
- No need for application teams to add the bastion module to their code
Questions / Assumptions
Definition of done
- [ ] spike complete and presented to the team
- [ ] another team member has reviewed
- [ ] tests are green
Reference
Code changes are in the branch features/bastion-per-vpc
under the modernisation-platform repo: https://github.com/ministryofjustice/modernisation-platform/compare/features/bastion-per-vpc?expand=1
I have deployed a bastion host in the core-vpc-sandbox SSO account, and there is an existing bastion host in the sprinkler-development SSO account. They both sit in the same subnet: subnet-043052146fbc41df3 (garden-sandbox-general-private-eu-west-2a), IPv4 CIDR 10.231.0.0/24.
However, while I am able to access the sprinkler DB management server from the sprinkler bastion, I cannot do so from the core-vpc-sandbox bastion. The two bastions are also unable to ping each other. My understanding is that the subnets are RAM shared from the core-vpc-sandbox account to the sprinkler-development member account, but it seems that the bastions would have to be RAM shared as well.
For example, from the sprinkler bastion I am able to connect to port 3389 on sprinkler-db-mgmt-server (an immediate "Connection refused" means the traffic reaches the host, i.e. the network path is open, whereas a timeout means it is blocked):
[georgef@ip-10-231-0-211 ~]$ curl --connect-timeout "3" 10.231.0.190:3389
curl: (7) Failed to connect to 10.231.0.190 port 3389 after 0 ms: Connection refused
[georgef@ip-10-231-0-211 ~]$ curl --connect-timeout "3" 10.231.0.190:80
curl: (28) Connection timeout after 3001 ms
From the core-vpc-sandbox bastion, trying to connect to sprinkler-db-mgmt-server:
[georgef@ip-10-231-0-142 ~]$ curl --connect-timeout "3" 10.231.0.190:3389
curl: (28) Connection timeout after 3001 ms
[georgef@ip-10-231-0-142 ~]$ curl --connect-timeout "3" 10.231.0.190:80
curl: (28) Connection timeout after 3001 ms
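The timeouts from the core-vpc-sandbox bastion suggest the traffic is being dropped somewhere on the path rather than refused by the host. Alongside the RAM-sharing question above, one thing worth checking is the security group on the target instance. As a rough sketch only (the security group reference is an assumed name, not the real sprinkler code), a CIDR-based ingress rule covering the shared subnet would avoid relying on cross-account security group references:

```hcl
# Hypothetical rule: allow RDP from anything in the shared
# general-private subnet (10.231.0.0/24) that both bastions sit in.
# CIDR-based rather than referencing the bastion's security group,
# since that group lives in a different account.
resource "aws_security_group_rule" "rdp_from_shared_subnet" {
  type              = "ingress"
  from_port         = 3389
  to_port           = 3389
  protocol          = "tcp"
  cidr_blocks       = ["10.231.0.0/24"]
  description       = "RDP from bastions in the shared subnet"
  security_group_id = aws_security_group.db_mgmt_server.id # assumed name
}
```

If allowing the whole subnet range is too broad, this could be tightened to the bastion's private IP as a /32.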
Under the core-vpc-sandbox SSO account, we should be able to deploy a bastion per VPC: garden-sandbox and house-sandbox.
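As a very rough sketch of what that could look like (the module source and variable names are assumptions, not the real bastion module's interface), a single for_each over the VPCs in the account would give one bastion each:

```hcl
# Illustrative only: one shared bastion per VPC in core-vpc-sandbox.
locals {
  bastion_vpcs = toset(["garden-sandbox", "house-sandbox"])
}

module "bastion" {
  for_each = local.bastion_vpcs

  source = "./modules/bastion_linux" # assumed path, not the real module source

  # Assumed inputs: place each bastion in that VPC's general private subnet.
  vpc_name    = each.key
  subnet_name = "${each.key}-general-private-eu-west-2a"

  tags = {
    Name = "bastion-${each.key}"
  }
}
```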
The SSO account core-vpc-development has the VPC hmpps-development. Under hmpps-development, there are performance-hub-development and equip-development. The EC2 instances in equip-development and performance-hub-development reside in the same VPC and subnet: vpc-01d7a2da8f9f1dfec (hmpps-development), subnet-04af8bd9dbbce3310 (hmpps-development-general-private-eu-west-2c), IPv4 CIDR 10.26.26.0/24.
We should be able to RAM share bastion deployments in the "resource-share" module: https://github.com/ministryofjustice/modernisation-platform/blob/main/terraform/environments/core-vpc/vpc.tf#L186

I guess that, by definition, there is no crossover from one VPC to another: for example, hmpps-development cannot access hmcts-development and vice versa. However, since resources are provisioned in the same subnet (say hmpps-development-general-private-eu-west-2c) for a specific environment (say development) in a specific business unit (say hmpps), my guess is that if we RAM share bastions to the member accounts, users will be able to access resources in another member account via the bastion. I am not sure whether this would be a problem, especially for non-production environments (development, test, preproduction).
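For reference, a RAM share in Terraform generally looks like the sketch below. One caveat: RAM shares subnets (and certain other resource types) rather than EC2 instances themselves, so "sharing the bastion" would in practice mean member accounts reaching a bastion that already sits in a shared subnet, plus whatever security group rules that needs. All names and the principal ID here are placeholders, not the actual resource-share module:

```hcl
# Placeholder sketch of sharing the subnet a shared bastion sits in.
resource "aws_ram_resource_share" "bastion_subnet" {
  name                      = "bastion-subnet-share"
  allow_external_principals = false
}

resource "aws_ram_resource_association" "bastion_subnet" {
  resource_share_arn = aws_ram_resource_share.bastion_subnet.arn
  resource_arn       = aws_subnet.general_private["eu-west-2a"].arn # assumed reference
}

resource "aws_ram_principal_association" "member_account" {
  resource_share_arn = aws_ram_resource_share.bastion_subnet.arn
  principal          = "123456789012" # placeholder member account ID
}
```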
I think we should monitor the number of bastions running on the platform and implement this once that number is greater than 32; otherwise we would be using more resources than needed.
This is not currently an issue.