GCP SQL PostgreSQL backend
changelog: added GCP Cloud SQL for PostgreSQL backend support
Doc draft WIP: https://docs-6pnqngs1d-goteleport.vercel.app/docs/reference/backends/#google-cloud-iam-authentication
Testing checklist:
- [x] change feed as a different database user (token creator)
- [x] events
- [x] credentials using service account impersonation through Workload Identity Federation with AWS
- [x] Private IP for Cloud SQL
- [x] Compatibility tests in `pgbk` for GCP SQL
- [ ] load test (TBD v16 release testing)
Testing setup: see above doc preview
Is there a reason why we're not using cloud.google.com/go/cloudsqlconn and cloud.google.com/go/alloydbconn?
Part of the reason for not using the connector lib is to support authenticating as a different database user for the change feed.
When using a Cloud SQL connector with automatic IAM database authentication, the IAM account that you use to start the connector must be the same account that authenticates the database.
We could hack the connector somehow to authenticate as another database user, but we would probably end up with the same amount of code as I currently have (if not more).
Also, I personally find it harder to use the `project:region:instance` connection name than just the IP. The benefit of the connector is that it makes API calls to resolve the IP automatically. But you are supposed to specify programmatically how it chooses the IP, e.g. with `cloudsqlconn.WithPrivateIP()`, which can be a hassle if we want to expose that kind of option through our config.
What do you think? I can give it a shot using the connector if you feel it's worth pursuing.
Alternatively, we can support both `project:region:instance` and the IP as host. If `project:region:instance` is specified, we use the connector lib and will not support authenticating as a different database user. If an IP is specified, it goes to the current impl. I'd rather not do this though. I would prefer to support only the IP in the first release, and implement the resolver part later using custom logic if a customer asks for it.
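To make the "support both" alternative concrete, here is a minimal sketch of how a host value could be classified as either an IP address or a `project:region:instance` connection name. It is an illustration only, not code from this PR: `hostKind` and `classifyHost` are hypothetical names, and the sketch assumes connection names always have exactly three non-empty colon-separated parts (domain-scoped project IDs are not handled).

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// hostKind is a hypothetical classification used only in this sketch.
type hostKind int

const (
	hostUnknown        hostKind = iota // neither an IP nor a connection name
	hostIP                             // literal IPv4 or IPv6 address
	hostConnectionName                 // project:region:instance
)

// classifyHost reports whether host looks like an IP address or a
// Cloud SQL connection name. IPs are checked first so that IPv6
// literals, which also contain colons, are not mistaken for names.
func classifyHost(host string) hostKind {
	if net.ParseIP(host) != nil {
		return hostIP
	}
	parts := strings.Split(host, ":")
	if len(parts) != 3 {
		return hostUnknown
	}
	for _, p := range parts {
		if p == "" {
			return hostUnknown
		}
	}
	return hostConnectionName
}

func main() {
	for _, h := range []string{
		"10.0.0.5",
		"teleport-dev-320620:us-central1:steve-postgres-loadtest",
		"db.example.com",
	} {
		fmt.Printf("%q -> kind %d\n", h, classifyHost(h))
	}
}
```

With a check like this, the connection-name branch would go through the connector lib and the IP branch through the direct implementation, which is exactly the split in behavior that makes this option unattractive.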
Converted to draft to try out the connector lib
@espadolini I've updated the PR to use the connector lib. For sample usage, see the preview of the doc draft: https://docs-qhi80m1z1-goteleport.vercel.app/docs/reference/backends/#google-cloud-iam-authentication.
Opening for review again.
Load Test:
Setup
- Two e2-standard-8 (8vCPU, 32GB mem) VMs on GCP
- Cloud SQL: DB version: PostgreSQL 15.5, vCPUs: 8, Memory: 32 GB, SSD storage: 250 GB
teleport.yaml:
version: v3
teleport:
  connection_limits:
    max_connections: 65000
    max_users: 1000
  log:
    severity: WARN
    format:
      output: text
  auth_server: auth.gcploadtest.dev.aws.stevexin.me
  storage:
    type: postgresql
    auth_mode: gcp-cloudsql
    gcp_connection_name: teleport-dev-320620:us-central1:steve-postgres-loadtest
    gcp_ip_type: "private"
    conn_string: postgresql://[email protected]@/teleport_backend
    audit_events_uri:
      - "postgresql://[email protected]@/teleport_audit#auth_mode=gcp-cloudsql&gcp_connection_name=teleport-dev-320620:us-central1:steve-postgres-loadtest&gcp_ip_type=private"
    audit_sessions_uri: "file:///var/lib/teleport/logs/"
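The `audit_events_uri` above carries the GCP options in the URI fragment as `key=value` pairs. As a rough sketch of how such a fragment can be read with the Go standard library (this is not Teleport's actual parsing code; `gcpParams`, `parseFragmentParams`, and the example URI are assumptions made for illustration):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// gcpParams is a hypothetical holder for the fragment options
// shown in the config above.
type gcpParams struct {
	AuthMode       string
	ConnectionName string
	IPType         string
}

// parseFragmentParams extracts the options that follow '#' in the URI.
// The fragment is treated as a query string, e.g. "a=1&b=2".
func parseFragmentParams(uri string) (gcpParams, error) {
	_, fragment, found := strings.Cut(uri, "#")
	if !found {
		return gcpParams{}, fmt.Errorf("URI has no fragment")
	}
	values, err := url.ParseQuery(fragment)
	if err != nil {
		return gcpParams{}, fmt.Errorf("parsing fragment: %w", err)
	}
	return gcpParams{
		AuthMode:       values.Get("auth_mode"),
		ConnectionName: values.Get("gcp_connection_name"),
		IPType:         values.Get("gcp_ip_type"),
	}, nil
}

func main() {
	// Placeholder user, project, and instance names, not the real ones.
	uri := "postgresql://user@/teleport_audit#auth_mode=gcp-cloudsql" +
		"&gcp_connection_name=my-project:us-central1:my-instance&gcp_ip_type=private"
	params, err := parseFragmentParams(uri)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", params)
}
```

Note that the host part of the connection string is empty in this auth mode: the instance is identified by `gcp_connection_name` rather than by an address.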
The run
Run `tctl loadtest` for 30 minutes on both servers, no errors:
tctl loadtest node-heartbeats --count=15000 --ttl=2m --interval=1m --labels=2 --concurrency=32
[i] Setting up node hb load generation. count=15000, churn=0, labels=2, interval=1m0s, ttl=2m0s, concurrency=32
[i] Estimated serialized node size: 606 (bytes)
[i] Queued heartbeat batch for emission. generation=1, errors=0
[i] Queued heartbeat batch for emission. generation=2, errors=0
...
VM memory usage from `top`:
VM CPU and network usage:
Cloud SQL metrics:
Manual test during load test
- Login/logout through web ui
- SSH session through web ui
- Add/Delete user through web ui
- Run `tctl nodes` and `tctl inventory` commands
No slowness or error/warning logs observed during the test.
I will resolve the lint/merge conflict and cherry-pick the logger change then merge this. Thanks everyone!
FYI: the go.mod change updated `google.golang.org/protobuf` to v1.34.1, so I had to run `make grpc`. The only diffs after the run are version changes:
-// protoc-gen-go v1.34.0
+// protoc-gen-go v1.34.1
https://github.com/gravitational/teleport/pull/41392/commits/eebc9629c16fe337ed7ad350e3ca9af592f9908c
Update: it turns out master has also been updated to v1.34.1. I've merged with master again....