postgres-operator
IAM for SA documentation
Per #1124 you no longer need to use kube2iam; however, the docs say:

> `kube_iam_role`: AWS IAM role to supply in the `iam.amazonaws.com/role` annotation of Postgres pods. Only used when combined with the kube2iam project on AWS. The default is empty.
I am new to this project and unsure how to implement this. It would be very useful if the docs showed how to use IAM for service accounts (IRSA).
Just went through this struggle.

Suffice it to say that the biggest missing piece for me was that you must opt in to using `wal-g`, since the default `wal-e` backup does not have STS support and will fail with access-denied errors even if your service account is provisioned correctly. This can be done via any number of pod environment configurations described here, though I found the actual values you need to set in this article only after cluing into the fact that this might even be necessary from this issue comment:
- `USE_WALG_BACKUP`
- `USE_WALG_RESTORE`
- `CLONE_USE_WALG_RESTORE`
Happy to give more details and help spruce up the docs in this regard!
Hey @rusty-jules. Have you managed to configure the Zalando postgres-operator to use a service account with a bound AWS IAM policy for S3 access? Could you share how you did that?
@tolikkostin Sure thing!
- In `OperatorConfiguration`, under `configuration -> kubernetes -> pod_service_account_definition`, add the required EKS IAM role annotation to get an injected service account token (you can hardcode your account id/IAM role name):

  ```yaml
  pod_service_account_definition: |
    metadata:
      name: postgres-pod
      annotations:
        # NOTE: we are using flux kustomization substitution here to inject the account_id value at runtime
        eks.amazonaws.com/role-arn: arn:aws:iam::${aws_account_id}:role/${iam_role_name}
    kind: ServiceAccount
    apiVersion: v1
  ```
- In `OperatorConfiguration`, under `configuration -> kubernetes -> pod_environment_configmap`, reference a ConfigMap that you apply separately, in the form `<namespace>/<configmap-name>`. The ConfigMap should contain keys with the environment variables I mentioned in the earlier comment.
  ```yaml
  # operatorconfiguration.yaml
  pod_environment_configmap: postgres-operator/postgres-env
  ```

  ```yaml
  # postgres-env-configmap.yaml
  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: postgres-env
    namespace: postgres-operator
  data:
    USE_WALG_BACKUP: "true"
    USE_WALG_RESTORE: "true"
    CLONE_USE_WALG_RESTORE: "true"
  ```
- In `OperatorConfiguration`, under `configuration -> aws_or_gcp`, set your S3 bucket values. Note that the `""` values were intentionally left empty:
  ```yaml
  aws_or_gcp:
    aws_region: ${aws-region}
    enable_ebs_gp3_migration: false
    enable_ebs_gp3_migration_max_size: 1000
    gcp_credentials: ""
    kube_iam_role: "" # we are not using this field with this method, as it relies on the third-party kube2iam project
    log_s3_bucket: ""
    wal_az_storage_account: ""
    wal_gs_bucket: ""
    wal_s3_bucket: ${my-backup-bucket} # this should just be the bucket name, no "s3://" prefix
  ```
  Optionally, set `my-backup-bucket` as the value of `configuration.logical_backup.logical_backup_s3_bucket` and set `logical_backup_provider` to `"s3"`.
- Create an IAM role with permissions to access your `my-backup-bucket` S3 bucket, and an assume-role policy that allows the above `ServiceAccount` to assume it. Note that the S3 permissions must be on the IAM role, not an S3 bucket policy. The assume-role policy may look something like this:
  ```json
  {
    "Version": "2012-10-17",
    "Statement": [
      {
        "Sid": "",
        "Effect": "Allow",
        "Principal": {
          "Federated": "arn:aws:iam::${aws_account_id}:oidc-provider/oidc.eks.us-west-1.amazonaws.com/id/${eks_id}"
        },
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
          "StringEquals": {
            "oidc.eks.us-west-1.amazonaws.com/id/${eks_id}:aud": "sts.amazonaws.com"
          },
          "StringLike": {
            "oidc.eks.us-west-1.amazonaws.com/id/${eks_id}:sub": "system:serviceaccount:*:postgres-pod"
          }
        }
      }
    ]
  }
  ```
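For reference, the S3 permissions policy attached to that role could be a sketch like the following. This is just an illustration of the "permissions on the IAM role, not a bucket policy" point above; the bucket name is a placeholder, and you may want to narrow the actions or key prefixes to suit your setup:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowBucketListing",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::my-backup-bucket"
    },
    {
      "Sid": "AllowObjectReadWrite",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::my-backup-bucket/*"
    }
  ]
}
```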
I believe that was it! `wal-g` should find the EKS-injected service account token automatically (using environment variables that EKS injects when it sees the service account annotation) and use it when it attempts to write to the bucket. I remember jumping onto the postgres pod directly and running manual backups to debug this; there's a script to do this in the spilo container at `/scripts/postgres_backup.sh`, which will let you know if it can't access the bucket (found out about this from #2067; the script must be run with the `envdir` command). Also note that `postgres-operator` does not automatically pick up changes to the `OperatorConfiguration` CRD, if that's what you're using: you'll need to kick the deployment when you update it.
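For example, the two debugging steps above might look like this. The pod name, deployment name, and the env dir path are assumptions from my cluster, so adjust them to yours:

```shell
# Run a manual backup from inside the postgres pod to verify bucket access.
# The script must be run under envdir so it sees the wal-g/wal-e environment;
# /run/etc/wal-e.d/env is where spilo keeps it in my setup.
kubectl exec -it acid-minimal-cluster-0 -- \
  envdir /run/etc/wal-e.d/env /scripts/postgres_backup.sh

# Kick the operator deployment so it re-reads the OperatorConfiguration CRD.
kubectl rollout restart deployment/postgres-operator -n postgres-operator
```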
Hope this helps.
@rusty-jules many thanks for your well-detailed guide! Awesome!
@rusty-jules Have you tried to create a new cluster (restore) by cloning that backup?
@oleksiytsyban sure have! I found that I needed a few more fields in the `postgres-env` `ConfigMap`, namely all of the fields that are automatically set by EKS via the service account annotation, but with the `CLONE_*` prefix:
```yaml
data:
  CLONE_AWS_REGION: "${aws_region}"
  CLONE_AWS_ROLE_ARN: "${iam_role_arn}"
  CLONE_AWS_WEB_IDENTITY_TOKEN_FILE: "/var/run/secrets/eks.amazonaws.com/serviceaccount/token"
  CLONE_AWS_STS_REGIONAL_ENDPOINTS: regional # this one depends on the settings of your account/vpc
  # this one was necessary on EC2 instances that use the IMDSv2 metadata api only, since spilo still uses v1
  SPILO_PROVIDER: aws
```
Then set the clone uid and timestamp in the cloned cluster's custom resource:

```yaml
apiVersion: acid.zalan.do/v1
kind: postgresql
spec:
  clone:
    uid: "${cloned_cluster_uid}" # get this from the k8s custom resource of the previous cluster, or the s3 backup key
    timestamp: "2024-02-05T08:00:00+00:00" # the timezone is required in the format of ±00:00 (UTC)
```
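To grab that uid from the source cluster's custom resource, something like this works (the cluster name here is a placeholder):

```shell
# Print the Kubernetes uid of the source postgresql custom resource
kubectl get postgresql acid-minimal-cluster -o jsonpath='{.metadata.uid}'
```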
@rusty-jules Thank you for confirming. I did something similar here: https://github.com/zalando/postgres-operator/issues/2067#issuecomment-1664786251 And created an issue about that: https://github.com/zalando/spilo/issues/897
I was hoping maybe the issue has been fixed and I can get rid of those additional variables. Not yet.