[psql][aws][s3] wal-g AWS S3 backups broken when using AWS IAM IRSA
Database name
We use PostgreSQL, but this affects backups of all databases stored on AWS S3 that authenticate with IAM IRSA.
Issue description
Describe your problem
PR https://github.com/wal-g/wal-g/pull/1377 breaks support for AWS IAM Roles for Service Accounts (IRSA) authentication, documented at https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html and https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-role.html#cli-configure-role-oidc
Please provide steps to reproduce
The configuration below works with wal-g v2.0.1: https://github.com/wal-g/wal-g/releases/tag/v2.0.1
With wal-g v3.0.3 (https://github.com/wal-g/wal-g/releases/tag/v3.0.3) it fails unless AWS_ROLE_SESSION_NAME is set.
AWS_ROLE_SESSION_NAME should be optional; AWS IAM IRSA does not need it.
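To make the expected behaviour concrete, here is a minimal POSIX shell sketch, assuming a hypothetical wrapper script that runs before wal-g: keep AWS_ROLE_SESSION_NAME if the caller set it, otherwise generate one. The function name default_session_name and the wal-g-<epoch seconds> naming scheme are hypothetical illustrations, not wal-g's actual code.

```shell
#!/bin/sh
# Hypothetical helper (not part of wal-g): return the caller's
# AWS_ROLE_SESSION_NAME if set, otherwise generate a fallback name,
# reflecting the expectation that the variable is optional.
default_session_name() {
  if [ -n "${AWS_ROLE_SESSION_NAME:-}" ]; then
    printf '%s\n' "$AWS_ROLE_SESSION_NAME"
  else
    # Epoch seconds are digits only, so "wal-g-<digits>" uses only
    # characters allowed in an STS role session name.
    printf 'wal-g-%s\n' "$(date +%s)"
  fi
}
```

A wrapper could then run export AWS_ROLE_SESSION_NAME="$(default_session_name)" before calling envdir /run/etc/wal-e.d/env/ wal-g backup-list; whether that alone restores IRSA authentication in v3.0.3 is not confirmed in this thread.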
Please add config and wal-g stdout/stderr logs for debugging purposes
AWS_REGION: eu-central-1
AWS_STS_REGIONAL_ENDPOINTS: regional
AWS_ROLE_ARN: arn:aws:iam::111111111111:role/postgres-backup-role
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
BACKUP_NUM_TO_RETAIN: "10"
BACKUP_SCHEDULE: 00 */12 * * *
CLONE_AWS_REGION: eu-central-1
CLONE_AWS_ROLE_ARN: arn:aws:iam::111111111111:role/postgres-backup-role
CLONE_AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
CLONE_USE_WALG_RESTORE: "true"
LOG_S3_BUCKET: postgres-backup
WAL_S3_BUCKET: postgres-backup
USE_WALG_BACKUP: "true"
USE_WALG_RESTORE: "true"
If you can, provide logs
root@temporal-postgresql-0:/home/postgres# wal-g --version
wal-g version v3.0.3 3f88f3c 2024.08.08_17:53:40 PostgreSQL
root@temporal-postgresql-0:/home/postgres# wal-g-v2.0.1 --version
wal-g version v2.0.1 b7d53dd 2022.08.25_09:34:20 PostgreSQL
root@temporal-postgresql-0:/home/postgres# export AWS_ROLE_SESSION_NAME=system:serviceaccount:automation-service:postgres-pod-sa
root@temporal-postgresql-0:/home/postgres# envdir /run/etc/wal-e.d/env/ wal-g backup-list
ERROR: 2024/10/11 15:52:43.441470 configure primary storage: configure storage with prefix "s3://postgres-backup/spilo/temporal-postgresql/12075954-67d5-4764-a7ea-df5925ca27fc/wal/15": create S3 storage: create new AWS session: configure session: assume role by ARN: WebIdentityErr: failed to retrieve credentials
caused by: ValidationError: 1 validation error detected: Value 'system:serviceaccount:automation-service:postgres-pod-sa' at 'roleSessionName' failed to satisfy constraint: Member must satisfy regular expression pattern: [\w+=,.@-]*
status code: 400, request id: ce6c3656-6228-4d5d-94a4-9ea1670d1cf6
root@temporal-postgresql-0:/home/postgres# unset AWS_ROLE_SESSION_NAME
root@temporal-postgresql-0:/home/postgres# envdir /run/etc/wal-e.d/env/ wal-g-v2.0.1 backup-list
name modified wal_segment_backup_start
base_000000010000000000000004 2024-09-20T11:34:19Z 000000010000000000000004
base_000000010000000000000006 2024-09-20T12:00:03Z 000000010000000000000006
base_00000001000000000000001F 2024-09-21T00:00:03Z 00000001000000000000001F
base_000000010000000000000038 2024-09-21T12:00:03Z 000000010000000000000038
base_000000010000000000000051 2024-09-22T00:00:03Z 000000010000000000000051
base_00000001000000000000006A 2024-09-22T12:00:03Z 00000001000000000000006A
base_000000010000000000000083 2024-09-23T00:00:03Z 000000010000000000000083
base_00000001000000000000009C 2024-09-23T12:00:03Z 00000001000000000000009C
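The ValidationError above comes from the STS RoleSessionName constraint: the pattern [\w+=,.@-]* does not allow the colons in a Kubernetes service-account name, and STS also limits the name to 2 to 64 characters. As a hedged workaround sketch, assuming v3.0.3 only needs a syntactically valid session name (not confirmed in this thread), the hypothetical helper sanitize_session_name below rewrites such a name into an allowed form. The fallback name wal-g-session is likewise a hypothetical choice.

```shell
#!/bin/sh
# Hypothetical helper (not part of wal-g): turn an arbitrary identifier,
# e.g. a Kubernetes service-account name, into a value that satisfies the
# STS RoleSessionName constraints: pattern [\w+=,.@-]* and length 2..64.
sanitize_session_name() {
  # Replace every character outside [A-Za-z0-9_+=,.@-] with '-', then
  # truncate to 64 characters. printf (not echo) avoids adding a trailing
  # newline that tr would also translate.
  name=$(printf '%s' "$1" | tr -c 'A-Za-z0-9_+=,.@-' '-' | cut -c1-64)
  # Names shorter than the 2-character minimum get a hypothetical fallback.
  if [ "${#name}" -lt 2 ]; then
    name="wal-g-session"
  fi
  printf '%s\n' "$name"
}
```

For example, passing system:serviceaccount:automation-service:postgres-pod-sa returns system-serviceaccount-automation-service-postgres-pod-sa (56 characters), which could be exported as AWS_ROLE_SESSION_NAME before running envdir /run/etc/wal-e.d/env/ wal-g backup-list.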
Related issues: https://github.com/zalando/postgres-operator/issues/2747
Thanks, this seems like a bug. Can you please provide a fix, or do I need to find someone to work on this?
I will try to get this working again by using aws.Config, while keeping the role-assumption support added for @Qwiz.
I remember, I'm working on it ;)
Thank you! Ping me if I can be of any help.
Any news? I have the same issue.
I'm afraid I won't finish it this year. Until it is fixed, please use the older version where it still works: https://github.com/wal-g/wal-g/releases/tag/v2.0.1
That working wal-g version is included in this Spilo image release: https://github.com/zalando/spilo/releases/tag/3.2-p3
@moss2k13 thanks for your work on this!
Is there any way we can expedite the release for this fix?
@debebantur @ostinru what do you think about cutting new release?
No blockers from my side. We could probably make a minor release once a month =)
Same issue here.
Operator image: ghcr.io/zalando/postgres-operator:v1.14.0
Spilo image: ghcr.io/zalando/spilo-17:4.0-p2