[psql][aws][s3] wal-g AWS S3 backups broken while using AWS IAM IRSA

Open moss2k13 opened this issue 1 year ago • 2 comments

Database name

We use PostgreSQL, but this affects backups of any database stored on AWS S3 when authenticating with AWS IAM IRSA.

Issue description

Describe your problem

https://github.com/wal-g/wal-g/pull/1377 breaks support for AWS IAM IRSA authentication:

https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-role.html#cli-configure-role-oidc

Please provide steps to reproduce

The configuration below works with https://github.com/wal-g/wal-g/releases/tag/v2.0.1.

With https://github.com/wal-g/wal-g/releases/tag/v3.0.3, wal-g requires AWS_ROLE_SESSION_NAME to be set.

Specifying AWS_ROLE_SESSION_NAME should be optional; AWS IAM IRSA does not need it.
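
The expected behaviour can be sketched as follows. This is a minimal illustration in Python, not wal-g's actual Go code: `resolve_session_name` and the `wal-g-` prefix are hypothetical names chosen for the example, and the assumption is that any unique string made of pattern-safe characters is acceptable as a default session name.

```python
import os
import time


def resolve_session_name(env=os.environ):
    """Return the configured session name, or a generated default.

    Hypothetical helper: it treats AWS_ROLE_SESSION_NAME as optional,
    which is the behaviour this issue asks for.
    """
    name = env.get("AWS_ROLE_SESSION_NAME", "")
    if name:
        # An explicitly configured name always wins.
        return name
    # Assumption: a nanosecond timestamp is unique enough to tell
    # sessions apart; the "wal-g-" prefix is illustrative only.
    return f"wal-g-{time.time_ns()}"
```

With this logic, an IRSA setup that only provides AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE would still get a usable session name.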

Please add config and wal-g stdout/stderr logs for debug purpose

AWS_REGION: eu-central-1
AWS_STS_REGIONAL_ENDPOINTS: regional
AWS_ROLE_ARN: arn:aws:iam::111111111111:role/postgres-backup-role
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
BACKUP_NUM_TO_RETAIN: "10"
BACKUP_SCHEDULE: 00 */12 * * *
CLONE_AWS_REGION: eu-central-1
CLONE_AWS_ROLE_ARN: arn:aws:iam::111111111111:role/postgres-backup-role
CLONE_AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
CLONE_USE_WALG_RESTORE: "true"
LOG_S3_BUCKET: postgres-backup
WAL_S3_BUCKET: postgres-backup
USE_WALG_BACKUP: "true"
USE_WALG_RESTORE: "true"

If you can, provide logs

root@temporal-postgresql-0:/home/postgres# wal-g --version
wal-g version v3.0.3	3f88f3c	2024.08.08_17:53:40	PostgreSQL


root@temporal-postgresql-0:/home/postgres# wal-g-v2.0.1 --version
wal-g version v2.0.1	b7d53dd	2022.08.25_09:34:20	PostgreSQL


root@temporal-postgresql-0:/home/postgres# export AWS_ROLE_SESSION_NAME=system:serviceaccount:automation-service:postgres-pod-sa


root@temporal-postgresql-0:/home/postgres# envdir /run/etc/wal-e.d/env/ wal-g backup-list

ERROR: 2024/10/11 15:52:43.441470 configure primary storage: configure storage with prefix "s3://postgres-backup/spilo/temporal-postgresql/12075954-67d5-4764-a7ea-df5925ca27fc/wal/15": create S3 storage: create new AWS session: configure session: assume role by ARN: WebIdentityErr: failed to retrieve credentials
caused by: ValidationError: 1 validation error detected: Value 'system:serviceaccount:automation-service:postgres-pod-sa' at 'roleSessionName' failed to satisfy constraint: Member must satisfy regular expression pattern: [\w+=,.@-]*
	status code: 400, request id: ce6c3656-6228-4d5d-94a4-9ea1670d1cf6


root@temporal-postgresql-0:/home/postgres# unset AWS_ROLE_SESSION_NAME


root@temporal-postgresql-0:/home/postgres# envdir /run/etc/wal-e.d/env/ wal-g-v2.0.1 backup-list
name                          modified             wal_segment_backup_start
base_000000010000000000000004 2024-09-20T11:34:19Z 000000010000000000000004
base_000000010000000000000006 2024-09-20T12:00:03Z 000000010000000000000006
base_00000001000000000000001F 2024-09-21T00:00:03Z 00000001000000000000001F
base_000000010000000000000038 2024-09-21T12:00:03Z 000000010000000000000038
base_000000010000000000000051 2024-09-22T00:00:03Z 000000010000000000000051
base_00000001000000000000006A 2024-09-22T12:00:03Z 00000001000000000000006A
base_000000010000000000000083 2024-09-23T00:00:03Z 000000010000000000000083
base_00000001000000000000009C 2024-09-23T12:00:03Z 00000001000000000000009C
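
The ValidationError in the log above is raised because the exported session name contains colons, which the pattern `[\w+=,.@-]*` quoted in the error does not allow. A small standalone check, assuming only the ASCII subset of `\w` and the 64-character limit documented for STS `RoleSessionName`, could look like this. The helper names are hypothetical, and whether wal-g v3.0.3 then authenticates with a sanitized name is not confirmed in this thread.

```python
import re

# Pattern copied from the ValidationError message; re.ASCII restricts
# \w to [A-Za-z0-9_] so the check stays conservative.
SESSION_NAME_RE = re.compile(r"[\w+=,.@-]*", re.ASCII)

# Assumption: STS limits RoleSessionName to 64 characters.
MAX_SESSION_NAME_LEN = 64


def is_valid_session_name(name: str) -> bool:
    """True if the whole name matches the pattern and fits the length limit."""
    return (
        len(name) <= MAX_SESSION_NAME_LEN
        and SESSION_NAME_RE.fullmatch(name) is not None
    )


def sanitize_session_name(name: str, replacement: str = "-") -> str:
    """Replace every disallowed character, then truncate to the limit."""
    cleaned = re.sub(r"[^\w+=,.@-]", replacement, name, flags=re.ASCII)
    return cleaned[:MAX_SESSION_NAME_LEN]


failing = "system:serviceaccount:automation-service:postgres-pod-sa"
print(is_valid_session_name(failing))   # False, because of the colons
print(sanitize_session_name(failing))
```

Running it on the value from the transcript shows that only the colons need replacing.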

Related issues: https://github.com/zalando/postgres-operator/issues/2747

moss2k13 avatar Oct 16 '24 08:10 moss2k13

Thanks, this seems like a bug. Can you please provide a fix or do I need to find someone on this?

x4m avatar Oct 17 '24 18:10 x4m

I will try to get this working again using aws.Config, while keeping the ability to assume a role for @Qwiz.

moss2k13 avatar Oct 20 '24 17:10 moss2k13

I remember, I'm working on it ;)

moss2k13 avatar Nov 13 '24 14:11 moss2k13

Thank you! Ping me if I can be of any help.

x4m avatar Nov 13 '24 14:11 x4m

Any news? We have the same issue.

ihor-kokhan-onereach avatar Dec 10 '24 11:12 ihor-kokhan-onereach

I'm afraid I won't finish it this year. Until it is fixed, please use an older version where it still works: https://github.com/wal-g/wal-g/releases/tag/v2.0.1

A working wal-g version is available in this Spilo image release: https://github.com/zalando/spilo/releases/tag/3.2-p3

moss2k13 avatar Dec 10 '24 11:12 moss2k13

@moss2k13 thanks for your work on this!

x4m avatar Dec 10 '24 11:12 x4m

Is there any way we can expedite the release for this fix?

sandeeppanchal-bitgo avatar Mar 05 '25 08:03 sandeeppanchal-bitgo

@debebantur @ostinru what do you think about cutting a new release?

x4m avatar Mar 05 '25 09:03 x4m

No blockers from my side. We can probably make a minor release once a month =)

ostinru avatar Mar 19 '25 11:03 ostinru

Same issue.

Operator image: ghcr.io/zalando/postgres-operator:v1.14.0
Spilo image: ghcr.io/zalando/spilo-17:4.0-p2

ihor-kokhan-onereach avatar Jun 13 '25 09:06 ihor-kokhan-onereach