HTTP error 500 on every request when using an IAM service account
Bug Overview
Context
I was using a fairly old image of nginx-s3-gateway, ghcr.io/nginxinc/nginx-s3-gateway/nginx-oss-s3-gateway:latest-20241125, and it had been working fine for a while.
Yesterday I pulled the latest available image:
- ghcr.io/nginx/nginx-s3-gateway/nginx-oss-s3-gateway:latest-20250616 or
- ghcr.io/nginx/nginx-s3-gateway/nginx-oss-s3-gateway:latest
With no changes to my configuration, nginx now returns 500 on every request.
I tested manually with the aws-cli in the pod, using the same role, and I am able to retrieve files from the bucket. After enabling debug logs, I see the following lines:
```
==> error.log <==
2025/06/17 09:33:42 [info] 330#330: *22 js: Cached credentials are expired or not present, requesting new ones

==> access.log <==
172.16.103.70 - - [17/Jun/2025:09:34:42 +0000] "GET /images/products/image.png HTTP/1.1" 500 170 "-" "curl/8.7.1"

==> error.log <==
2025/06/17 09:34:42 [info] 330#330: *22 js: Could not load EC2 task role credentials: {}
```
Note: my configuration is similar to the one in the getting started document for using an IAM service account.
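For reference, a sketch of the gateway settings I use, with names following the project's settings documentation; bucket, region, and endpoint values are placeholders. With an IAM service account, AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY are deliberately not set, so the gateway is expected to obtain credentials through the web identity token.

```shell
# Gateway configuration via environment variables (placeholder values).
export S3_BUCKET_NAME=my-bucket
export S3_REGION=eu-west-1
export S3_SERVER=s3.eu-west-1.amazonaws.com
export S3_SERVER_PORT=443
export S3_SERVER_PROTO=https
export S3_STYLE=virtual
export AWS_SIGS_VERSION=4
```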
Expected Behavior
I expect 200 if the file exists, or 404 if it is not found, not error 500.
Steps to Reproduce the Bug
- Use the latest available image in your deployment.
- Create the service account with the role to assume in an annotation.
- Configure the deployment to use the service account.
- Request a file from the bucket.
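To confirm the service account wiring took effect, a quick check I find useful, meant to be run inside the gateway pod (e.g. via kubectl exec); the variable names are the standard ones the EKS pod identity webhook injects:

```shell
# Report which of the standard IRSA variables are missing from the current
# environment; returns non-zero if any are absent.
check_irsa_env() {
  missing=0
  for v in AWS_ROLE_ARN AWS_WEB_IDENTITY_TOKEN_FILE; do
    if [ -z "$(printenv "$v")" ]; then
      echo "missing: $v"
      missing=1
    fi
  done
  return $missing
}
if check_irsa_env; then echo "IRSA environment looks complete"; fi
```

If either variable is missing here, the problem is in the Kubernetes setup rather than in nginx.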
Environment Details
- AWS EKS Cluster 1.31
- nginx-s3-gateway version: latest-20250616
Additional Context
No response
I have been debugging this with some friends, and it seems the environment variables related to IAM Roles for Service Accounts are not propagated, similar to what was fixed in this commit: https://github.com/nginx/nginx-s3-gateway/commit/44099e7dd6d5bb13efcf9978b9dc88d80f49ecff
Adding

```
env AWS_ROLE_ARN;
env AWS_ROLE_SESSION_NAME;
env AWS_SIGS_VERSION;
env AWS_WEB_IDENTITY_TOKEN_FILE;
env AWS_STS_REGIONAL_ENDPOINTS;
env AWS_REGION;
```

to nginx.conf makes it work (nginx removes inherited environment variables from its processes unless they are whitelisted with the `env` directive in the main context). Is it possible that something changed in nginx so that these variables are no longer picked up from the main process?
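A small sketch for verifying that the whitelist is actually present in a given nginx.conf; the variable list matches the directives above, and the conf path is whatever your image uses:

```shell
# Print any AWS-related variable that lacks an `env` whitelist directive
# in the given nginx.conf; silent when all six are present.
check_env_whitelist() {
  conf=$1
  for v in AWS_ROLE_ARN AWS_ROLE_SESSION_NAME AWS_SIGS_VERSION \
           AWS_WEB_IDENTITY_TOKEN_FILE AWS_STS_REGIONAL_ENDPOINTS AWS_REGION; do
    grep -q "env $v;" "$conf" || echo "missing: env $v;"
  done
}
# Usage: check_env_whitelist /etc/nginx/nginx.conf
```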
This seems to be related to https://github.com/nginx/nginx-s3-gateway/issues/410
Pod identity broke in 20250519; 20250512 is the last version that works for me. Between 20250512 and 20250519, nginx was upgraded from 1.27.2 to 1.27.5.
I believe this is indeed related to #410. I created #452, which should fix that issue. I was seeing the same error reported here, and with that environment variable preserved, everything works for me.