nginx-s3-gateway

No caching when using Docker image.

Open kristoferlundgren opened this issue 2 years ago • 9 comments

Describe the bug

Using the latest Docker image, no data is being cached.

To Reproduce

Steps to reproduce the behavior:

  1. Start the container:

     docker run --rm -ti -p 80:80 -e S3_SERVER=storage.googleapis.com -e S3_ACCESS_KEY_ID="<key>" -e S3_SECRET_KEY="<secret>" --env-file s3.env nginxinc/nginx-s3-gateway:latest-20221026

s3.env file:

S3_BUCKET_NAME=<bucket-name>
S3_SERVER_PORT=443
S3_SERVER_PROTO=https
S3_REGION=us-east-1
S3_STYLE=virtual
S3_DEBUG=false
AWS_SIGS_VERSION=4
ALLOW_DIRECTORY_LIST=true
PROVIDE_INDEX_PAGE=false
APPEND_SLASH_FOR_POSSIBLE_DIRECTORY=false
PROXY_CACHE_VALID_OK=1h
PROXY_CACHE_VALID_NOTFOUND=1m
PROXY_CACHE_VALID_FORBIDDEN=30s
  2. Pull multiple files from the S3 gateway at http://localhost

I can successfully browse the S3 bucket directory structure and download objects without any issue. However, when downloading the same object multiple times, I see no performance improvement that would indicate a cache hit.

  3. Exec into the container with docker exec -ti <container> bash and run the following command: ls -la /var/cache/nginx/s3_proxy/

The cache directory is empty. I also looked for any increase in disk usage with the command du -sh /* but no cached data is being stored in the container.
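The directory check described above can be wrapped in a small helper. This is only a sketch: the function name count_cache_files is hypothetical, and the default path is the cache directory mentioned in this report.

```shell
# Hypothetical helper: print the number of regular files below a cache
# directory. Prints 0 when the directory is missing or empty.
count_cache_files() {
  dir="${1:-/var/cache/nginx/s3_proxy}"
  if [ -d "$dir" ]; then
    # NGINX stores each cached response as one file in nested subdirectories.
    find "$dir" -type f | wc -l | tr -d ' '
  else
    echo 0
  fi
}
```

A result of 0 after repeated downloads of the same object suggests that nothing is being cached. Inside the container, the equivalent one-liner is find /var/cache/nginx/s3_proxy -type f | wc -l.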

Expected behavior

According to the documentation, data should be cached when accessed multiple times and not reloaded from the remote S3 bucket at each access.

Your environment

  • Version of the container used (if downloaded from Docker Hub or GitHub): Docker image nginxinc/nginx-s3-gateway:latest-20221026
  • S3 backend implementation you are using (AWS, Ceph, NetApp StorageGrid, etc): Google Cloud Storage
  • How you are deploying (Docker, stand-alone, etc): Docker on macOS using Rancher Desktop.
  • Authentication method (IAM, IAM with Fargate, IAM with K8S, AWS Credentials, etc): S3 authentication using a Google Service Account with HMAC keys.

kristoferlundgren avatar Nov 02 '22 12:11 kristoferlundgren

Thank you for writing up this issue in such detail.

So far, I've been unable to reproduce this bug using AWS. In my configuration, I've put a text file on my S3 bucket and ran curl against it in a loop.

I saw that the cache files were correctly populated in the /var/cache/nginx/s3_proxy directory. I also monitored the instance for outbound connections via netstat and I only saw outbound connections every minute or so.

On my container, the contents of the cache directory look like:

root@88822b1c11cd:/var/cache/nginx/s3_proxy# find /var/cache/nginx/s3_proxy/
/var/cache/nginx/s3_proxy/
/var/cache/nginx/s3_proxy/1
/var/cache/nginx/s3_proxy/1/93
/var/cache/nginx/s3_proxy/1/93/b620bfa0e09b3cc11521660acb6e2931

I'll go and try to see if I can reproduce the issue on Google Cloud Storage.

dekobon avatar Nov 03 '22 21:11 dekobon

I just ran the same configuration against Google Cloud Storage and I was able to reproduce the behavior.

dekobon avatar Nov 03 '22 22:11 dekobon

I found the source of the issue. Google Cloud Storage diverges from the AWS S3 behavior by setting Cache-Control: private, max-age=0 by default for all objects. You need to edit the metadata for your object on Google Cloud Storage and change the value of Cache-Control to public in order to enable caching with the gateway. See the Cloud Storage Documentation for more information.
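To make the effect of that header concrete, here is a rough sketch of the rule involved. By default NGINX honors the upstream Cache-Control header and will not store a response marked private, no-cache, no-store, or max-age=0. The function name blocks_nginx_cache is hypothetical, and the check is a simplification: it does not model Expires, X-Accel-Expires, Set-Cookie, Vary, or s-maxage.

```shell
# Hypothetical helper: succeed (exit 0) when a Cache-Control value contains
# a directive that stops NGINX from caching the response by default.
# Simplified approximation; not NGINX's full decision logic.
blocks_nginx_cache() (
  # The function body runs in a subshell, so set -f and IFS do not leak out.
  set -f
  # Lowercase the value and drop spaces so directives compare uniformly.
  value=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d ' ')
  IFS=','
  for directive in $value; do
    case "$directive" in
      private|no-cache|no-store|max-age=0) return 0 ;;
    esac
  done
  return 1
)

# Example of changing the metadata on the Google Cloud Storage side
# (bucket and object names are placeholders, max-age is an arbitrary choice):
#   gsutil setmeta -h "Cache-Control:public, max-age=3600" gs://<bucket-name>/<object>
```

With the Google Cloud Storage default of private, max-age=0 the helper reports the response as not cacheable, while a value such as public, max-age=3600 passes.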

There may be a way to configure NGINX to ignore the header sent by Google Cloud Storage by using the proxy_ignore_headers directive to ignore the Cache-Control header.

dekobon avatar Nov 03 '22 22:11 dekobon

Many thanks for tracking down the root cause of this issue.

As you (@dekobon) suggested, I added proxy_ignore_headers Cache-Control; to the http {} part of /etc/nginx/nginx.conf and ran nginx -s reload inside the container. And voilà, it works! Files are now cached, as expected.

I now have some choices.

  1. Mount my own /etc/nginx/nginx.conf into the container.
  2. Build and run a modified container image with this tiny modification.
  3. Ask this project to add the proxy_ignore_headers Cache-Control; as part of the config. Preferably configurable with an environment variable.

I would like to ask for option 3 first. What are your thoughts?

Again, thanks!

kristoferlundgren avatar Nov 04 '22 10:11 kristoferlundgren

I think asking for number three is reasonable. We may need a generalized way to accomplish this because we also need to solve for #65.

dekobon avatar Nov 04 '22 16:11 dekobon

I've made some updates to the container so that you can now layer in additional NGINX configuration. See the documentation.

Also, I added a feature that allows you to strip out headers from the client response. For Google Cloud Storage you will want to do:

HEADER_PREFIXES_TO_STRIP=x-goog-;x-guploader-uploadid

Please let me know if this solution works for you. If it does, I'll mark this issue as closed.

dekobon avatar Nov 04 '22 19:11 dekobon

  1. First try (the new feature, adding the Cache-Control header):
     HEADER_PREFIXES_TO_STRIP="x-goog-;x-guploader-uploadid;Cache-Control"
     This resulted in the error: HEADER_PREFIXES_TO_STRIP must not contain uppercase characters (as documented).

  2. Second try (lowercase cache-control):
     HEADER_PREFIXES_TO_STRIP="x-goog-;x-guploader-uploadid;cache-control"
     Downloaded some files and then checked the cache directory. It was empty, i.e. the cache is still disabled.

  3. Third try (stripping the x-goog headers and mounting an nginx http config file):
     docker run --rm -ti -p 80:80 -e S3_SERVER=storage.googleapis.com -e S3_ACCESS_KEY_ID="<key>" -e S3_SECRET_KEY="<secret>" -e HEADER_PREFIXES_TO_STRIP="x-goog-;x-guploader-uploadid" --env-file s3.env -v $(pwd)/cache.conf:/etc/nginx/conf.d/cache.conf nginxinc/nginx-s3-gateway:latest
     where the $(pwd)/cache.conf file contains:
     proxy_ignore_headers Cache-Control;
     Downloaded some files and then checked the cache directory. The cache directory has content, i.e. the cache is working! :)
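For anyone repeating the third attempt, the override file can be created as follows. This is a sketch: it assumes the image includes files from /etc/nginx/conf.d/ inside the http {} block, which matches the behavior observed above, and the file name cache.conf is simply the one used in this thread.

```shell
# Write the one-line override into the current directory so that the
# -v $(pwd)/cache.conf:/etc/nginx/conf.d/cache.conf mount can pick it up.
cat > "$(pwd)/cache.conf" <<'EOF'
# Ignore the upstream Cache-Control header so that the PROXY_CACHE_VALID_*
# settings decide how long responses are cached.
proxy_ignore_headers Cache-Control;
EOF

# Then start the gateway with the file mounted (same command as in step 3):
#   docker run --rm -ti -p 80:80 ... \
#     -v "$(pwd)/cache.conf:/etc/nginx/conf.d/cache.conf" \
#     nginxinc/nginx-s3-gateway:latest
```

Note that ignoring Cache-Control applies to every object served through the gateway, including any that the backend deliberately marks as private.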

I would have preferred an environment variable solution, but this config works as well. Many thanks for the assessment and quick remediation of this issue. And also reporting and fixing #65.

Before closing this issue, I believe the need for proxy_ignore_headers Cache-Control; ought to be documented to help users whose S3 backends (e.g. Google Cloud Storage) emit caching preferences.

kristoferlundgren avatar Nov 05 '22 23:11 kristoferlundgren

I agree it should be documented. Also, we may want to add an environment variable that allows for ignoring cache control, but I wanted to get the extensibility part done ASAP because we've gotten a lot of requests for similar things and the number of environment variables is starting to add up.

I'll leave this issue open until we can add a setting.

dekobon avatar Nov 06 '22 04:11 dekobon

I made the stupid mistake of exec'ing into the wrong running container, one with the same name, so I didn't find any cache. Check whether this might also be the reason in your case.

akashgreninja avatar Feb 28 '24 05:02 akashgreninja