source-controller
Source controller fails to list objects from an S3 bucket, probably due to the large size of the bucket
We are running Flux version 0.28.5 and use an S3 bucket of around 2TB as a source.
The reconciliation always fails with the error below:
flux reconcile source bucket s3-bucket-name --verbose
► annotating Bucket s3-bucket-name in flux-system namespace
✔ Bucket annotated
◎ waiting for Bucket reconciliation
✗ Bucket reconciliation failed: 'indexation of objects from bucket 's3-bucket-name' failed: listing objects from bucket 's3-bucket-name' failed: Get "https://s3.dualstack.us-east-1.amazonaws.com/s3-bucket-name/?continuation-token=1lTiNsKzHoVAVOu0PmalPgNkJEFDnybzDBu8XuqkkoZlCP7DtzXiQm%!!(MISSING)B(MISSING)Hea2BxSsoEprb4N3Wm%!!(MISSING)F(MISSING)3EYVLJ18P%!!(MISSING)F(MISSING)KRR0XAJe6kkXPw%!!(MISSING)D(MISSING)%!!(MISSING)D(MISSING)&delimiter=&encoding-type=url&fetch-owner=true&list-type=2&prefix=": context deadline exceeded'
I have tried adding ignore rules as shown below, but it did not work. I have also tried increasing the timeout up to 10m for testing purposes.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: Bucket
metadata:
  name: s3-bucket-name
  namespace: flux-system
spec:
  interval: 30s
  provider: aws
  bucketName: s3-bucket-name
  endpoint: s3.amazonaws.com
  region: us-east-1
  timeout: 300s
  ignore: |
    # exclude all
    /*
    # include flux dir
    !/flux
    # exclude file extensions from deploy dir
    /flux/**/*.md
    /flux/**/*.txt
I am not sure whether the ignore option is included in the filter here: https://github.com/fluxcd/source-controller/blob/main/pkg/minio/minio.go#L112
When I test with a different, much smaller S3 bucket, the reconciliation works fine. Is there a way to get this working?
I guess you don't have 2TB of Kubernetes YAMLs in there? I would create a dedicated bucket for Flux and have a Lambda function that syncs the YAML files from the 2TB bucket to the Flux one.
Regarding "I have tried add the ignore files as shown below but it did not work": to ignore files, we need to fetch all the file paths from the bucket. If you have a billion files in there, that takes time, maybe hours.
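To put a rough number on "hours maybe": S3 returns at most 1000 keys per ListObjectsV2 request, and the continuation-token in the error above shows the listing is paginated. The following is only a back-of-the-envelope sketch; the 50ms per-request latency is an assumed value, not a measurement, and the function names are made up for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// pageSize is the maximum number of keys S3 returns per ListObjectsV2 request.
const pageSize = 1000

// listRequests returns how many paginated list calls are needed for the
// given number of objects (rounded up to whole pages).
func listRequests(objects int64) int64 {
	return (objects + pageSize - 1) / pageSize
}

// estimateListing returns the total time for sequential list requests,
// given an assumed round-trip latency per request.
func estimateListing(objects int64, perRequest time.Duration) time.Duration {
	return time.Duration(listRequests(objects)) * perRequest
}

func main() {
	const objects = 1_000_000_000         // "a billion files", as in the comment above
	const latency = 50 * time.Millisecond // assumption, not measured

	fmt.Println(listRequests(objects), "list requests")
	fmt.Println("estimated listing time:", estimateListing(objects, latency))
}
```

Under these assumptions a billion objects need a million sequential requests, or roughly 14 hours, which is far beyond a 300s or 10m timeout regardless of the ignore rules.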
As Stefan said, a Bucket is more like an enriched key/value store, so we have to iterate over every key to see if it matches; we can't, for example, "skip directories". The only trick left in this area that might help in your case is supporting a "prefix" that files must match. This would make the filtering a server-side operation and decrease the number of iterations we have to do.
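The difference between the two approaches can be sketched with a toy in-memory model. This is not source-controller code: the function names and keys are hypothetical, and the sorted slice is only a simplified stand-in for how a store can answer a prefix query without walking every key. The request URL in the error above already carries an (empty) `prefix=` parameter, which is the one that would be filled in.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// clientSideFilter models the current behaviour: every key is visited and
// non-matching keys are discarded afterwards.
func clientSideFilter(keys []string, prefix string) (matched []string, visited int) {
	for _, k := range keys {
		visited++
		if strings.HasPrefix(k, prefix) {
			matched = append(matched, k)
		}
	}
	return matched, visited
}

// serverSidePrefix models a list request that carries a prefix: the store
// (assumed here to keep keys sorted) seeks to the first candidate key and
// stops at the first key that no longer matches.
func serverSidePrefix(sortedKeys []string, prefix string) (matched []string, visited int) {
	start := sort.SearchStrings(sortedKeys, prefix)
	for _, k := range sortedKeys[start:] {
		if !strings.HasPrefix(k, prefix) {
			break
		}
		visited++
		matched = append(matched, k)
	}
	return matched, visited
}

func main() {
	// Hypothetical bucket contents.
	keys := []string{
		"artifacts/build-1.tar", "artifacts/build-2.tar",
		"flux/apps/app.yaml", "flux/infra/ns.yaml",
		"logs/2022-01-01.log", "logs/2022-01-02.log",
	}
	sort.Strings(keys)

	_, all := clientSideFilter(keys, "flux/")
	matched, few := serverSidePrefix(keys, "flux/")
	fmt.Printf("client-side filter visited %d keys\n", all)
	fmt.Printf("prefix listing visited %d keys: %v\n", few, matched)
}
```

Both functions return the same matches; only the number of keys that have to travel to the client differs, which is what matters for a bucket with a very large number of unrelated objects.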
@stefanprodan: Yes, you are right. The bucket contains a lot of other artifacts, not only YAML files. As you suggested, we will go with a dedicated bucket for Flux.
@hiddeco: If the prefix option is supported in the future, we might start using it. I thought the ignore option would solve this problem, but I was wrong. Thanks for the explanation.
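For the dedicated-bucket approach, the core of a sync function is deciding which keys to copy. The sketch below shows only that selection step, under the assumption that the manifests live under a `flux/` prefix and end in `.yaml` or `.yml`. The names `shouldSync` and `selectKeys` are hypothetical, and the actual copy between buckets (for example from a Lambda triggered by S3 object events) is left out.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// shouldSync reports whether an object key from the large bucket should be
// copied to the dedicated Flux bucket: it must sit under the given prefix
// and have a YAML file extension.
func shouldSync(key, prefix string) bool {
	if !strings.HasPrefix(key, prefix) {
		return false
	}
	switch strings.ToLower(path.Ext(key)) {
	case ".yaml", ".yml":
		return true
	}
	return false
}

// selectKeys filters a batch of changed keys down to the ones worth copying.
func selectKeys(keys []string, prefix string) []string {
	var out []string
	for _, k := range keys {
		if shouldSync(k, prefix) {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	// Hypothetical keys, e.g. taken from object-created notifications.
	changed := []string{
		"flux/apps/app.yaml",
		"flux/README.md",
		"artifacts/build-1.tar",
		"flux/infra/ns.yml",
	}
	fmt.Println(selectKeys(changed, "flux/"))
}
```

Because the function works on keys that are handed to it, the large bucket never has to be listed in full, and the Bucket source in Flux only ever sees the small dedicated bucket.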