charts icon indicating copy to clipboard operation
charts copied to clipboard

feat: add s3-sync sidecar

Open chirichidi opened this issue 1 year ago • 4 comments

What issues does your PR fix?

  • fixes #249

What does your PR do?

Overview

This Pull Request introduces a new feature, s3Sync, designed to enhance our application's ability to synchronize data with AWS S3. This addition aims to provide a more robust and flexible solution for managing cloud storage synchronization tasks.

Details

  • New Feature: Implemented the s3Sync functionality, leveraging the official aws-cli library and straightforward logic to establish core functionalities such as stability and the automatic detection of changes.
  • Ensured that the new s3Sync feature is fully compatible with our existing infrastructure and does not introduce any breaking changes or dependencies.

No Changes to Existing gitSync Functionality

  • It's crucial to note that while developing the s3Sync feature, special care was taken not to modify or affect the existing gitSync functionality. Our commitment was to add value without disrupting current operations or workflows.
  • Comprehensive testing has been conducted to confirm that gitSync remains unaffected and operates as expected.

Testing and Validation

  • Conducted basic tests such as ct lint and helm template, and performed operational tests on an actual Kubernetes cluster, especially gitSync, continue to operate without any issues.
image 스크린샷 2024-02-17 오전 1 27 10

Conclusion

This enhancement is a step forward in our ongoing efforts to provide a seamless and powerful toolset for our users. By introducing s3Sync, we are expanding our capabilities while ensuring the integrity and performance of our existing features remain intact.

I look forward to your feedback and any discussions regarding this PR. Thank you for considering these enhancements.

Checklist

For all Pull Requests

For releasing ONLY

chirichidi avatar Feb 16 '24 16:02 chirichidi

I am curious to know how syncing DAGs from S3 work? do we need to create a kubectl secret with our AWS key and secret key, and how often will it poll for new DAG files/ folders?

rsotogar avatar Mar 20 '24 10:03 rsotogar

This would be useful to have in the Helm chart. Git sync is sometime not the best option.

wikitops avatar Apr 09 '24 12:04 wikitops

just another bump on this PR. This seems like the best option for AirFlow deployments in EKS.

And just to clarify about AWS credentials, in general we would be using IAM roles rather than user credentials, so there should be no need for additional k8s secrets, or anything like that.

pedorro avatar Apr 29 '24 17:04 pedorro

@chirichidi thanks for the very interesting PR, I would love to get "s3-sync" as a concept into the chart (as it will help users migrate from MWAA).

The main thing we need to finalize is the "reconciliation loop", everything else is secondary and can be updated later.

If I understand your PR correctly, you have done the following:

  • You have implemented a "sidecar" pattern similar to our gitsync sidecar
  • An init-container which runs the following command (to populate the dags folder as the pod starts)
    • aws s3 cp --recursive s3://<BUCKET>/<PATH> ./path/to/dags
  • A sidecar container which runs the following command on loop (to keep the dags folder up to date):
    • aws s3 sync --delete s3://<BUCKET>/<PATH> ./path/to/dags

My main concerns are:

  1. What happens when a sync is halfway, but airflow starts refreshing the DAGs (so we have some old and some new)?
    • This is avoided in git-sync by using symbolic link switching.
    • It's likley rare for something seriously bad to happen when this occurs, so it might not matter.
  2. Wouldn't it make more sense to also use aws s3 sync for the init-container?
    • Because init-containers can sometimes run again (when the pod restarts), so we can save a bit of time by not re-downloading everything.
  3. I wonder if we might want to use some or all of the following aws sync parameters:
    • --quiet
    • --only-show-errors

Are there any other things I have missed?

PS: if/when we merge this, I will update the values/docs in your PR to match the style of the chart.

thesuperzapper avatar Apr 30 '24 03:04 thesuperzapper