elementary
[ELE-47] AWS Glue Integration
Requesting integration with Amazon S3 as a data lake
- Data lake is built upon Amazon S3
- Most transformations/ETL are done in AWS Glue with Spark, and there is a dbt-glue-adapter that supports running dbt against Spark
- Some of the ETL jobs are orchestrated with Airflow
- Queries run against the Glue Data Catalog with Amazon Athena
We want to set up data observability on top of the input and output datasets.
Hi @rajkstats! Thanks for opening the issue! I'm not familiar with the dbt-glue-adapter, so it's hard to assess how many changes such an integration would require. We recently decided (due to demand from the community) to add a Databricks integration, and decided to approach it gradually. Step 1: add support for uploading dbt artifacts and run results (in the dbt package). Step 2: add support in the CLI for Slack alerts and UI generation. Step 3: add support for the data anomaly detection tests (the most complex and platform-specific part of the code right now).
Here is my PR for step 1 for Databricks; as you can see, it required fairly minor changes. If you want to give it a shot with AWS Glue, I would be happy to support you!
Thanks @Maayan-s for sharing the approach. I will give it a shot and let you know if I need any support. Thanks.
@rajkstats did you make any progress on this?
Hi @bruno-ribeirodasilva, I assume that this issue can be re-assigned. Are you interested in giving it a shot?
@bruno-ribeirodasilva I wasn't able to pick this up yet, though I still plan to. Feel free to give it a shot, as @Maayan-s suggested.
@Maayan-s has there been any progress on this? Is there a way to use elementary with the dbt-glue adapter?