
CDK Example for AWS Glue Workflow including Glue Jobs, Triggers, Crawlers, security group and Database

Open PuneetBabbar opened this issue 3 years ago • 3 comments

:rocket: Feature Request

General Information

  • [X] :wave: I want to implement this feature request
  • [ ] :warning: This feature might incur a breaking change

Description

This example will be like the other examples in the repo. It will cover everything around AWS Glue: a Glue Workflow, Glue Jobs, Triggers, Crawlers, a security group and a Database.

This adds a new example to the repo. Currently there is no code example available online, and no AWS documentation with a code example, that demonstrates how to use CDK to set up an ecosystem around Glue.

Proposed Solution

The idea is to build an example that sets up an ETL pipeline using the tools available in the Glue ecosystem, constructed entirely in CDK code.

I was thinking of using open source data to build an example of a data ETL job, similar to the blog post at https://aws.amazon.com/blogs/devops/provision-codepipeline-glue-workflows/. The idea is to build the ETL pipeline as a Glue workflow, with the following steps constructed via CDK:

  1. Glue Crawler to catalog S3 data.
  2. Glue Jobs (Spark) to process and transform the catalog data
  3. Glue Trigger for calling the above Crawler and Jobs
  4. Glue Workflow to orchestrate the above components.
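To make the four steps concrete, here is a rough sketch of the resources involved. It is not the proposed CDK example itself: it builds the equivalent CloudFormation template as a plain Python dict so it runs without CDK installed. The resource types (`AWS::Glue::Database`, `AWS::Glue::Crawler`, `AWS::Glue::Job`, `AWS::Glue::Workflow`, `AWS::Glue::Trigger`) are the ones the CDK L1 constructs `CfnDatabase`, `CfnCrawler`, `CfnJob`, `CfnWorkflow` and `CfnTrigger` wrap. The function name, resource names, bucket layout, Glue version and role ARN are all hypothetical placeholders.

```python
import json


def glue_workflow_template(bucket: str, role_arn: str, script_key: str) -> dict:
    """Build a CloudFormation template dict for a crawler -> job Glue workflow.

    All names below (workflow, crawler, job, database) are illustrative
    assumptions, not names taken from the issue.
    """
    workflow = "etl-workflow"
    crawler = "raw-data-crawler"
    job = "transform-job"
    database = "etl_db"

    resources = {
        # Catalog database the crawler writes table definitions into.
        "Database": {
            "Type": "AWS::Glue::Database",
            "Properties": {
                "CatalogId": {"Ref": "AWS::AccountId"},
                "DatabaseInput": {"Name": database},
            },
        },
        # Step 1: crawler that catalogs the S3 data.
        "Crawler": {
            "Type": "AWS::Glue::Crawler",
            "DependsOn": ["Database"],
            "Properties": {
                "Name": crawler,
                "Role": role_arn,
                "DatabaseName": database,
                "Targets": {"S3Targets": [{"Path": f"s3://{bucket}/raw/"}]},
            },
        },
        # Step 2: Spark job that transforms the cataloged data.
        "Job": {
            "Type": "AWS::Glue::Job",
            "Properties": {
                "Name": job,
                "Role": role_arn,
                "GlueVersion": "2.0",  # assumed version
                "Command": {
                    "Name": "glueetl",  # Spark ETL job type
                    "ScriptLocation": f"s3://{bucket}/{script_key}",
                    "PythonVersion": "3",
                },
            },
        },
        # Step 4: workflow that groups the triggers below.
        "Workflow": {
            "Type": "AWS::Glue::Workflow",
            "Properties": {"Name": workflow},
        },
        # Step 3a: on-demand trigger that starts the crawler.
        "StartCrawlerTrigger": {
            "Type": "AWS::Glue::Trigger",
            "DependsOn": ["Workflow", "Crawler"],
            "Properties": {
                "Type": "ON_DEMAND",
                "WorkflowName": workflow,
                "Actions": [{"CrawlerName": crawler}],
            },
        },
        # Step 3b: conditional trigger that runs the job once the crawl succeeds.
        "RunJobTrigger": {
            "Type": "AWS::Glue::Trigger",
            "DependsOn": ["Workflow", "Crawler", "Job"],
            "Properties": {
                "Type": "CONDITIONAL",
                "WorkflowName": workflow,
                "StartOnCreation": True,
                "Actions": [{"JobName": job}],
                "Predicate": {
                    "Conditions": [
                        {
                            "LogicalOperator": "EQUALS",
                            "CrawlerName": crawler,
                            "CrawlState": "SUCCEEDED",
                        }
                    ]
                },
            },
        },
    }
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


if __name__ == "__main__":
    template = glue_workflow_template(
        bucket="my-example-bucket",  # hypothetical bucket
        role_arn="arn:aws:iam::123456789012:role/GlueRole",  # hypothetical role
        script_key="scripts/transform.py",
    )
    print(json.dumps(template, indent=2))
```

In an actual CDK example each dict entry would become one construct in a stack, and the security group from the title would be added for jobs that need a VPC connection; that part is left out of this sketch.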

I don't have a diagram for the workflow yet, but I can create one and add it to the example for better understanding.

Environment

  • CDK Version: 1.102.0
  • Example: Glue Workflow
  • Example Version: N/A
  • OS: Ubuntu
  • Language: All

Other information

PuneetBabbar • May 06 '21 16:05