aws-serverless-data-lake-framework

correct and clean manifests and cloudfront examples

Open mariandumitrascu-p opened this issue 2 years ago • 0 comments

#69

This pull request adds missing steps and corrections to two SDLF examples:

  • Deequ and EMR steps using Step Functions (3)
  • Manifest based processing (6)

Here is an aggregated summary of changes:

  • all changes are contained in the ./sdlf-utils/pipeline-examples/cloudfront and ./sdlf-utils/pipeline-examples/manifests folders.
  • included steps for deploying the new pipelines or datasets from a local environment when no pipeline exists for them
  • added requirements for running deployments from a local development machine
  • replaced the extra branch (emr) used to handle a transformation with a dedicated stageB repo, since a separate branch would be harder to maintain in a multi-deployment environment (dev, test, prod)
  • changes to maintain the initial naming convention for transformations. emr was breaking the convention (for example, the convention is light_transform_<short-desc>.py; Cloudfront was using a different one).
  • added all missing steps in the correct order
  • separated submitting data from deploying the Glue job or EMR scripts in the last step. Users should be able to submit data multiple times without redeploying anything.
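
The naming convention mentioned in the list above (light_transform_<short-desc>.py) can be checked mechanically. The following is only a minimal sketch, not part of this pull request: the function name, the pattern constant, and the sample file names are hypothetical, and it assumes that <short-desc> consists of lowercase letters, digits, and underscores, which the description above does not specify.

```python
import re

# Assumption (not stated in the PR): <short-desc> is made of lowercase
# letters and digits, optionally separated by single underscores.
LIGHT_TRANSFORM_PATTERN = re.compile(
    r"light_transform_[a-z0-9]+(?:_[a-z0-9]+)*\.py"
)


def follows_light_transform_convention(filename: str) -> bool:
    """Return True if filename has the form light_transform_<short-desc>.py."""
    return LIGHT_TRANSFORM_PATTERN.fullmatch(filename) is not None


if __name__ == "__main__":
    # Hypothetical file names, used only to illustrate the check.
    for name in ("light_transform_cloudfront.py", "emr_transform.py"):
        print(name, follows_light_transform_convention(name))
```

A check like this could run before deployment so that a transformation file that breaks the convention is caught early; whether to enforce it that way is a design choice outside the scope of these changes.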

These changes were extensively tested by me and a team of 3 developers from PREDICTif Solutions.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

mariandumitrascu-p, Jul 14 '22 19:07