aws-serverless-data-lake-framework
#69
Correct and clean manifests and cloudfront examples
This pull request addresses some missing steps and corrections in two SDLF examples:
- Deequ and EMR steps using Step Functions (3)
- Manifest-based processing (6)
Here is an aggregated summary of changes:
- all changes are contained in the `./sdlf-utils/pipeline-examples/cloudfront` and `./sdlf-utils/pipeline-examples/manifests` folders
- included steps for deploying the new pipelines, or datasets from a local environment, if a pipeline does not exist for them yet
- added requirements for running deployments from a local development machine
- replaced handling of the transformation through an extra branch (EMR), since this would be harder to maintain in a multi-deployment environment (dev, test, prod); the transformation now lives in its own stageB repo
- changes to maintain the initial naming convention for transformations. EMR was breaking the convention (for example, the convention is `light_transform_<short-desc>.py` and CloudFront was using a different one); a minimal sketch of a conforming transform follows this list
- added all missing steps in the correct order
- segregated submitting data from deploying the Glue job or EMR scripts in the last step, so users can submit data multiple times without re-deploying anything (see the example script after this list)
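
As a point of reference for the naming-convention item above, a transform named after the `light_transform_<short-desc>.py` pattern would look roughly like the sketch below. The function name, signature, and return value here are illustrative assumptions only, not the exact SDLF stage contract; see the stageA/stageB repositories in the examples for the real interface.

```python
# light_transform_cloudfront.py
# Illustrative sketch only: the function name and signature below are
# assumptions, not the exact SDLF stage interface.

def transform_object(bucket: str, key: str, team: str, dataset: str) -> list:
    """Map an incoming raw-bucket object to the key(s) the next stage
    should process. A real transform would read, reshape, and re-write
    the object here."""
    processed_key = f"pre-stage/{team}/{dataset}/{key.split('/')[-1]}"
    return [processed_key]
```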
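Because data submission is now decoupled from deployment, re-submitting data can be as simple as copying sample files into the raw bucket. The snippet below is a hedged example of that idea; the bucket name and prefix are placeholders, not values taken from the examples.

```python
# submit_sample_data.py
# Example of re-submitting data without redeploying anything.
# RAW_BUCKET and PREFIX are placeholders for whatever your deployed
# stack uses in your environment.
import boto3

RAW_BUCKET = "my-sdlf-raw-bucket"      # placeholder raw bucket name
PREFIX = "engineering/cloudfront"      # placeholder team/dataset prefix

s3 = boto3.client("s3")

def submit(local_path: str, filename: str) -> None:
    """Upload one sample file to the raw bucket; safe to run repeatedly."""
    s3.upload_file(local_path, RAW_BUCKET, f"{PREFIX}/{filename}")

if __name__ == "__main__":
    submit("./data/sample_cloudfront_log.gz", "sample_cloudfront_log.gz")
```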
These changes were extensively tested by me and a team of 3 developers from PREDICTif Solutions.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.