serverless-patterns
[New repo]: InsuranceLake
Description
InsuranceLake allows customers to Collect, Cleanse, and Consume their insurance data with 7 simple AWS services.
It is a data lake and pipeline reference architecture built to process batch files by mapping source columns to target columns, transforming each column, and applying data quality rules. It is based on the Olympic Data Lake Pattern (Bronze, Silver, Gold), which we call Collect, Cleanse & Curate, and Consume.
The most common types of batch file data sources are large delimited text files, Excel files, and fixed-width files. InsuranceLake can be enhanced to accept change data capture, streaming, and document data sources.
Each incoming data source (for example, a specific CSV file with commercial auto policies from broker abc) is intended to be paired with mapping, transform, and data quality instruction files and, if desired, an entity match instruction file. These instruction files are not mandatory; InsuranceLake will create default ones if none are provided. The incoming data files are placed in the Collect layer. A workflow is then triggered to run the mapping, transform, data quality, and entity match processes, and the results are stored in the Cleanse layer. Any data quality rule marked as quarantine moves failing data out to quarantine tables. Finally, a set of Apache Spark SQL and Amazon Athena SQL files can be run to populate the Consume layer.
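The Collect-to-Cleanse flow described above can be illustrated with a minimal, plain-Python sketch. This is not InsuranceLake's implementation or its instruction file format: the names `MAPPING`, `TRANSFORMS`, `QUARANTINE_RULES`, and `run_pipeline`, along with the sample columns, are hypothetical and exist only to show the order of the mapping, transform, and data quality steps and how a quarantine rule separates bad rows.

```python
# Hypothetical stand-ins for the mapping, transform, and data quality
# instruction files; the real InsuranceLake formats are not shown here.
MAPPING = {"PolicyNo": "policy_number", "Prem": "written_premium"}
TRANSFORMS = {"written_premium": float}
QUARANTINE_RULES = {"written_premium": lambda value: value >= 0}


def run_pipeline(rows):
    """Map, transform, and check each row; return (cleansed, quarantined)."""
    cleansed, quarantined = [], []
    for row in rows:
        # Mapping: rename source columns to target columns
        # (unmapped columns pass through unchanged).
        mapped = {MAPPING.get(col, col): value for col, value in row.items()}

        # Transform: apply a per-column conversion where one is defined.
        for col, convert in TRANSFORMS.items():
            if col in mapped:
                mapped[col] = convert(mapped[col])

        # Data quality: rows failing any quarantine rule are set aside.
        passed = all(
            rule(mapped[col])
            for col, rule in QUARANTINE_RULES.items()
            if col in mapped
        )
        (cleansed if passed else quarantined).append(mapped)
    return cleansed, quarantined


# Rows as they might arrive in the Collect layer (illustrative data only).
collect_layer = [
    {"PolicyNo": "A-100", "Prem": "1250.50"},
    {"PolicyNo": "A-101", "Prem": "-20"},
]
cleansed, quarantined = run_pipeline(collect_layer)
```

In this sketch the first row lands in `cleansed` and the row with the negative premium lands in `quarantined`, mirroring how rules marked as quarantine divert bad data to quarantine tables rather than the Cleanse layer.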
language
English
runtime
Python
Level
200
Type
Application
Use case
Backend
Primary image
https://raw.githubusercontent.com/aws-samples/aws-insurancelake-etl/main/resources/insurancelake-highlevel-architecture.png
IaC framework
AWS CDK
AWS Serverless services used
- [ ] Amazon API Gateway
- [X] Amazon DynamoDB
- [ ] Amazon EventBridge
- [ ] AWS IoT
- [X] AWS Lambda
- [ ] Amazon Rekognition
- [X] Amazon S3
- [X] AWS Step Functions
- [X] Amazon SNS
- [ ] Amazon SQS
- [ ] Amazon Transcribe
- [ ] Amazon Translate
Description headline
Deploy and try out a serverless, Olympic pattern data lake with AWS InsuranceLake in 30 minutes
Repo URL
https://github.com/aws-samples/aws-insurancelake-etl
Additional resources
- Supporting GitHub repository with InsuranceLake infrastructure
- InsuranceLake Quickstart
- Self-paced Workshop
- User Documentation
Author Name
Cory Visi
Author Image URL
https://avatars.githubusercontent.com/u/117751550?v=4
Author Bio
Cory is a Senior Solutions Architect at AWS helping insurance industry customers accelerate their cloud modernization journey.
Author Twitter handle
No response
Author LinkedIn URL
No response
leave
No response