[New repo]: InsuranceLake

Open coryvisi opened this issue 11 months ago • 0 comments

Description

InsuranceLake allows customers to Collect, Cleanse, and Consume their Insurance Data with 7 simple AWS services.

It is a data lake and pipeline reference architecture built to process batch files by mapping source to target columns, transforming each column, and applying data quality rules. It is based on the Olympic Data Lake Pattern (Bronze, Silver, Gold) which we call Collect, Cleanse & curate, and Consume.

The most common types of batch file data sources are large delimited text files, Excel files, and fixed-width files. InsuranceLake can be enhanced to accept change data capture, streaming, and document data sources.

Each incoming data source (e.g. a specific CSV file with commercial auto policies from broker abc) is intended to be paired with a mapping, transform, data quality, and, if desired, entity match instruction file. These instruction files are not mandatory; InsuranceLake will create default ones if none are provided. Incoming data files are placed in the Collect layer; a workflow is then triggered to run the mapping, transform, data quality, and entity match processes, and the results are stored in the Cleanse layer. Any data quality rules marked as quarantine will send bad data to quarantine tables. Finally, a set of Apache Spark SQL and Amazon Athena SQL files can be run to populate the Consume layer.
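To make the Collect-to-Cleanse flow concrete, here is a minimal plain-Python sketch of the three steps (mapping, transform, data quality with quarantine). It is not InsuranceLake's implementation or its instruction file format: the `MAPPING`, `TRANSFORMS`, and `QUARANTINE_RULES` structures, the column names, and the `run_pipeline` function are all hypothetical, and dropping unmapped columns is an assumption made only for this illustration.

```python
from typing import Callable

# Hypothetical stand-ins for instruction files; the real formats are not shown here.
MAPPING = {"PolicyNo": "policy_number", "Prem": "premium"}
TRANSFORMS: dict[str, Callable[[str], object]] = {
    "policy_number": str.strip,
    "premium": float,
}
QUARANTINE_RULES: dict[str, Callable[[dict], bool]] = {
    "premium_not_negative": lambda row: row["premium"] >= 0,
}


def run_pipeline(collect_rows):
    """Map, transform, and check rows; return (cleanse_rows, quarantine_rows)."""
    cleanse, quarantine = [], []
    for source_row in collect_rows:
        # Mapping: rename source columns to target columns.
        # Assumption for this sketch: unmapped columns are dropped.
        row = {MAPPING[k]: v for k, v in source_row.items() if k in MAPPING}
        # Transform: apply the per-column function where one is defined.
        row = {k: TRANSFORMS.get(k, lambda v: v)(v) for k, v in row.items()}
        # Data quality: rows failing any quarantine rule are set aside.
        failed = [name for name, rule in QUARANTINE_RULES.items() if not rule(row)]
        if failed:
            quarantine.append({**row, "failed_rules": failed})
        else:
            cleanse.append(row)
    return cleanse, quarantine


collect = [
    {"PolicyNo": " A-100 ", "Prem": "1250.50", "Ignored": "x"},
    {"PolicyNo": "A-101", "Prem": "-10"},
]
cleanse, quarantine = run_pipeline(collect)
print(cleanse)
print(quarantine)
```

In this sketch the first row lands in the Cleanse output and the second, with a negative premium, is routed to quarantine along with the name of the rule it failed.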

Language

English

Runtime

Python

Level

200

Type

Application

Use case

Backend

Primary image

https://raw.githubusercontent.com/aws-samples/aws-insurancelake-etl/main/resources/insurancelake-highlevel-architecture.png

IaC framework

AWS CDK

AWS Serverless services used

  • [ ] Amazon API Gateway
  • [X] Amazon DynamoDB
  • [ ] Amazon EventBridge
  • [ ] AWS IoT
  • [X] AWS Lambda
  • [ ] Amazon Rekognition
  • [X] Amazon S3
  • [X] AWS Step Functions
  • [X] Amazon SNS
  • [ ] Amazon SQS
  • [ ] Amazon Transcribe
  • [ ] Amazon Translate

Description headline

Deploy and try out a serverless, Olympic pattern data lake with AWS InsuranceLake in 30 minutes

Repo URL

https://github.com/aws-samples/aws-insurancelake-etl

Additional resources

  • Supporting GitHub repository with InsuranceLake infrastructure
  • InsuranceLake Quickstart
  • Self-paced Workshop
  • User Documentation

Author Name

Cory Visi

Author Image URL

https://avatars.githubusercontent.com/u/117751550?v=4

Author Bio

Cory is a Senior Solutions Architect at AWS helping insurance industry customers accelerate their cloud modernization journey.

Author Twitter handle

No response

Author LinkedIn URL

No response

leave

No response

coryvisi, Mar 27 '24 21:03