This project provides an example of an end-to-end data processing application created using the combination of Amazon Managed Streaming for Apache Kafka (Amazon MSK), AWS Fargate, AWS Lambda and Amazon...

Building an Apache Kafka data processing Java application using AWS CDK

Table of contents

  1. Introduction
  2. Architecture
  3. Project structure
  4. Prerequisites
  5. Tools and services
  6. Usage
  7. Clean up

Introduction

This project provides an example of an Apache Kafka data processing application. The build and deployment of the application are fully automated using AWS CDK.

The project consists of three main parts:

  • AWS infrastructure and deployment definition - AWS CDK scripts written in TypeScript.
  • AWS Lambda function - sends messages to an Apache Kafka topic using the KafkaJS library. It is implemented in TypeScript.
  • Consumer application - a Spring Boot Java application containing the main business logic of the data processing pipeline. It consumes messages from the Apache Kafka topic, performs simple validation and processing, and stores the results in an Amazon DynamoDB table.

The provided example showcases two ways of packaging and deploying business logic using high-level AWS CDK constructs. The first uses a Dockerfile and the AWS CDK ECS ContainerImage construct to deploy the Java application to AWS Fargate. The second uses the AWS CDK NodejsFunction construct to deploy the TypeScript AWS Lambda code.
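The two packaging approaches can be sketched roughly as follows. This is an illustrative outline rather than the project's actual stack: it assumes AWS CDK v2 (`aws-cdk-lib`), and the stack name, construct IDs, CPU and memory sizes, the relative path `../consumer`, and the Lambda entry file `lambda/transaction-handler.ts` are all hypothetical. The Amazon MSK cluster, networking rules, and IAM permissions are omitted.

```typescript
import { App, Stack, StackProps } from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import { Construct } from 'constructs';

// Hypothetical stack showing only the two packaging styles.
class PackagingSketchStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const vpc = new ec2.Vpc(this, 'Vpc', { maxAzs: 2 });
    const cluster = new ecs.Cluster(this, 'Cluster', { vpc });

    // Way 1: CDK builds the Docker image from a Dockerfile (this is why the
    // Docker daemon must be running) and uses it in a Fargate task.
    const taskDefinition = new ecs.FargateTaskDefinition(this, 'ConsumerTask', {
      cpu: 512,             // assumed sizing
      memoryLimitMiB: 1024, // assumed sizing
    });
    taskDefinition.addContainer('consumer', {
      // Build context and Dockerfile location follow the project structure
      // (consumer/docker/Dockerfile); the relative path is an assumption.
      image: ecs.ContainerImage.fromAsset('../consumer', { file: 'docker/Dockerfile' }),
      logging: ecs.LogDrivers.awsLogs({ streamPrefix: 'consumer' }),
    });
    new ecs.FargateService(this, 'ConsumerService', {
      cluster,
      taskDefinition,
      desiredCount: 1,
    });

    // Way 2: NodejsFunction bundles the TypeScript handler during synthesis.
    new NodejsFunction(this, 'TransactionHandler', {
      entry: 'lambda/transaction-handler.ts', // hypothetical file name
      handler: 'handler',
      runtime: lambda.Runtime.NODEJS_20_X,
    });
  }
}

const app = new App();
new PackagingSketchStack(app, 'PackagingSketchStack');
```

In both cases the build step (Docker image or bundled JavaScript) happens as part of `cdk deploy`, so no separate packaging script is needed for either component.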

Architecture

[Architecture diagram]

Triggering the TransactionHandler Lambda function publishes messages to an Apache Kafka topic. The consumer application, packaged in a container and deployed to ECS Fargate, consumes messages from the Kafka topic, processes them, and stores the results in an Amazon DynamoDB table. The KafkaTopicHandler Lambda function is called once during deployment to create the Kafka topic. Both the Lambda function and the consumer application publish logs to Amazon CloudWatch.
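The consumer side of this flow can be illustrated with a short sketch. The project's consumer is a Spring Boot Java application; the TypeScript below only mirrors the validate, process, and store steps, with an in-memory `Map` standing in for the DynamoDB table. The message shape (`accountId`, `value`) comes from the sample payload in the Usage section. The rule that each value is added to a running per-account balance is an assumption inferred from the `Balance` attribute in the scan command, and all function names are hypothetical.

```typescript
// Shape of the sample payload sent through the Kafka topic.
interface TransactionMessage {
  accountId: string;
  value: number;
}

// Simple validation: returns a typed message, or null when the record is unusable.
function parseTransaction(raw: string): TransactionMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not valid JSON
  }
  if (typeof data !== 'object' || data === null) return null;
  const { accountId, value } = data as Record<string, unknown>;
  if (typeof accountId !== 'string' || accountId.length === 0) return null;
  if (typeof value !== 'number' || !Number.isFinite(value)) return null;
  return { accountId, value };
}

// Processing step (assumed): add the transaction value to the account's balance.
// The Map is a stand-in for the Accounts table.
function applyTransaction(
  balances: Map<string, number>,
  msg: TransactionMessage,
): number {
  const updated = (balances.get(msg.accountId) ?? 0) + msg.value;
  balances.set(msg.accountId, updated);
  return updated;
}

// Handle a batch of raw Kafka record values; returns how many were skipped as invalid.
function processRecords(balances: Map<string, number>, records: string[]): number {
  let skipped = 0;
  for (const raw of records) {
    const msg = parseTransaction(raw);
    if (msg === null) {
      skipped += 1;
      continue;
    }
    applyTransaction(balances, msg);
  }
  return skipped;
}
```

Keeping validation and processing as pure functions, separate from the Kafka and DynamoDB clients, is one way to make this logic easy to unit test; whether the project is organized this way is not stated here.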

Project structure

  • amazon-msk-java-app-cdk/lib - directory containing all AWS CDK stacks
  • amazon-msk-java-app-cdk/bin - directory containing AWS CDK app definition
  • amazon-msk-java-app-cdk/lambda - directory containing code of TransactionHandler AWS Lambda function as well as code of Custom Resource handler responsible for creating Kafka topic during deployment
  • consumer - directory containing code of Kafka consumer Spring Boot Java application
  • consumer/docker/Dockerfile - definition of Docker image used for AWS Fargate container deployment
  • doc - directory containing architecture diagrams
  • scripts - directory containing deployment scripts

Prerequisites

Tools and services

  • AWS CDK – The AWS Cloud Development Kit (AWS CDK) is a software development framework for defining your cloud infrastructure and resources by using programming languages such as TypeScript, JavaScript, Python, Java, and C#/.Net.
  • Amazon MSK - Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that makes it easy for you to build and run applications that use Apache Kafka to process streaming data.
  • AWS Fargate – AWS Fargate is a serverless compute engine for containers. Fargate removes the need to provision and manage servers, and lets you focus on developing your applications.
  • AWS Lambda - AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers, creating workload-aware cluster scaling logic, maintaining event integrations, or managing runtimes.
  • Amazon DynamoDB - Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale.

Usage

  1. Start the Docker daemon on your local system. The AWS CDK uses Docker to build the image that is used in the AWS Fargate task. You must run Docker before you proceed to the next step.
  2. export AWS_PROFILE=<REPLACE WITH YOUR AWS PROFILE NAME> or alternatively follow the instructions in the AWS CDK documentation
  3. (First-time AWS CDK users only) Follow the instructions in the AWS CDK documentation to bootstrap AWS CDK
  4. cd scripts
  5. Run the deploy.sh script
  6. Trigger the Lambda function to send a message to the Kafka topic. You can use the command below, or alternatively trigger the Lambda function from the AWS console. Feel free to change the values of the accountId and value fields in the payload JSON and send multiple messages with different payloads to experiment with the application.
    aws lambda invoke --cli-binary-format raw-in-base64-out --function-name TransactionHandler --log-type Tail --payload '{ "accountId": "account_123", "value": 456}' /dev/stdout --query 'LogResult' --output text | base64 -d
    
  7. To view the results in the DynamoDB table, run the command below. Alternatively, open the Amazon DynamoDB console and select the Accounts table.
    aws dynamodb scan --table-name Accounts --query "Items[*].[id.S,Balance.N]" --output text
    
  8. You can also view the CloudWatch logs in the AWS console or by running the aws logs tail command with the relevant CloudWatch Logs log group
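To send several messages with different payloads, as step 6 suggests, the invoke command can be generated programmatically. The following is a minimal sketch and not part of the project: `buildInvokeArgs` and `samplePayloads` are hypothetical helpers, the second account ID and the generated values are made up, and the CLI flags are copied from the command in step 6.

```typescript
// Payload shape taken from the sample command in step 6.
interface TransactionPayload {
  accountId: string;
  value: number;
}

// Build the argument list for `aws lambda invoke`, mirroring the flags in step 6.
function buildInvokeArgs(
  payload: TransactionPayload,
  functionName = 'TransactionHandler',
): string[] {
  return [
    'lambda', 'invoke',
    '--cli-binary-format', 'raw-in-base64-out',
    '--function-name', functionName,
    '--log-type', 'Tail',
    '--payload', JSON.stringify(payload),
    '/dev/stdout',
    '--query', 'LogResult',
    '--output', 'text',
  ];
}

// Generate `count` payloads spread round-robin over a few made-up account IDs.
function samplePayloads(
  count: number,
  accounts: string[] = ['account_123', 'account_456'],
): TransactionPayload[] {
  return Array.from({ length: count }, (_, i) => ({
    accountId: accounts[i % accounts.length],
    value: (i + 1) * 100,
  }));
}

// Print one shell-ready command per payload.
// Naive quoting: assumes the JSON payload contains no single quotes.
for (const payload of samplePayloads(3)) {
  const args = buildInvokeArgs(payload).map((a) => (a.startsWith('{') ? `'${a}'` : a));
  console.log(['aws', ...args, '| base64 -d'].join(' '));
}
```

The printed lines can be pasted into a shell. Alternatively, the unquoted array from `buildInvokeArgs` could be passed to Node's `child_process.execFileSync('aws', args)`, which avoids shell quoting altogether; the log output would then need to be base64-decoded in code.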

Clean up

Follow the AWS CDK instructions to remove the AWS CDK stacks from your account. You can also use scripts/destroy.sh. Be sure to also manually remove the Amazon DynamoDB table, clean up the Amazon CloudWatch logs, and remove the Amazon ECR images to avoid incurring additional AWS infrastructure costs.