serverless-dynamodb-scanner icon indicating copy to clipboard operation
serverless-dynamodb-scanner copied to clipboard

A Serverless project to help you operate on every existing item in a DynamoDB table

Serverless DynamoDB Scanner

This is a Serverless application that scans a given DynamoDB table and inserts every item into a Kinesis Stream. You can then process the Kinesis stream, allowing you to perform an operation on all existing items in a DynamoDB table.

It was inspired by a tweet from Eric Hammond:

Usage:

This project uses the Serverless Framework to deploy a Lambda function and associated AWS resources.

To use it, follow these steps:

  1. Install the Framework and create your service:

    # Make sure you have the Serverless Framework installed
    $ npm install -g serverless
    
    $ sls create --template-url https://github.com/alexdebrie/serverless-dynamodb-scanner --path serverless-dynamodb-scanner
    
    $ cd serverless-dynamodb-scanner
    
  2. Update the configuration in serverless.yml.

    Add the ARN of the DynamoDB table you want to scan and the ARN of the Kinesis stream where you want the config added:

    # serverless.yml
    
    custom:
      dynamodbTableArn: 'arn:aws:dynamodb:us-east-1:123456789012:table/my_table'
      kinesisStreamArn: 'arn:aws:kinesis:us-east-1:123456789012:stream/my-stream'
    
    ...
    
  3. Deploy your service:

    $ sls deploy
    
  4. When you're ready, kick off your scan by invoking the function:

    $ sls invoke -f scanner
    

How does it work?

The basic workflow is as follows:

dynamodb scanner

A diagram that's missing a few steps ¯\_(ツ)_/¯

  1. Inside the Lambda function, check AWS SSM for a LastEvaluatedKey parameter that would be sent with our Scan call.

    If no parameter exist in SSM, we're just starting the scan.

  2. Make a Scan call to our DynamoDB table.

  3. Insert the items returned from our Scan into our Kinesis Stream via a PutRecords call.

  4. If the Scan call did not return a LastEvaluatedKey, our Scan is done! We can exit the function.

  5. If the Scan did return a LastEvaluatedKey, store the value in SSM.

  6. Do a time check -- if our function has less than 15 seconds of execution time left, we'll invoke another instance of our function and exit the loop for this one. Our next function will pick up where our scan left off by using the LastEvaluatedKey in SSM.