
incremental backup and incremental backfill generate different file names

yonahforst opened this issue on Mar 01 '17 · 2 comments

Hi there!

First off, great library. It's super useful and a much better/simpler option (for me) than the whole EMR/Datapipeline situation.

I have this simple Lambda function subscribed to the streams of the tables I want to back up (the bucket, region, and prefix are set as env variables in the Lambda function):

var replicator = require('dynamodb-replicator');

// Hand the DynamoDB stream event straight to the incremental backup.
module.exports.streaming = (event, context, callback) => {
  return replicator.backup(event, callback);
};

Then I ran the backfill by importing dynamodb-replicator/s3-backfill and passing it a config object.
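For reference, the backfill call looked something like this (the config field names here are my reading of s3-backfill.js; check the source for the exact shape before relying on it):

var backfill = require('dynamodb-replicator/s3-backfill');

// Approximate config shape -- table/region for the scan, bucket/prefix
// for where the incremental backup objects live.
backfill({
  table: 'my-table',
  region: 'us-east-1',
  backup: {
    bucket: 'my-backup-bucket',
    prefix: 'my-table'
  }
}, function(err) {
  if (err) throw err;
  console.log('backfill complete');
});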

However, I noticed that when records get updated via the stream/lambda function, they are written to a different file from the one created by the backfill.

I see that the formula for generating the filenames differs slightly between the two:

// backfill
            var id = crypto.createHash('md5')
                .update(Dyno.serialize(key))
                .digest('hex');

// backup
            var id = crypto.createHash('md5')
                .update(JSON.stringify(change.dynamodb.Keys))
                .digest('hex');

https://github.com/mapbox/dynamodb-replicator/blob/master/s3-backfill.js#L46-L48
https://github.com/mapbox/dynamodb-replicator/blob/master/index.js#L130-L132

Does this make any practical difference? Should the restore function work regardless?

yonahforst · Mar 01 '17 12:03

I've realized that Dyno.serialize in backfill just converts from JS objects to DynamoDB JSON, which is what you get from the stream in backup. So I'm not sure why they generate different keys. Maybe the order of the stringified keys?
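For example (hypothetical key attributes), the same key stringified with its attributes in a different order hashes to a different id:

var crypto = require('crypto');

// Same DynamoDB key, attributes serialized in a different order.
var a = JSON.stringify({ id: { S: 'abc' }, range: { N: '1' } });
var b = JSON.stringify({ range: { N: '1' }, id: { S: 'abc' } });

console.log(crypto.createHash('md5').update(a).digest('hex'));
console.log(crypto.createHash('md5').update(b).digest('hex'));
// Different digests, so backup and backfill would write to different S3 keys.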

yonahforst · Mar 06 '17 11:03

Confirmed that sorting the key object before generating the id hash resolves this issue.
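Roughly this sort-before-hash approach (the helper name is mine, not the library's), applied in both backup and backfill so the ids match:

var crypto = require('crypto');

// Rebuild the key with its attribute names in sorted order, then hash.
function stableId(keys) {
  var sorted = {};
  Object.keys(keys).sort().forEach(function(name) {
    sorted[name] = keys[name];
  });
  return crypto.createHash('md5')
    .update(JSON.stringify(sorted))
    .digest('hex');
}

// Both of these now return the same id:
// stableId({ id: { S: 'abc' }, range: { N: '1' } })
// stableId({ range: { N: '1' }, id: { S: 'abc' } })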

yonahforst · Mar 06 '17 12:03