
To simplify, maybe reduce the dependency on Dynamo

Open sebs opened this issue 7 years ago • 1 comment

Just to start a discussion, I guess I want some feedback here as well.

Why not store the data in S3? As far as I can see, the counters are relatively costly. Possibly the following would work:

  1. Store each hit in S3: either find a way to efficiently append to an S3 data structure, or write single hits as objects into one folder. If you take day, month, and year into account, the folder layout can make up for some of the lost features (see the sketch after this list).
  2. Provide Lambda magic to determine whether there is data and do the calculations: per day, month, year, and site. Remember: the data is write-once, so as soon as this is done, it's read-only. Exception: the current month.
  • daily totals have to be generated each day
  • updating the current month's totals: sum(days)
  • updating yearly totals: sum(months)
  • maybe a dynamic solution for real-time data; a listener for S3 writes would possibly do.
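
As a minimal sketch of the idea, assuming Python/boto3, a hypothetical `HITS_BUCKET` bucket, and a `site=/year=/month=/day=` key layout (none of these names exist in the project yet): the first handler is the API Gateway write path, the second one is the aggregation that could run on a schedule or off S3 events.

```python
# Hypothetical sketch of the proposed apigw -> lambda -> s3 write path and a
# per-day aggregation Lambda. Bucket name, key layout, and event shape are
# assumptions, not part of the current project.
import json
import os
import uuid
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = os.environ.get("HITS_BUCKET", "my-analytics-hits")  # assumed env var


def record_hit(event, context):
    """API Gateway handler: store a single hit as one S3 object,
    keyed by site and date so the 'folders' act as partitions."""
    body = json.loads(event.get("body") or "{}")
    site = body.get("site", "unknown")
    now = datetime.now(timezone.utc)
    key = (
        f"hits/site={site}/year={now:%Y}/month={now:%m}/day={now:%d}/"
        f"{uuid.uuid4()}.json"
    )
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(body))
    return {"statusCode": 202, "body": json.dumps({"stored": key})}


def aggregate_day(event, context):
    """Scheduled (or S3-event-triggered) handler: count the hits under one
    day prefix and write a small, write-once summary object."""
    # For a scheduled run the prefix would be derived from "yesterday";
    # here it is taken from the event for simplicity.
    prefix = event["prefix"]  # e.g. "hits/site=foo/year=2018/month=03/day=18/"
    total = 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        total += page.get("KeyCount", 0)
    summary_key = prefix.replace("hits/", "totals/").rstrip("/") + ".json"
    s3.put_object(Bucket=BUCKET, Key=summary_key,
                  Body=json.dumps({"prefix": prefix, "hits": total}))
    return {"hits": total, "summary": summary_key}
```

One object per hit keeps the write path trivial and append-free; the date prefixes then stand in for the query features lost by dropping DynamoDB, since the monthly and yearly rollups only need to sum the already-written daily summaries.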

The obvious flaws are the missing "paging" of detail data and handling large datasets. However, I did not want to imply that Dynamo is completely off the board; maybe it's a simplification to initially put the data in S3.

sebs avatar Mar 18 '18 10:03 sebs

Reducing the fixed costs (DynamoDB, Kinesis) with a simplified architecture (API GW -> Lambda -> S3) would indeed be useful. Athena could help with the aggregations/stats, but it would require some thought.
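
For illustration only, a hedged sketch of how Athena could run the monthly aggregation over a date-partitioned S3 layout like the one proposed above. The `analytics.hits` external table, database, and bucket names are assumptions; the table would first need a matching `CREATE EXTERNAL TABLE` over the hit objects.

```python
# Hypothetical sketch: kicking off a monthly aggregation with Athena over
# the partitioned hit objects. Database, table, and bucket names are assumed.
import boto3

athena = boto3.client("athena")

QUERY = """
SELECT site, year, month, COUNT(*) AS hits
FROM analytics.hits  -- assumed external table over s3://my-analytics-hits/hits/
GROUP BY site, year, month
"""

response = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-analytics-hits/athena-results/"},
)
print(response["QueryExecutionId"])  # poll get_query_execution for completion
```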

palmerabollo avatar Mar 06 '19 10:03 palmerabollo