
Running checks from multiple locations

Open jeremych1000 opened this issue 5 years ago • 1 comments

Hello! Stumbled upon this tool and it looks perfect for an uptime monitoring system that I'm building out. I have two quick questions.

  1. Running from multiple regions
  • I am planning to use checkup to monitor uptime of a service
  • I am planning to deploy checkup using AWS Lambdas in multiple regions to improve redundancy of the system
  • How does checkup recommend I go about this? Would I deploy multiple copies of the frontend, checkup, and db? Or use one main frontend + db in one region, and multiple checkup lambdas in other regions, all reporting back to the single main region db?
  • If everything reports to a single region, the response times will surely differ a lot depending on the region I'm pinging from. How does checkup differentiate between these?
  2. HA Checkup
  • This uptime monitoring solution is required to be a mission-critical service, so high availability needs to be built in
  • What happens if the storage mechanism is unavailable, e.g. the Postgres DB is down due to a cloud outage?
  • I don't see an option to define multiple storage backends. Is it possible to do this? For example, store everything in Postgres with a backup in S3, so that if Postgres goes down we still have uptime metrics in S3?
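
To make the second question more concrete, here is a minimal sketch of the behaviour I have in mind: each batch of results is written to every configured backend, and the write only counts as failed if all backends fail. Everything in it (`Result`, `Store`, `memStore`, `FanOut`) is a hypothetical name made up for illustration and is not checkup's actual API; the in-memory backends only stand in for Postgres and S3. The `Region` field is likewise an assumption, showing how results from different Lambda regions could be told apart in a single shared store.

```go
package main

import (
	"errors"
	"fmt"
)

// Result is a hypothetical stand-in for one check result (not checkup's own type).
type Result struct {
	Endpoint string
	Region   string // region the check ran from, e.g. "eu-west-1" (assumed field)
	Healthy  bool
}

// Store is a hypothetical minimal storage interface used only in this sketch.
type Store interface {
	Name() string
	Save(results []Result) error
}

// memStore is an in-memory backend that can simulate an outage.
type memStore struct {
	name  string
	fail  bool
	saved []Result
}

func (m *memStore) Name() string { return m.name }

func (m *memStore) Save(results []Result) error {
	if m.fail {
		return fmt.Errorf("%s: backend unavailable", m.name)
	}
	m.saved = append(m.saved, results...)
	return nil
}

// FanOut writes to every backend and reports an error only if all of them fail.
type FanOut struct {
	Backends []Store
}

func (f FanOut) Save(results []Result) error {
	if len(f.Backends) == 0 {
		return errors.New("no storage backends configured")
	}
	var failures []error
	for _, b := range f.Backends {
		if err := b.Save(results); err != nil {
			failures = append(failures, err)
		}
	}
	if len(failures) == len(f.Backends) {
		return fmt.Errorf("all %d backends failed: %v", len(failures), failures)
	}
	return nil
}

func main() {
	primary := &memStore{name: "postgres", fail: true} // simulated outage
	backup := &memStore{name: "s3"}
	store := FanOut{Backends: []Store{primary, backup}}

	results := []Result{{Endpoint: "https://example.com", Region: "eu-west-1", Healthy: true}}
	if err := store.Save(results); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Println("results kept in backup:", len(backup.saved))
}
```

With the primary marked as failing, the batch still lands in the backup, which is the behaviour I am hoping for when Postgres is unavailable.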

Our high level planned architecture:

  • AWS Lambda for checkup checks
  • AWS Fargate for checkup frontend, fronted by a load balancer
  • AWS Aurora Postgres for checkup's DB

Thanks!

jeremych1000, Oct 27 '20 18:10

Update: checkup seems to read results from MySQL/Postgres one by one, so am I correct in saying that read performance is the same whether it reads from the DB or from S3?

https://github.com/sourcegraph/checkup/blob/master/storage/postgres/postgres.go#L94

jeremych1000, Oct 28 '20 11:10