
Ensure uptime monitoring is set for all sub-services used in production

Open cmungall opened this issue 8 years ago • 4 comments

The sub-services are listed here: https://github.com/biolink/biolink-api/blob/master/conf/config.yaml

However, only a subset is currently used in production:

  • http://golr.berkeleybop.org - GO function
  • https://solr.monarchinitiative.org/solr/golr - homology
  • scigraph-dev - ID mapping (see #108)

We should have one of the following:

  • periodic execution of the behave tests, with a warm body to check when something goes wrong
  • UptimeRobot checks on all 3 sub-services, as well as on api.mi itself
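The first option could be driven by cron or a Jenkins job. A minimal Python sketch, assuming the suite is launched as a `behave` command and that sending mail is handled separately (for example with `smtplib`); the function names, subject line, and log truncation are illustrative choices, not the project's actual setup:

```python
# Sketch only: run a test command once and build an email if it fails.
# Assumptions (not from the issue): the suite is run as a "behave" command,
# and sending is done separately (e.g. smtplib), not by this code.
import subprocess
from email.message import EmailMessage


def run_suite(cmd, timeout=1800):
    """Run the test command; return (passed, combined stdout and stderr)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # A missing binary or a hung run is reported as a failure as well.
        return False, f"could not run {cmd!r}: {exc}"
    return proc.returncode == 0, proc.stdout + proc.stderr


def failure_email(log, sender, recipients):
    """Build (but do not send) a notification message for a failed run."""
    msg = EmailMessage()
    msg["Subject"] = "biolink-api behave tests FAILED"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    # Keep only the tail of the log so the message stays readable.
    msg.set_content(log[-5000:])
    return msg
```

A scheduled job might call `run_suite(["behave", "features/"])` and, when it returns False, hand `failure_email(...)` to `smtplib.SMTP(host).send_message(msg)`; the features path, SMTP host, and addresses are placeholders. Treating a missing `behave` binary or a timeout as a failure means a broken runner still notifies the warm body instead of passing silently.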

cmungall, Oct 20 '17 20:10

I'd actually expand to all of:

  • UptimeRobot checks on all 3 sub-services, as well as on api.mi itself (setting this up now)
  • periodic execution of the behave tests, with a warm body to check when something goes wrong (run periodically in Jenkins with email on failure, or have UptimeRobot check the results)
  • Nagios and Graylog scanning for likely problems (harder, and we'd need some bandwidth for it, but important for catching some classes of problems before outages occur)
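For the third item, a Nagios check is simply a program that prints one status line and exits 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). A minimal Python sketch of such a check for one sub-service, assuming illustrative response-time thresholds and a keyword the body must contain (neither was agreed in this thread; the function names are hypothetical):

```python
# Sketch of a Nagios-style check for one sub-service.
# Nagios plugin convention: exit 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN,
# plus one status line on stdout. Thresholds below are assumptions.
import time
import urllib.error
import urllib.request

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(status, elapsed_s, body, keyword, warn_s=2.0, crit_s=10.0):
    """Map a probe result to a Nagios state and a one-line message."""
    if status != 200:
        return CRITICAL, f"HTTP {status}"
    if keyword not in body:
        return CRITICAL, f"keyword {keyword!r} missing"
    perf = f"time={elapsed_s:.3f}s;{warn_s};{crit_s}"  # Nagios perfdata
    if elapsed_s >= crit_s:
        return CRITICAL, f"slow response {elapsed_s:.1f}s | {perf}"
    if elapsed_s >= warn_s:
        return WARNING, f"slow response {elapsed_s:.1f}s | {perf}"
    return OK, f"{elapsed_s:.2f}s | {perf}"


def probe(url, keyword, timeout=30):
    """Fetch url once and classify it; network errors become CRITICAL."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            status = resp.status
    except urllib.error.HTTPError as exc:
        return CRITICAL, f"HTTP {exc.code}"
    except ValueError as exc:
        # Malformed URL: a problem with the check, not the service.
        return UNKNOWN, f"bad URL: {exc}"
    except OSError as exc:
        # URLError and socket timeouts are both OSError subclasses.
        return CRITICAL, f"unreachable: {exc}"
    return classify(status, time.monotonic() - start, body, keyword)


def render(state, message):
    """Format the single status line Nagios reads from stdout."""
    return f"{LABELS[state]} - {message}"
```

Wired up as a plugin, the entry point would be `state, msg = probe(url, keyword)`, then `print(render(state, msg))` and `sys.exit(state)`. Keeping `classify` free of I/O makes the threshold logic easy to test without touching the live services.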

kltm, Oct 20 '17 23:10

I think item 1 belongs in this ticket, at least, with the other two as planning items.

kltm, Oct 20 '17 23:10

Tagging @jmcmurry to coordinate warm bodies.

cmungall, Oct 20 '17 23:10

Okay, a first pass is done. Currently:

  • all alerts go to Chris, or to Chris and me in the case of BBOP GOlr dev, so enjoy the extra chatter, Chris ;)
  • A beginning status page can be found here: https://stats.uptimerobot.com/Z44RnHOYG
  • added at a 15-minute frequency (the second and third were already in use for AGR):
    • http://golr.berkeleybop.org/solr/select?q=*:*&rows=0
    • https://api.monarchinitiative.org/api/bioentity/gene/NCBIGene:3630/homologs/?homology_type=O&fetch_objects=false ("ENSMUSG00000000215")
    • https://api.monarchinitiative.org/api/bioentityset/slimmer/function?&slim=GO:0003824&slim=GO:0004872&slim=GO:0005102&slim=GO:0005215&slim=GO:0005198&slim=GO:0008092&slim=GO:0003677&slim=GO:0003723&slim=GO:0001071&slim=GO:0036094&slim=GO:0046872&slim=GO:0030246&slim=GO:0003674&slim=GO:0008283&slim=GO:0071840&slim=GO:0051179&slim=GO:0032502&slim=GO:0000003&slim=GO:0002376&slim=GO:0050877&slim=GO:0050896&slim=GO:0023052&slim=GO:0010467&slim=GO:0019538&slim=GO:0006259&slim=GO:0044281&slim=GO:0050789&slim=GO:0042592&slim=GO:0007610&slim=GO:0008150&slim=GO:0005576&slim=GO:0005737&slim=GO:0005856&slim=GO:0005739&slim=GO:0005634&slim=GO:0005694&slim=GO:0016020&slim=GO:0031982&slim=GO:0071944&slim=GO:0030054&slim=GO:0042995&slim=GO:0032991&slim=GO:0045202&slim=GO:0005575&subject=SGD:S000002976 ("provided_by")
    • https://solr.monarchinitiative.org/solr/golr/select/?defType=edismax&qt=standard&indent=on&wt=json&rows=25&start=0&fl=*,score&facet=true&facet.mincount=1&facet.sort=count&json.nl=arrarr&facet.limit=25&facet.method=enum&fq=object_category:%22gene%22&fq=relation_closure:%22RO:HOM0000001%22&fq=subject_closure:%22NCBIGene:23543%22&q=*:* ("subject_gene_label":"RBFOX2")
    • https://scigraph-data-dev.monarchinitiative.org/scigraph/graph/neighbors?id=GO%3A0022008&depth=1&blankNodes=false&relationshipType=subClassOf&direction=BOTH&entail=false ("lbl":"cell differentiation")
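To reproduce some of these monitors by hand, here is a small Python sketch. It assumes the quoted string after each URL is the keyword the monitor expects in the response body, treats the golr URL (which lists no keyword) as a plain HTTP 200 check, and copies only three of the five monitors; the Solr match-all query is written as `q=*:*`, and the function names are illustrative.

```python
# Sketch: re-run a subset of the 15-minute keyword checks.
# Assumption: the quoted string after each URL in the list is the keyword
# expected in the body; None means "just require HTTP 200".
from typing import Callable, Optional, Tuple
import urllib.request

CHECKS = [
    ("http://golr.berkeleybop.org/solr/select?q=*:*&rows=0", None),
    ("https://api.monarchinitiative.org/api/bioentity/gene/NCBIGene:3630/"
     "homologs/?homology_type=O&fetch_objects=false", "ENSMUSG00000000215"),
    ("https://scigraph-data-dev.monarchinitiative.org/scigraph/graph/neighbors"
     "?id=GO%3A0022008&depth=1&blankNodes=false&relationshipType=subClassOf"
     "&direction=BOTH&entail=false", '"lbl":"cell differentiation"'),
]

Fetcher = Callable[[str], Tuple[int, str]]


def http_fetch(url: str, timeout: float = 30) -> Tuple[int, str]:
    """Return (status, body); raises OSError on network or non-2xx errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


def check(url: str, keyword: Optional[str], fetch: Fetcher = http_fetch) -> bool:
    """True if the URL answers 200 and (when given) contains the keyword."""
    try:
        status, body = fetch(url)
    except OSError:  # HTTPError and URLError are OSError subclasses
        return False
    return status == 200 and (keyword is None or keyword in body)


def failing(checks=CHECKS, fetch: Fetcher = http_fetch):
    """Return the URLs whose check fails."""
    return [url for url, kw in checks if not check(url, kw, fetch)]
```

Calling `failing()` with no arguments hits the live services and returns the URLs that look down. Passing a fake `fetch` lets the keyword logic be tested offline.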

TODO:

  • a list of people to be notified when the various services are down
  • additional tests should be sent to me
  • for status.monarchinitiative.org, add a CNAME record pointing to stats.uptimerobot.com

kltm, Oct 21 '17 00:10