biolink-api
Ensure uptime monitoring is set for all sub-services used in production
Sub-services listed here: https://github.com/biolink/biolink-api/blob/master/conf/config.yaml
However, only a subset is currently used for production purposes:
- http://golr.berkeleybop.org - GO function
- https://solr.monarchinitiative.org/solr/golr - homology
- scigraph-dev - ID mapping (see #108)
We should have at least one of:
- periodic execution of behave tests, with a warm body to check if something goes wrong
- uptime robot checks on all 3 sub-services, as well as api.mi itself
I'd actually expand to all of:
- uptime robot checks on all 3 sub-services, as well as api.mi itself (setting this up now)
- periodic execution of behave tests, with a warm body to check if something goes wrong (periodically run in Jenkins with email on failure, or have uptimerobot look at the results)
- Nagios and Graylog scanning for likely problems (harder, and we'd need some bandwidth to do it, but important for catching some problem classes before outages occur)
I think number 1 should be done for this ticket, at least, with planning items created for the other two.
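For the periodic behave run, a minimal sketch of a wrapper that a scheduled Jenkins job could call. It only relies on the fact that `behave` exits non-zero when a scenario fails, which is what lets Jenkins mark the build as failed and send the email. The `tests/` path in the comment and the 30-minute timeout are assumptions, not something taken from the repo.

```python
import subprocess
import sys
from typing import Sequence


def run_suite(command: Sequence[str], timeout: float = 1800.0) -> int:
    """Run a test command and return its exit code (non-zero means failure)."""
    try:
        completed = subprocess.run(command, timeout=timeout)
    except subprocess.TimeoutExpired:
        # The suite hung, probably on a sub-service that never answered.
        return 124
    except FileNotFoundError:
        # The command itself is not installed on this machine.
        return 127
    return completed.returncode


# Hypothetical entry point for the scheduled job (path is an assumption):
#   sys.exit(run_suite(["behave", "tests/"]))
```

Treating a timeout as a failure matters here, since a hung sub-service is exactly the kind of problem the job is meant to surface.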
tagging @jmcmurry for coordination of warm bodies
Okay, a first pass is done. Currently:
- all warnings are sent to Chris, or to Chris and myself (in the case of BBOP GOlr dev; enjoy that extra chatter, Chris ;)
- a preliminary status page can be found here: https://stats.uptimerobot.com/Z44RnHOYG
- the following checks were added at a 15-minute frequency (the second and third are already used for AGR):
- http://golr.berkeleybop.org/solr/select?q=*:*&rows=0
- https://api.monarchinitiative.org/api/bioentity/gene/NCBIGene:3630/homologs/?homology_type=O&fetch_objects=false ("ENSMUSG00000000215")
- https://api.monarchinitiative.org/api/bioentityset/slimmer/function?&slim=GO:0003824&slim=GO:0004872&slim=GO:0005102&slim=GO:0005215&slim=GO:0005198&slim=GO:0008092&slim=GO:0003677&slim=GO:0003723&slim=GO:0001071&slim=GO:0036094&slim=GO:0046872&slim=GO:0030246&slim=GO:0003674&slim=GO:0008283&slim=GO:0071840&slim=GO:0051179&slim=GO:0032502&slim=GO:0000003&slim=GO:0002376&slim=GO:0050877&slim=GO:0050896&slim=GO:0023052&slim=GO:0010467&slim=GO:0019538&slim=GO:0006259&slim=GO:0044281&slim=GO:0050789&slim=GO:0042592&slim=GO:0007610&slim=GO:0008150&slim=GO:0005576&slim=GO:0005737&slim=GO:0005856&slim=GO:0005739&slim=GO:0005634&slim=GO:0005694&slim=GO:0016020&slim=GO:0031982&slim=GO:0071944&slim=GO:0030054&slim=GO:0042995&slim=GO:0032991&slim=GO:0045202&slim=GO:0005575&subject=SGD:S000002976 ("provided_by")
- https://solr.monarchinitiative.org/solr/golr/select/?defType=edismax&qt=standard&indent=on&wt=json&rows=25&start=0&fl=*,score&facet=true&facet.mincount=1&facet.sort=count&json.nl=arrarr&facet.limit=25&facet.method=enum&fq=object_category:%22gene%22&fq=relation_closure:%22RO:HOM0000001%22&fq=subject_closure:%22NCBIGene:23543%22&q=*:* ("subject_gene_label":"RBFOX2")
- https://scigraph-data-dev.monarchinitiative.org/scigraph/graph/neighbors?id=GO%3A0022008&depth=1&blankNodes=false&relationshipType=subClassOf&direction=BOTH&entail=false ("lbl":"cell differentiation")
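For anyone who wants to reproduce these checks outside of Uptime Robot, here is a rough sketch of the same idea: fetch a URL and treat the service as up only if it returns HTTP 200 and the body contains an expected string. It assumes the quoted strings in parentheses above are the keywords each check looks for. The `Check` type and the `run_check`/`failed_checks` helpers are made up for illustration, and only two of the URLs are included.

```python
import urllib.request
from typing import Callable, List, NamedTuple, Optional, Tuple


class Check(NamedTuple):
    name: str
    url: str
    keyword: Optional[str]  # None means any HTTP 200 response is enough


def fetch(url: str, timeout: float = 30.0) -> Tuple[int, str]:
    """Return (HTTP status, decoded body) for a GET request."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")


Fetcher = Callable[[str], Tuple[int, str]]


def run_check(check: Check, fetcher: Fetcher = fetch) -> bool:
    """True if the URL answers with 200 and the body contains the keyword."""
    try:
        status, body = fetcher(check.url)
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
    if status != 200:
        return False
    return check.keyword is None or check.keyword in body


def failed_checks(checks: List[Check], fetcher: Fetcher = fetch) -> List[str]:
    """Names of the checks that did not pass."""
    return [c.name for c in checks if not run_check(c, fetcher)]


# Two of the checks from the list above; keywords are assumed to be the
# quoted strings shown next to each URL.
CHECKS = [
    Check(
        "api homologs",
        "https://api.monarchinitiative.org/api/bioentity/gene/NCBIGene:3630"
        "/homologs/?homology_type=O&fetch_objects=false",
        "ENSMUSG00000000215",
    ),
    Check(
        "scigraph neighbors",
        "https://scigraph-data-dev.monarchinitiative.org/scigraph/graph/neighbors"
        "?id=GO%3A0022008&depth=1&blankNodes=false&relationshipType=subClassOf"
        "&direction=BOTH&entail=false",
        '"lbl":"cell differentiation"',
    ),
]
```

Calling `failed_checks(CHECKS)` does the real network requests; passing a stub `fetcher` makes it easy to exercise the logic without touching the services.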
TODO:
- a list of people to be notified when the various services are down
- additional tests to be sent to me
- for status.monarchinitiative.org, add a CNAME record pointing to stats.uptimerobot.com.