rancher-alarms

False positives

Open iDiogenes opened this issue 7 years ago • 11 comments

Hello,

I pointed this at my rancher 1.4.3 server and it said about half my stacks were in an UNHEALTHY state and fired off a bunch of emails. However, the rancher UI says everything is green.

Is 1.4 supported?

iDiogenes avatar Apr 11 '17 01:04 iDiogenes

I tried a clean v1.4.3 install with some sample apps from the catalog, and it picks up state changes without any problem. Can you provide the logs from the container? We can add some debug output later to see more.

My sample log:

[INFO]   2017-4-11 5:48:22:643     start polling rancher-eventer/rancher-eventer
[INFO]   2017-4-11 5:48:22:644     start polling pxc/pxc
[INFO]   2017-4-11 5:48:22:644     start polling concrete5/cmsmysql
[INFO]   2017-4-11 5:48:22:645     start polling concrete5/concrete5app
[INFO]   2017-4-11 5:48:22:645     start polling dokuwiki2/dokuwiki-server
[INFO]   2017-4-11 6:4:9:301       service concrete5/cmsmysql active -> degraded
[INFO]   2017-4-11 6:4:24:325      service concrete5/cmsmysql degraded -> active
[INFO]   2017-4-11 6:5:24:460      service concrete5/concrete5app active -> upgraded
[INFO]   2017-4-11 6:5:39:492      service concrete5/concrete5app upgraded -> active
[INFO]   2017-4-11 6:5:54:526      service concrete5/concrete5app active -> degraded
[INFO]   2017-4-11 6:6:9:549       service concrete5/concrete5app degraded -> active
[INFO]   2017-4-11 6:6:23:287      stopping pxc due to rolling-back state
[INFO]   2017-4-11 6:6:23:287      stop polling pxc/pxc
[INFO]   2017-4-11 6:6:24:625      service pxc/pxc         upgrading -> degraded
[INFO]   2017-4-11 6:7:23:321      discovered new running service, creating monitor for: pxc/pxc
[INFO]   2017-4-11 6:7:23:322      new monitor up pxc/pxc:
  targets: "(HipchatTarget {\"notify\":\"true\"})"
  healthcheck: {
    "pollInterval": 15000,
    "healthyThreshold": 3,
    "unhealthyThreshold": 4
}

[INFO]   2017-4-11 6:7:23:322      start polling pxc/pxc
[INFO]   2017-4-11 6:7:38:364      service pxc/pxc         active -> degraded
[INFO]   2017-4-11 6:8:23:475      service pxc/pxc         became UNHEALTHY with threshold 4
[INFO]   2017-4-11 6:8:24:64       sent event to Hipchat service pxc in stack pxc became degraded (active) link: http://xxx.xxx.xxx.xxx:8080/env/1a5/apps/stacks/1e7/services/1s8/containers
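For reading the log above: pxc/pxc went degraded at 6:7:38 and was reported UNHEALTHY at 6:8:23, which is four polls at a 15-second interval. That timing is consistent with a counter of consecutive non-active polls. The sketch below shows one way such a counter could work. It is an assumption inferred from the log timing, not the project's actual monitor code, and `ThresholdTracker` and `observe` are hypothetical names.

```javascript
// Hypothetical sketch; NOT the actual rancher-alarms implementation.
// Assumes a service counts as healthy only while its polled state is 'active'.
class ThresholdTracker {
  constructor({ healthyThreshold, unhealthyThreshold }) {
    // Number() also accepts string values such as "3".
    this.healthyThreshold = Number(healthyThreshold);
    this.unhealthyThreshold = Number(unhealthyThreshold);
    this.healthy = true;
    this.streak = 0; // consecutive polls that contradict the current verdict
  }

  // Feed one polled state. Returns 'UNHEALTHY' or 'HEALTHY' when the
  // verdict flips, and null when nothing changed.
  observe(state) {
    const looksHealthy = state === 'active';
    if (looksHealthy === this.healthy) {
      this.streak = 0;
      return null;
    }
    this.streak += 1;
    const needed = this.healthy ? this.unhealthyThreshold : this.healthyThreshold;
    if (this.streak < needed) {
      return null;
    }
    this.healthy = looksHealthy;
    this.streak = 0;
    return this.healthy ? 'HEALTHY' : 'UNHEALTHY';
  }
}
```

With `unhealthyThreshold: 4`, the fourth consecutive `degraded` poll is the one that flips the verdict, and a single `active` poll in between resets the count.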

VAdamec avatar Apr 11 '17 06:04 VAdamec

I am not sure if you initiated an issue, but it looks like your service pxc became UNHEALTHY after rancher-alarms started.

Here are my logs. As you can see, 6 of the 10 services were marked as degraded and then UNHEALTHY pretty much right on startup. This resulted in 6 emails being triggered. However, Rancher is showing every service as active/green.

4/11/2017 10:42:51 AM> [email protected] start /usr/src/app
4/11/2017 10:42:51 AM> node bin/rancher-alarms.js
4/11/2017 10:42:51 AM
4/11/2017 10:42:55 AM[INFO] 2017-4-11 17:42:55:112 composing config from env variables
4/11/2017 10:42:55 AM[INFO] 2017-4-11 17:42:55:125 started with config:
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:709 monitors inited:
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:710 mystack3/db:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:710 mystack3/passenger:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 mystack3/lb:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 letsencrypt/letsencrypt:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 pa/db:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 pa/shoryuken:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 edge/lb:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 edge/redirect:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 mystack5/db:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 mystack5/passenger:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 mystack5/lb:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 mystack1/db:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:711 mystack1/passenger:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 mystack4/db:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 mystack4/passenger:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 api/db:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 api/passenger:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 api/lb:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 mystack6/db:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 mystack6/passenger:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 mystack2/db:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 mystack2/passenger:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:712 mystack2/lb:
4/11/2017 10:42:56 AM targets: "email:\n recipients: [email protected]"
4/11/2017 10:42:56 AM healthcheck: {
4/11/2017 10:42:56 AM "pollInterval": 15000,
4/11/2017 10:42:56 AM "healthyThreshold": "2",
4/11/2017 10:42:56 AM "unhealthyThreshold": "3"
4/11/2017 10:42:56 AM}
4/11/2017 10:42:56 AM
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:714 start polling mystack3/db
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:718 start polling mystack3/passenger
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:719 start polling mystack3/lb
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:719 start polling letsencrypt/letsencrypt
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:719 start polling pa/db
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:719 start polling pa/shoryuken
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:720 start polling edge/lb
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:720 start polling edge/redirect
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:722 start polling mystack5/db
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:722 start polling mystack5/passenger
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:722 start polling mystack5/lb
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:722 start polling mystack1/db
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:722 start polling mystack1/passenger
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:722 start polling mystack4/db
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:722 start polling mystack4/passenger
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:722 start polling api/db
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:723 start polling api/passenger
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:723 start polling api/lb
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:723 start polling mystack6/db
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:723 start polling mystack6/passenger
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:723 start polling mystack2/db
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:723 start polling mystack2/passenger
4/11/2017 10:42:56 AM[INFO] 2017-4-11 17:42:56:723 start polling mystack2/lb
4/11/2017 10:43:12 AM[INFO] 2017-4-11 17:43:12:420 service mystack4/passenger active -> degraded
4/11/2017 10:43:12 AM[INFO] 2017-4-11 17:43:12:436 service mystack1/passenger active -> degraded
4/11/2017 10:43:12 AM[INFO] 2017-4-11 17:43:12:442 service mystack6/passenger active -> degraded
4/11/2017 10:43:12 AM[INFO] 2017-4-11 17:43:12:502 service mystack3/passenger active -> degraded
4/11/2017 10:43:12 AM[INFO] 2017-4-11 17:43:12:511 service mystack5/passenger active -> degraded
4/11/2017 10:43:12 AM[INFO] 2017-4-11 17:43:12:532 service mystack2/passenger active -> degraded
4/11/2017 10:43:43 AM[INFO] 2017-4-11 17:43:43:694 service mystack4/passenger became UNHEALTHY with threshold 3
4/11/2017 10:43:43 AM[INFO] 2017-4-11 17:43:43:856 service mystack1/passenger became UNHEALTHY with threshold 3
4/11/2017 10:43:43 AM[INFO] 2017-4-11 17:43:43:899 service mystack3/passenger became UNHEALTHY with threshold 3
4/11/2017 10:43:43 AM[INFO] 2017-4-11 17:43:43:906 service mystack5/passenger became UNHEALTHY with threshold 3
4/11/2017 10:43:43 AM[INFO] 2017-4-11 17:43:43:909 service mystack6/passenger became UNHEALTHY with threshold 3
4/11/2017 10:43:43 AM[INFO] 2017-4-11 17:43:43:963 service mystack2/passenger became UNHEALTHY with threshold 3
4/11/2017 10:43:44 AM[INFO] 2017-4-11 17:43:44:214 sending email notification to [email protected]
4/11/2017 10:43:44 AM[INFO] 2017-4-11 17:43:44:384 sending email notification to [email protected]
4/11/2017 10:43:44 AM[INFO] 2017-4-11 17:43:44:406 sending email notification to [email protected]
4/11/2017 10:43:44 AM[INFO] 2017-4-11 17:43:44:419 sending email notification to [email protected]
4/11/2017 10:43:44 AM[INFO] 2017-4-11 17:43:44:463 sending email notification to [email protected]
4/11/2017 10:43:44 AM[INFO] 2017-4-11 17:43:44:485 sending email notification to [email protected]
4/11/2017 10:43:45 AM[INFO] 2017-4-11 17:43:45:484 sent email notification to [email protected] {
4/11/2017 10:43:45 AM "accepted": [
4/11/2017 10:43:45 AM "[email protected]"
4/11/2017 10:43:45 AM ],
4/11/2017 10:43:45 AM "rejected": [],
4/11/2017 10:43:45 AM "response": "250 2.0.0 OK 1491932625 n7sm31840855pfn.0 - gsmtp",
4/11/2017 10:43:45 AM "envelope": {
4/11/2017 10:43:45 AM "from": "[email protected]",
4/11/2017 10:43:45 AM "to": [
4/11/2017 10:43:45 AM "[email protected]"
4/11/2017 10:43:45 AM ]
4/11/2017 10:43:45 AM },
4/11/2017 10:43:45 AM "messageId": "[email protected]"
4/11/2017 10:43:45 AM}
4/11/2017 10:43:45 AM[INFO] 2017-4-11 17:43:45:746 sent email notification to [email protected] {
4/11/2017 10:43:45 AM "accepted": [
4/11/2017 10:43:45 AM "[email protected]"
4/11/2017 10:43:45 AM ],
4/11/2017 10:43:45 AM "rejected": [],
4/11/2017 10:43:45 AM "response": "250 2.0.0 OK 1491932625 t5sm31763246pgb.58 - gsmtp",
4/11/2017 10:43:45 AM "envelope": {
4/11/2017 10:43:45 AM "from": "[email protected]",
4/11/2017 10:43:45 AM "to": [
4/11/2017 10:43:45 AM "[email protected]"
4/11/2017 10:43:45 AM ]
4/11/2017 10:43:45 AM },
4/11/2017 10:43:45 AM "messageId": "[email protected]"
4/11/2017 10:43:45 AM}
4/11/2017 10:43:46 AM[INFO] 2017-4-11 17:43:46:77 sent email notification to [email protected] {
4/11/2017 10:43:46 AM "accepted": [
4/11/2017 10:43:46 AM "[email protected]"
4/11/2017 10:43:46 AM ],
4/11/2017 10:43:46 AM "rejected": [],
4/11/2017 10:43:46 AM "response": "250 2.0.0 OK 1491932626 r17sm31801969pfa.13 - gsmtp",
4/11/2017 10:43:46 AM "envelope": {
4/11/2017 10:43:46 AM "from": "[email protected]",
4/11/2017 10:43:46 AM "to": [
4/11/2017 10:43:46 AM "[email protected]"
4/11/2017 10:43:46 AM ]
4/11/2017 10:43:46 AM },
4/11/2017 10:43:46 AM "messageId": "[email protected]"
4/11/2017 10:43:46 AM}
4/11/2017 10:43:46 AM[INFO] 2017-4-11 17:43:46:426 sent email notification to [email protected] {
4/11/2017 10:43:46 AM "accepted": [
4/11/2017 10:43:46 AM "[email protected]"
4/11/2017 10:43:46 AM ],
4/11/2017 10:43:46 AM "rejected": [],
4/11/2017 10:43:46 AM "response": "250 2.0.0 OK 1491932626 o194sm31854886pfg.66 - gsmtp",
4/11/2017 10:43:46 AM "envelope": {
4/11/2017 10:43:46 AM "from": "[email protected]",
4/11/2017 10:43:46 AM "to": [
4/11/2017 10:43:46 AM "[email protected]"
4/11/2017 10:43:46 AM ]
4/11/2017 10:43:46 AM },
4/11/2017 10:43:46 AM "messageId": "[email protected]"
4/11/2017 10:43:46 AM}
4/11/2017 10:43:46 AM[INFO] 2017-4-11 17:43:46:738 sent email notification to [email protected] {
4/11/2017 10:43:46 AM "accepted": [
4/11/2017 10:43:46 AM "[email protected]"
4/11/2017 10:43:46 AM ],
4/11/2017 10:43:46 AM "rejected": [],
4/11/2017 10:43:46 AM "response": "250 2.0.0 OK 1491932626 m19sm5561930pfg.115 - gsmtp",
4/11/2017 10:43:46 AM "envelope": {
4/11/2017 10:43:46 AM "from": "[email protected]",
4/11/2017 10:43:46 AM "to": [
4/11/2017 10:43:46 AM "[email protected]"
4/11/2017 10:43:46 AM ]
4/11/2017 10:43:46 AM },
4/11/2017 10:43:46 AM "messageId": "[email protected]"
4/11/2017 10:43:46 AM}
4/11/2017 10:43:47 AM[INFO] 2017-4-11 17:43:47:66 sent email notification to [email protected] {
4/11/2017 10:43:47 AM "accepted": [
4/11/2017 10:43:47 AM "[email protected]"
4/11/2017 10:43:47 AM ],
4/11/2017 10:43:47 AM "rejected": [],
4/11/2017 10:43:47 AM "response": "250 2.0.0 OK 1491932627 2sm8215793pfs.85 - gsmtp",
4/11/2017 10:43:47 AM "envelope": {
4/11/2017 10:43:47 AM "from": "[email protected]",
4/11/2017 10:43:47 AM "to": [
4/11/2017 10:43:47 AM "[email protected]"
4/11/2017 10:43:47 AM ]
4/11/2017 10:43:47 AM },
4/11/2017 10:43:47 AM "messageId": "[email protected]"
4/11/2017 10:43:47 AM}

iDiogenes avatar Apr 11 '17 17:04 iDiogenes

Well, a GREEN service in the Rancher UI doesn't necessarily mean it's healthy; it depends on how you set up health checks in the affected services. If you look at the API, are the services really healthy?

VAdamec avatar Apr 11 '17 18:04 VAdamec

Using the Rancher CLI, running ps against the environment shows every service and sidekick as healthy. What is rancher-alarms querying to check for a healthy state?

iDiogenes avatar Apr 11 '17 18:04 iDiogenes

Also, using the "view in API" from the UI is showing the same results - active and healthy.

iDiogenes avatar Apr 11 '17 18:04 iDiogenes

That's strange. It gets its result from the API (services; see server.es6 and rancher.es6). Do you have more than one environment?

VAdamec avatar Apr 11 '17 18:04 VAdamec

There is only a single cattle environment on the server that is being queried.

iDiogenes avatar Apr 11 '17 18:04 iDiogenes

OK, so please run it with debug; I need to see more than the standard log.

VAdamec avatar Apr 12 '17 06:04 VAdamec

I'm not familiar with trace(), which is used here, but you can easily change it to info() in src/server.es6, line 32:

  trace(`loaded services from API\n${JSON.stringify(services, null, 4)}`)
  // just change trace to info
  info(`loaded services from API\n${JSON.stringify(services, null, 4)}`)

It will show you the complete API response received from Rancher. Then run it from a shell:

export RANCHER_ACCESS_KEY=..
...
npm start

VAdamec avatar Apr 12 '17 07:04 VAdamec

@VAdamec - I found the issue and it does have to do with the version of Rancher. The _withoutSidekicks function does a split on the container name using an underscore. In Rancher 1.2 (I believe, could be 1.3) they changed the sidekicks to be separated by a hyphen. I updated my code locally to use the hyphen and it solved my problem. Not sure how you want to address the issue, but a fix that supports both formats would be recommended.

https://github.com/ndelitski/rancher-alarms/blob/master/src/monitor.es6#L195
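A fix that accepts both separators could look roughly like the sketch below. It is not the project's actual `_withoutSidekicks` code. It assumes primary containers are named `<stack><sep><service><sep><index>` and that sidekicks carry an extra name segment before the index, where `<sep>` is `_` or `-`. The helper names and the `name` property on container objects are hypothetical.

```javascript
// Hypothetical sketch; NOT the real _withoutSidekicks from monitor.es6.
// Assumed naming: primary  -> <stack><sep><service><sep><index>
//                 sidekick -> <stack><sep><service><sep><sidekick><sep><index>
// where <sep> is '_' (older Rancher) or '-' (newer Rancher).

// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// True when the name is exactly stack + sep + service + sep + digits.
function isPrimaryContainer(containerName, stackName, serviceName) {
  const prefix = escapeRegExp(stackName) + '[_-]' + escapeRegExp(serviceName);
  const pattern = new RegExp('^' + prefix + '[_-]\\d+$');
  return pattern.test(containerName);
}

// Keep only the primary containers of one service, dropping its sidekicks.
function withoutSidekicks(containers, stackName, serviceName) {
  return containers.filter((container) =>
    isPrimaryContainer(container.name, stackName, serviceName)
  );
}
```

Matching against the known stack and service names, rather than splitting the container name, avoids miscounting segments when a service name itself contains a hyphen.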

iDiogenes avatar Apr 12 '17 19:04 iDiogenes

OK, it seems to be an easy fix. Could you create a PR? We'll see if and when @ndelitski accepts it.

VAdamec avatar May 23 '17 14:05 VAdamec