Rob Johnson
Separate `paasta_metastatus` into a lib and an API, with the CLI entrypoint just there to provide a nicer interface.
Marathon seems to lose the Healthcheck actor for a given task occasionally, leaving it in an 'unknown' state. Let's actively go after these in the bounce and replace them where...
If the autoscaler would have scaled up the instance count normally, but hasn't been able to because of the `max_instances` limit, it would be good to send a sensu event...
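A minimal sketch of how the "wanted to scale up but was capped" condition could be detected, assuming the autoscaler computes a desired count before clamping. The names `ScalingDecision` and `clamp_instances` are hypothetical and not taken from the paasta codebase, and the `print` call only stands in for wherever a sensu event would actually be sent.

```python
from dataclasses import dataclass


@dataclass
class ScalingDecision:
    desired: int  # instance count the autoscaler wanted
    applied: int  # instance count after clamping to the configured limits
    capped: bool  # True when max_instances blocked a scale-up


def clamp_instances(desired: int, min_instances: int, max_instances: int) -> ScalingDecision:
    """Clamp the desired count and record whether max_instances was the limit."""
    applied = max(min_instances, min(desired, max_instances))
    return ScalingDecision(desired=desired, applied=applied, capped=desired > max_instances)


decision = clamp_instances(desired=12, min_instances=2, max_instances=10)
if decision.capped:
    # Placeholder: a real implementation would emit a sensu event here.
    print(f"wanted {decision.desired} instances, capped at {decision.applied}")
```

Keeping the unclamped value alongside the applied one means the alert can report how far over the limit the service wanted to go.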
@mattmb following up on our chat yesterday: - The cluster autoscaler currently looks at usage per slave - it doesn't make any decisions based on 'what' the tasks are. -...
I'm proposing that when a Job fails (either through the run failing, or the scheduler not running it as expected), we should notify all the owners of downstream jobs, too....
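One way to find who to notify is a breadth-first walk over the job dependency graph. This is only a sketch under assumed data shapes: `children` (job name to list of directly dependent jobs) and `owners` (job name to owner) are hypothetical mappings, not structures from the actual scheduler.

```python
from collections import deque


def downstream_owners(failed_job, children, owners):
    """Return the owners of every job that transitively depends on failed_job."""
    seen = {failed_job}
    queue = deque([failed_job])
    notify = set()
    while queue:
        job = queue.popleft()
        for child in children.get(job, []):
            if child in seen:
                continue  # already visited via another path
            seen.add(child)
            if child in owners:
                notify.add(owners[child])
            queue.append(child)
    return notify


# Hypothetical example graph: extract -> transform -> {report, export}
children = {"extract": ["transform"], "transform": ["report", "export"]}
owners = {"extract": "team-a", "transform": "team-b", "report": "team-c", "export": "team-b"}
print(sorted(downstream_owners("extract", children, owners)))
```

The owner of the failed job itself is excluded here, on the assumption that they are already notified through the existing alerting path.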
We only check the number of Mesos masters - we should add a check that asserts the number of Marathon/Chronos masters too.
```
robj@paasta56-r5-sfo2:~ % sudo paasta_maintenance status paasta56-r5-sfo2.prod.yelpcorp.com
10-64-133-242-uswest1bprod.prod.yelpcorp.com (10.64.133.242): Draining
paasta56-r5-sfo2.prod.yelpcorp.com (10.44.5.33): Draining
```
They make the log super noisy - let's make kazoo quieter.
we often need to answer 'how much capacity do we have in pool X, region Y'. That's kind of difficult right now (you can do ``paasta metastatus -g region pool``...