paasta
Add check_flink_pods_running script
Testing done:
- make test
- manual test: https://fluffy.yelpcorp.com/i/40HFkZSmWF1zVqkF3S4lHbfXBzWSv6cB.html
I'm treating pods of different subcomponents the same way and sending one Sensu event whenever any pod in the service.instance is not in the Running state. I don't see much benefit in splitting them up, since the action to take on this alert is pretty much the same either way (describe the pod and see why it's in the Pending state). But I'm happy to split them if you'd prefer.
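For readers skimming the thread, here is a minimal sketch of the grouping described above. It is not the code in this PR: the `Pod` tuple, the `build_events` helper, and the use of status `2` for a critical event are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Pod(NamedTuple):
    """Hypothetical, simplified view of a Flink pod (not a real paasta type)."""
    service: str
    instance: str
    component: str  # e.g. "jobmanager" or "taskmanager"
    name: str
    phase: str  # Kubernetes pod phase, e.g. "Running" or "Pending"


def build_events(pods: Iterable[Pod]) -> Dict[Tuple[str, str], Tuple[int, str]]:
    """Return one (status, message) per service.instance.

    All subcomponents are treated alike: a single non-Running pod of any
    component makes the whole service.instance critical.
    Status codes are assumed here: 0 = OK, 2 = critical.
    """
    pods = list(pods)
    not_running: Dict[Tuple[str, str], List[Pod]] = defaultdict(list)
    for pod in pods:
        if pod.phase != "Running":
            not_running[(pod.service, pod.instance)].append(pod)

    events: Dict[Tuple[str, str], Tuple[int, str]] = {}
    for key in sorted({(p.service, p.instance) for p in pods}):
        failing = not_running.get(key, [])
        if failing:
            details = ", ".join(f"{p.name} ({p.phase})" for p in failing)
            events[key] = (2, f"{len(failing)} pod(s) not running: {details}")
        else:
            events[key] = (0, "All pods running")
    return events
```

Splitting by subcomponent would only mean adding `component` to the grouping key, so the choice is easy to revisit later.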
@catlynk would you mind holding this up one more day so that I get the chance to speak with some people? I know that data-streams-core wants to be notified for pending pods, too, so it might be good to make it generic.
@poros Sure thing, I'll hold off on pushing this for now
@catlynk I talked with @poros about your questions, and I'm not 100% sure why we'd need to configure anything in yelpsoa-configs. For check_flink_services_health we configure it in Puppet here: https://sourcegraph.yelpcorp.com/sysgit/puppet@master/-/blob/modules/profile_kube/manifests/master.pp#L652, would that not work for this check? If somebody wants to use it for something else (e.g. Kafka), they can just duplicate that configuration.
As for services, I don't think this check actually needs to be service-aware at all, since it just needs to let stream-processing know if any Flink pod, from any service, is Pending for a while.
I hope this is useful. If I've misunderstood your questions, please let me know.
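To make the non-service-aware idea concrete, here is a rough sketch, assuming pods are reduced to a name, a phase, and a creation time. The `PodInfo` tuple, the `pending_too_long` helper, and the 10-minute threshold are all hypothetical and not taken from this PR or from Puppet.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, NamedTuple


class PodInfo(NamedTuple):
    """Hypothetical minimal pod record; no service or instance fields needed."""
    name: str
    phase: str  # Kubernetes pod phase
    created: datetime


def pending_too_long(
    pods: Iterable[PodInfo],
    now: datetime,
    threshold: timedelta = timedelta(minutes=10),  # assumed value
) -> List[str]:
    """Return names of pods that have been Pending for longer than threshold."""
    return [
        pod.name
        for pod in pods
        if pod.phase == "Pending" and now - pod.created > threshold
    ]
```

A check built this way would alert if the returned list is non-empty, regardless of which service the pods belong to.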
Cleaning up and closing some very old PRs. Please re-open or nudge me if you’re still planning to work on this.