
Add check_flink_pods_running script

catlynk opened this issue 5 years ago • 3 comments

Testing done:

  1. make test
  2. manual test: https://fluffy.yelpcorp.com/i/40HFkZSmWF1zVqkF3S4lHbfXBzWSv6cB.html

I'm treating pods of different subcomponents the same way and sending a single Sensu event whenever any pod in the service.instance is not in the Running state. I don't see much benefit in splitting them up, since the action to take when this alert fires is essentially the same (describe the pod and see why it's in the Pending state). But I'm happy to split them if you'd prefer.
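A minimal Python sketch of the aggregation described above, assuming a hypothetical `Pod` record (not a paasta or Kubernetes client type) with `service`, `instance`, `subcomponent`, and `phase` fields. Pods are grouped by `service.instance` regardless of subcomponent, and each instance yields one status, so a single alert covers all of its non-running pods. The function name and status strings are illustrative only.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Pod:
    # Hypothetical, simplified pod record used only for this sketch.
    name: str
    service: str
    instance: str
    subcomponent: str  # e.g. "jobmanager" or "taskmanager"; ignored when grouping
    phase: str  # Kubernetes pod phase, e.g. "Running" or "Pending"


def check_status_by_instance(pods: List[Pod]) -> Dict[str, Tuple[str, str]]:
    """Return one (status, message) pair per service.instance."""
    not_running: Dict[str, List[str]] = defaultdict(list)
    seen = set()
    for pod in pods:
        key = f"{pod.service}.{pod.instance}"
        seen.add(key)
        # Subcomponent is deliberately not part of the key.
        if pod.phase != "Running":
            not_running[key].append(pod.name)

    results: Dict[str, Tuple[str, str]] = {}
    for key in sorted(seen):
        bad = not_running.get(key, [])
        if bad:
            results[key] = ("CRITICAL", f"{len(bad)} pod(s) not running: {', '.join(bad)}")
        else:
            results[key] = ("OK", "all pods running")
    return results
```

For example, a `svc.main` instance with a Running jobmanager and a Pending taskmanager produces one `CRITICAL` entry naming the taskmanager, rather than separate events per subcomponent.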

catlynk · Feb 04 '20 23:02

@catlynk would you mind holding this up one more day so that I get the chance to speak with some people? I know that data-streams-core wants to be notified about pending pods too, so it might be good to make this generic.

poros · Feb 05 '20 17:02

@poros Sure thing, I'll hold off on pushing this for now

catlynk · Feb 05 '20 18:02

@catlynk I talked with @poros about your questions, and I'm not 100% sure why we'd need to configure anything in yelpsoa-configs. For check_flink_services_health we configure it in Puppet here: https://sourcegraph.yelpcorp.com/sysgit/puppet@master/-/blob/modules/profile_kube/manifests/master.pp#L652, would that not work for this check? If somebody wants to use it for something else (e.g. Kafka), they can just duplicate that configuration.

As for services, I don't think this check actually needs to be service-aware at all, since it just needs to let stream-processing know if any Flink pod, from any service, is Pending for a while.
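A sketch of a service-agnostic version, under stated assumptions: a hypothetical `FlinkPod` record carries the pod name, its Kubernetes phase, and a timezone-aware creation time, and the "for a while" threshold is a parameter chosen by the caller. Since a pod starts out in the Pending phase, its age is used here as a rough proxy for how long it has been pending.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional


@dataclass
class FlinkPod:
    # Hypothetical minimal pod record; these fields are assumptions for the sketch.
    name: str
    phase: str  # Kubernetes pod phase, e.g. "Pending" or "Running"
    created_at: datetime  # timezone-aware creation timestamp


def long_pending_pods(
    pods: List[FlinkPod],
    threshold: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return names of pods that have been Pending longer than `threshold`."""
    if now is None:
        now = datetime.now(timezone.utc)
    # No grouping by service: any Flink pod from any service can trigger the alert.
    return [
        pod.name
        for pod in pods
        if pod.phase == "Pending" and now - pod.created_at > threshold
    ]
```

Grouping is deliberately absent: every matching pod is reported regardless of which service owns it, which keeps the check independent of service configuration.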

I hope this is useful; if I've misunderstood your questions, please let me know.

JoeMalt · Feb 12 '20 12:02

Cleaning up and closing some very old PRs. Please re-open or nudge me if you’re still planning to work on this.

mattmb · Feb 21 '24 11:02