Add support for explicitly sanitizing jobs to popeye

Open ndavidson-pulse opened this issue 5 years ago • 2 comments




Is your feature request related to a problem? Please describe.
When I run a popeye check on my cluster, I get frustrated that pods belonging to failed attempts of jobs that eventually succeed are flagged as failures by popeye.

Describe the solution you'd like
Optionally check whether the job ultimately succeeded, rather than whether every attempt of the job succeeded.
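
A minimal Python sketch of the requested behaviour, assuming a simplified data model: the `Pod` class, its `owner_job` field and the `pods_to_flag` function are hypothetical illustrations, not Popeye's or the Kubernetes client's API. A failed pod is only reported when its owning Job never succeeded; how `job_succeeded` is populated (for example from the Job's `Complete` condition) is left as an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Pod:
    name: str
    phase: str                       # pod phase, e.g. "Succeeded" or "Failed"
    owner_job: Optional[str] = None  # hypothetical: name of the owning Job, if any


def pods_to_flag(pods: List[Pod], job_succeeded: Dict[str, bool]) -> List[str]:
    """Return names of failed pods that should still be reported.

    A failed pod is ignored when its owning Job eventually succeeded;
    failed pods with no owning Job, or whose Job never succeeded, are kept.
    """
    flagged = []
    for pod in pods:
        if pod.phase != "Failed":
            continue
        if pod.owner_job is not None and job_succeeded.get(pod.owner_job, False):
            continue  # an earlier attempt failed, but the Job itself completed
        flagged.append(pod.name)
    return flagged


pods = [
    Pod("migrate-abc12", "Failed", owner_job="migrate"),
    Pod("migrate-def34", "Succeeded", owner_job="migrate"),
    Pod("seed-gh567", "Failed", owner_job="seed"),
    Pod("standalone", "Failed"),
]
print(pods_to_flag(pods, {"migrate": True, "seed": False}))
# ['seed-gh567', 'standalone']
```

Treating an unknown Job as "not succeeded" is a deliberately conservative choice: a genuine failure is still reported rather than silently whitelisted.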

Describe alternatives you've considered
We currently whitelist these failures, but that means a genuine job failure would be missed.

Additional context
If this is a feature Popeye wants, we could develop it and then contribute it back to Popeye. I searched your GitHub issues, including closed ones, and I don't think you've explicitly rejected a proposal like this before.

Thanks very much for Popeye, it's a very useful tool.

ndavidson-pulse, Apr 18 '20

@ndavidson-pulse Thank you for this issue! I'll need to take a peek and see if we can devise a different approach to job failures. Alternatively, if you don't care about the failure history of your CronJob, you can set spec.failedJobsHistoryLimit=0 (it defaults to 1).
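
To illustrate what the history limit does, here is a simplified Python model; the names `FinishedJob` and `retained_failed_jobs` are hypothetical, and the code approximates rather than reproduces the CronJob controller's cleanup. The idea is that only the N most recent failed Jobs are kept, so with a limit of 0 no failed Job (and, assuming its pods are deleted along with it, none of its pods) remains for popeye to flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FinishedJob:
    name: str
    start_time: int  # e.g. Unix seconds; only the relative order matters here
    failed: bool


def retained_failed_jobs(jobs: List[FinishedJob], failed_jobs_history_limit: int = 1) -> List[str]:
    """Names of the failed Jobs a CronJob would keep, oldest first (simplified model).

    Only the `failed_jobs_history_limit` most recent failed Jobs are kept; older
    failed Jobs, and the pods they own, are assumed to be deleted.
    """
    if failed_jobs_history_limit <= 0:
        return []  # a limit of 0 keeps no failed Jobs at all
    failed = sorted((j for j in jobs if j.failed), key=lambda j: j.start_time)
    return [j.name for j in failed[-failed_jobs_history_limit:]]


history = [
    FinishedJob("nightly-1", 100, failed=True),
    FinishedJob("nightly-2", 200, failed=False),
    FinishedJob("nightly-3", 300, failed=True),
]
print(retained_failed_jobs(history))     # ['nightly-3'] with the default limit of 1
print(retained_failed_jobs(history, 0))  # []
```

The explicit guard for a limit of 0 matters in Python: `failed[-0:]` would return the whole list instead of nothing.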

derailed, Apr 25 '20

@derailed It's not that we don't care: we do want them to succeed, just within a specified window. We wrap popeye in a script that deploys our cluster from scratch, waits for the cluster to stabilize within a defined time limit, and then runs the check. By that point all the jobs have succeeded, but a couple of them may have failed first. It's fairly random and depends on exactly how quickly core services come up.
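
The "wait for the cluster to stabilize within a time limit, then check" step of such a wrapper could look roughly like the Python sketch below. `wait_until` is a hypothetical helper, and what counts as "stable" (for example every Job reporting completion) is an assumption left to the caller.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout_s: float, interval_s: float = 5.0) -> bool:
    """Poll `condition` until it returns True or `timeout_s` elapses.

    Returns True if the condition was met in time, False otherwise.
    Uses a monotonic clock so wall-clock adjustments don't affect the deadline.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval_s, remaining))  # never sleep past the deadline


# Demo: a stand-in condition that becomes true on the third poll.
calls = {"n": 0}


def becomes_ready() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(becomes_ready, timeout_s=1.0, interval_s=0.01))  # True
```

In the wrapper, `popeye` would only be invoked once `wait_until` returns True; a False result means the cluster missed its stabilization window, which is itself worth reporting as a failure.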

ndavidson-pulse, Apr 27 '20

Fixed in v0.20.0.

derailed, Feb 17 '24