
SAMZA-2582: Add a container failure tracking metric for Samza

Open Sanil15 opened this issue 4 years ago • 2 comments

Changes: Added a metric, -failure-count, to track the failure count of a single container

API Changes: None

Tests: Tested the change by deploying a YARN job

Upgrade Instructions: None

Usage Instructions: None

Sanil15, Aug 14 '20 18:08

Can you please update the PR description to include what issue/symptom you are fixing with this? It's unclear why you need a container failure metric for each container specifically instead of an aggregate container failure metric.

cameronlee314, Aug 17 '20 18:08

@cameronlee314 this is needed to track individual container health issues and make informed ops decisions based on that data. It is useful both for containers with host affinity, to detect unstable hosts, and for containers without affinity, to detect issues caused by partitioning (i.e. when specific traffic goes to certain containers and causes instability from time to time).

f3flight, Aug 18 '20 01:08