samza
SAMZA-2582: Add a metric to track container failures for Samza
Changes: Added a metric to
API Changes: None
Tests: Tested the change by deploying a YARN job
Upgrade Instructions: None
Usage Instructions: None
Can you please update the PR description to include what issue/symptom you are fixing with this? It's unclear why you need a container failure metric for each container specifically instead of an aggregate container failure metric.
@cameronlee314 This is needed to track individual container health issues and make informed ops decisions based on that data. It is useful both for containers with host affinity, to detect unstable hosts, and for containers without host affinity, to detect issues caused by partitioning (i.e., when specific traffic goes to certain containers and causes intermittent instability).