venice
venice copied to clipboard
[router] Add MetricEntityStateGeneric for flexible dimension handling for OTel
Problem Statement
Existing subclasses of MetricEntityState are for specific cases with specific number of dimensions which should be enums that are perf and GC optimized by reusing Attributes rather than recreate it everytime. But for cases like controllers where the metric emission is infrequent and is not processing tens of thousands or more of QPS, it would benefit to have a generic non cached version of a subclass rather than creating multiple new subclasses for all combination of dimensions.
Solution
- Introduce
MetricEntityStateGenericwhich takes inbaseDimensionsMapduring init and all remaining dimensions as aMapduringrecord()call where it validates the data for all required dimensions. This can be reused for most of the metrics in controllers. This will also help in code maintainability by reducing some of the subclasses needed. - Unlike other subclasses which are typesafe and most of the checks are compile time, this class will have some of the checks during runtime, thus can fail during runtime. The failures are logged with a redundant log filter and are capture in an internal metric:
venice.internal.metric_record_failure. - This can't be used for 0 dynamic dimensions case and it will throw an error during initialization.
MetricEntityStateBaseshould be used instead.
Code changes
- [ ] Added new code behind a config. If so list the config names and their default values in the PR description.
- [x] Introduced new log lines.
- [x] Confirmed if logs need to be rate limited to avoid excessive logging.
Concurrency-Specific Checks
Both reviewer and PR author to verify
- [x] Code has no race conditions or thread safety issues.
- [ ] Proper synchronization mechanisms (e.g.,
synchronized,RWLock) are used where needed. - [ ] No blocking calls inside critical sections that could lead to deadlocks or performance degradation.
- [ ] Verified thread-safe collections are used (e.g.,
ConcurrentHashMap,CopyOnWriteArrayList). - [ ] Validated proper exception handling in multi-threaded code to avoid silent thread termination.
How was this PR tested?
- [x] New unit tests added.
- [ ] New integration tests added.
- [ ] Modified or extended existing tests.
- [ ] Verified backward compatibility (if applicable).
Does this PR introduce any user-facing or breaking changes?
- [x] No. You can skip the rest of this section.
- [ ] Yes. Clearly explain the behavior change and its impact.