Saheli Bhattacharjee

Results 5 issues of Saheli Bhattacharjee

For ``, resnet50 Offline was inferred from the SingleStream scenario; the results generated by `submission_checker.py` are shown below. The calculated power efficiency is `power_efficiency (inf/J) = 3448436.324053`, which is indeed a...

This PR will add the metrics below to V1:

| Metric Name | Type | Unit |
| -- | -- | -- |
| model_load_time | Provided as a log for CUDA devices | Seconds |

v1

This PR is an updated and improved version of PR #12627. Please see some discussion there.

speculative-decoding
v1

This PR deprecates the `gpu_`-prefixed names of existing metrics that are not GPU-specific:

- `gpu_cache_usage`
- `gpu_prefix_cache_queries`
- `gpu_prefix_cache_hits`

and introduces new metrics under the renamed names:

- `kv_cache_usage`
- `prefix_cache_queries`
- `prefix_cache_hits`

v1
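
As a rough illustration of the metric rename described above, the sketch below maps each deprecated `gpu_`-prefixed name to its replacement and emits a deprecation warning when an old name is used. Only the six metric names come from the PR description. The `RENAMED_METRICS` table and the `resolve_metric_name` helper are hypothetical names made up for this example, not part of the project's actual code.

```python
import warnings

# Old name -> new name, taken from the PR description.
# The dict itself is a hypothetical illustration, not the project's code.
RENAMED_METRICS = {
    "gpu_cache_usage": "kv_cache_usage",
    "gpu_prefix_cache_queries": "prefix_cache_queries",
    "gpu_prefix_cache_hits": "prefix_cache_hits",
}


def resolve_metric_name(name: str) -> str:
    """Hypothetical helper: return the current name for a metric.

    Deprecated names are translated and trigger a DeprecationWarning;
    any other name is returned unchanged.
    """
    new_name = RENAMED_METRICS.get(name)
    if new_name is None:
        return name
    warnings.warn(
        f"Metric '{name}' is deprecated; use '{new_name}' instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_name
```

A dashboard or alerting rule that still queries `gpu_cache_usage` could pass the name through such a helper during the deprecation window, then switch to `kv_cache_usage` directly once the old name is removed.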