Issue with metric refresh interval

Open varungup90 opened this issue 1 month ago • 1 comments

🐛 Describe the bug

I noticed that metrics are not refreshed according to the configured interval. In the logs below, the interval is set to 1s, but for the decode-0 pod the metric is refreshed twice within the same interval (at 23:28:23.046 and again at 23:28:23.089).

I1029 23:28:23.046741       1 cache_metrics.go:453] "Updating model metric" pod="vllm-2p2d-tp2-1roleset-roleset-wk7x7-decode-5dcf575575-0" model="qwen3-32b" generation_token_total={"Value":1315505} avg_generation_throughput_toks_per_s={"Value":1217.4834983307785}
I1029 23:28:23.046754       1 cache_metrics.go:186] 


I1029 23:28:23.049670       1 gateway_rsp_body.go:157] request end, requestID: fead47f2-39d2-4bf7-ada8-a3e9c67a21d2 - targetPod: 192.168.0.102:8000, elapsed: 6.59526097s
I1029 23:28:23.089137       1 cache_metrics.go:453] "Updating model metric" pod="vllm-2p2d-tp2-1roleset-roleset-wk7x7-decode-5dcf575575-0" model="qwen3-32b" generation_token_total={"Value":1315565} avg_generation_throughput_toks_per_s={"Value":1415.1147987115946}
I1029 23:28:23.089151       1 cache_metrics.go:186] 


I1029 23:28:23.089311       1 cache_metrics.go:453] "Updating model metric" pod="vllm-2p2d-tp2-1roleset-roleset-wk7x7-decode-5dcf575575-1" model="qwen3-32b" generation_token_total={"Value":1347888} avg_generation_throughput_toks_per_s={"Value":1370.1406205634362}

Steps to Reproduce

I added a log message in cache_metrics.go to print the values and ran a benchmark test.

Expected behavior

There should be only one record per pod for each metric within a given interval.

Environment

NA

varungup90 avatar Oct 29 '25 23:10 varungup90

Could this be a problem with duplicate pods in metaPods? 🤔

googs1025 avatar Oct 30 '25 09:10 googs1025
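
If duplicate entries in the pod list do turn out to be the cause, one possible mitigation is to deduplicate the list before each refresh pass. The sketch below is only an illustration in Go: `podRef` is a hypothetical stand-in for whatever `metaPods` actually holds, and `dedupePods` is not an existing aibrix function.

```go
package main

import "fmt"

// podRef is a hypothetical stand-in for an entry in the pod list;
// the real type used by the cache may differ.
type podRef struct {
	Namespace string
	Name      string
}

// dedupePods keeps the first occurrence of each namespace/name pair
// and preserves the original order.
func dedupePods(pods []podRef) []podRef {
	seen := make(map[podRef]struct{}, len(pods))
	out := make([]podRef, 0, len(pods))
	for _, p := range pods {
		if _, ok := seen[p]; ok {
			continue // already scheduled for refresh in this pass
		}
		seen[p] = struct{}{}
		out = append(out, p)
	}
	return out
}

func main() {
	pods := []podRef{
		{Namespace: "default", Name: "decode-0"},
		{Namespace: "default", Name: "decode-1"},
		{Namespace: "default", Name: "decode-0"}, // duplicate entry
	}
	unique := dedupePods(pods)
	fmt.Printf("before=%d after=%d\n", len(pods), len(unique))
}
```

Logging the list length before and after such a pass would also be a cheap way to confirm or rule out the duplicate-pod hypothesis.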