Add API latency and call counts metrics to dashboard APIs
Signed-off-by: Alan Guo [email protected]
Why are these changes needed?
Adds basic latency and call count metrics for dashboard API endpoints. This will make it easier to debug cases where the dashboard APIs are slow or unresponsive.
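For illustration only, here is a minimal sketch of the idea, not the code in this PR. It wraps an async handler so that each call is counted and timed, tagged by endpoint, method, and status. `InMemoryMetrics`, `with_metrics`, `list_nodes`, and `failing` are hypothetical names made up for this sketch, and requests and responses are plain dicts. A real version would presumably record through a counter and a histogram (for example `Counter` and `Histogram` from `ray.util.metrics`) rather than an in-memory dict.

```python
import asyncio
import time
from collections import defaultdict


class InMemoryMetrics:
    """Hypothetical stand-in for a metrics backend (not a Ray API)."""

    def __init__(self):
        self.call_counts = defaultdict(int)    # tags -> number of calls
        self.latencies_ms = defaultdict(list)  # tags -> observed latencies

    def record(self, endpoint, method, status, latency_ms):
        tags = (endpoint, method, status)
        self.call_counts[tags] += 1
        self.latencies_ms[tags].append(latency_ms)


def with_metrics(metrics, endpoint, method, handler):
    """Wrap an async handler so every call is counted and timed."""

    async def wrapped(request):
        start = time.perf_counter()
        status = "500"  # reported if the handler raises
        try:
            response = await handler(request)
            status = str(response.get("status", 200))
            return response
        finally:
            # Runs on success and on failure, so errors are measured too.
            latency_ms = (time.perf_counter() - start) * 1000
            metrics.record(endpoint, method, status, latency_ms)

    return wrapped


async def list_nodes(request):
    # Hypothetical dashboard handler returning a dict as its response.
    await asyncio.sleep(0.01)
    return {"status": 200, "body": []}


async def failing(request):
    # Hypothetical handler that always errors.
    raise RuntimeError("boom")


metrics = InMemoryMetrics()
nodes_handler = with_metrics(metrics, "/nodes", "GET", list_nodes)
asyncio.run(nodes_handler({}))
asyncio.run(nodes_handler({}))
```

Recording in a `finally` block means failed requests still show up in both the call count and the latency data, which matters when the goal is to debug unresponsive endpoints.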
Related issue number
Checks
- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
- [ ] Unit tests
- [ ] Release tests
- [ ] This PR is not tested :(
@rkooo567 @rickyyx, simple PR to add Prometheus metrics to the dashboard APIs. It should work, but I'm not seeing the custom metrics on the metrics export port.
Any ideas what's wrong? I can see that the middleware is being called because the debug statements show up.
Also, instead of using Ray custom metrics, should I be using CythonHistogram / CythonCounter directly so that this is a "default metric" rather than a "custom metric"?
Yeah, I was able to reproduce what you described on my end as well
Does `python ray/doc/source/ray-observability/doc_code/metrics_example.py` produce the desired metrics for you? I am seeing the example working.
Can you update the PR description?
Lmk when tests pass + addressed comments!
@richardliaw @ericl ready for re-review