enterprise_gateway
Metric Collection and Monitoring
Hi,
We have been trying to use JEG in our production systems, and over time it has become increasingly necessary for us to collect and monitor metrics on the kernels being spawned, the users using them, and the kinds of requests being made to the JEG servers.
That said, could we get some guidance on how to aggregate these metrics from the JEG servers so that they can be logged for monitoring purposes? To start with, we are thinking of collecting and monitoring these metrics through the "STATSD" library: https://pypi.org/project/pystatsd/
I am not sure whether this really qualifies as an issue, but it could certainly be a feature addition, with this as the starting point.
We are looking to collect the following generic information about the setup:
- Average number of active kernels per user.
- Total number of active kernels.
- Number of active users.
- Number of active kernels per OS type (Client OS).
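As a rough illustration of the four aggregates listed above, here is a minimal sketch that computes them from a list of active-kernel records. `KernelRecord`, its fields, and `summarize_kernels` are hypothetical names invented for this example; they are not JEG types, and how JEG would actually supply the username or client OS for each kernel is left open.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelRecord:
    """Hypothetical record describing one active kernel (not a JEG type)."""
    kernel_id: str
    username: str
    client_os: str


def summarize_kernels(kernels):
    """Compute the aggregates from the list above for the given active kernels."""
    per_user = Counter(k.username for k in kernels)
    per_os = Counter(k.client_os for k in kernels)
    total = len(kernels)
    active_users = len(per_user)
    return {
        "active_kernels_total": total,
        "active_users": active_users,
        # Guard against division by zero when no kernels are running.
        "avg_kernels_per_user": total / active_users if active_users else 0.0,
        "active_kernels_by_client_os": dict(per_os),
    }


if __name__ == "__main__":
    sample = [
        KernelRecord("k1", "alice", "linux"),
        KernelRecord("k2", "alice", "linux"),
        KernelRecord("k3", "bob", "macos"),
    ]
    print(summarize_kernels(sample))
```

Each value in the returned dictionary could then be reported as a gauge to whichever metrics backend ends up being chosen.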
Requests per second (RPS) on JEG:
- Kernel launch requests
- Refresh/reconnect requests
- Get kernel/kernelspec requests
- Shutdown/restart kernel requests etc.
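One possible way to bucket requests into the categories above is to classify each request by HTTP method and path, then emit a counter per category. The sketch below uses only the standard library and formats counters in the plain-text StatsD line format (`name:value|c`). It assumes the default Jupyter REST paths (`/api/kernels`, `/api/kernelspecs`) with no base URL prefix, treats a request to the kernel's `channels` endpoint as a "reconnect", and uses a made-up `jeg.requests` metric prefix. The function names are hypothetical and not part of JEG or pystatsd.

```python
import re
import socket
from collections import Counter

# (HTTP method, path pattern, metric name); the first matching route wins.
# Paths assume the default Jupyter REST layout without a base_url prefix.
_ROUTES = [
    ("POST", re.compile(r"^/api/kernels/?$"), "kernel_launch"),
    ("GET", re.compile(r"^/api/kernels/[^/]+/channels/?$"), "kernel_connect"),
    ("POST", re.compile(r"^/api/kernels/[^/]+/restart/?$"), "kernel_restart"),
    ("DELETE", re.compile(r"^/api/kernels/[^/]+/?$"), "kernel_shutdown"),
    ("GET", re.compile(r"^/api/kernels(/[^/]+)?/?$"), "kernel_get"),
    ("GET", re.compile(r"^/api/kernelspecs(/.*)?$"), "kernelspec_get"),
]


def classify_request(method, path):
    """Map a request to one of the metric categories, or 'other'."""
    path = path.split("?", 1)[0]  # ignore any query string
    for route_method, pattern, name in _ROUTES:
        if method.upper() == route_method and pattern.match(path):
            return name
    return "other"


def statsd_counter_line(name, count, prefix="jeg.requests"):
    """Format a counter in the StatsD line format; the prefix is an assumption."""
    return f"{prefix}.{name}:{count}|c"


def send_statsd(lines, host="127.0.0.1", port=8125):
    """Send lines to a StatsD daemon over UDP (8125 is the usual default port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    seen = [
        ("POST", "/api/kernels"),
        ("GET", "/api/kernelspecs"),
        ("DELETE", "/api/kernels/1234"),
    ]
    counts = Counter(classify_request(m, p) for m, p in seen)
    for name, count in sorted(counts.items()):
        print(statsd_counter_line(name, count))
```

Counting events rather than computing rates in the server keeps the instrumentation simple; a per-second rate can be derived downstream by dividing each count by the reporting interval.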
@IMAM9AIS - this would be fantastic! This seems to imply that we'd want to have our own handlers in place since some of these probably warrant updates to those locations - although I suppose that could be a discussion point.
With the persistent kernel session stuff, we already track kernels per user and can get total active kernels and users.
I don't know how much overlap there is with pystatsd, but I think it would be good to take a look at the telemetry stuff (event logging) that is underway in a couple other Jupyter projects (Hub and Lab) from a synergy perspective. On the surface, that appears to be more of an auditing thing than metrics. That said, there are other metric pieces (via prometheus) in place in various projects as well. I just want to make sure we're not adding yet another framework to the ecosystem when others exist and are adequate for our needs.
I hope that's helpful.
> there are other metric pieces (via prometheus) in place in various projects as well.
I love prometheus, too :+1:
@kevin-bates @esevan Sounds good.
We actually came across this PR, which was added to the Notebook server to expose metrics via Prometheus:
https://github.com/jupyter/notebook/pull/3490
However, this functionality does not seem to be enabled in JEG. We are trying to understand whether we can build on this PR to extend our solution and add more metrics.
If you move to the master branch (where we've removed EG's dependency on Kernel Gateway), you should have the ability to get the `/metrics` endpoint exposed. I suspect this would follow an approach similar to the one used in https://github.com/jupyter/enterprise_gateway/blob/master/enterprise_gateway/base/handlers.py, where the various mixins get added into the class derivation and the handler then essentially derives from Notebook's `PrometheusMetricsHandler`, similar to all the other handlers.
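To make the derivation pattern concrete, here is a toy sketch of how a mixin placed ahead of a base handler in the class list gets to run first. Everything in it is a stand-in: `PrometheusMetricsHandler` below is a dummy class borrowing the name mentioned above rather than the real Notebook handler, and `ExampleAuthMixin`, `GatewayMetricsHandler`, and the token check are hypothetical names for illustration only. The actual mixins used by EG are not shown here.

```python
class PrometheusMetricsHandler:
    """Stand-in for Notebook's handler of the same name (not the real class)."""

    def get(self, token=None):
        return "metrics payload"


class ExampleAuthMixin:
    """Hypothetical mixin that rejects requests lacking the expected token."""

    expected_token = "secret"

    def get(self, token=None):
        if token != self.expected_token:
            return "401 Unauthorized"
        # Defer to the next class in the method resolution order.
        return super().get(token=token)


class GatewayMetricsHandler(ExampleAuthMixin, PrometheusMetricsHandler):
    """Mixin is listed first so its check runs before the base handler."""


# Hypothetical route entry in the (pattern, handler class) shape Tornado uses.
default_handlers = [(r"/metrics", GatewayMetricsHandler)]


if __name__ == "__main__":
    handler = GatewayMetricsHandler()
    print(handler.get())
    print(handler.get(token="secret"))
```

The point of the pattern is that cross-cutting behaviour lives in the mixins, so a new endpoint only needs a short class declaration and a route entry.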