At least the metric `apollo_router_session_count_total` is violating Prometheus naming best-practices
Describe the bug
The metric apollo_router_session_count_total is reporting
Number of currently connected clients (https://www.apollographql.com/docs/router/configuration/telemetry/instrumentation/standard-instruments/#session)
so it's a gauge as it's increasing and decreasing with the number of connections.
According to the Prometheus metric naming best-practice (and thus consumer expectations) this should NOT have the total suffix as this is documented to be used for accumulating counts only:
Note that an accumulating count has total as a suffix, in addition to the unit if applicable.
Thanks for opening this. We agree. Unfortunately, this is one of our oldest metrics (perhaps one of the first!).
We admit we made a mistake with its naming, but changing it at this point would be a breaking change. Overall, we're trying to replace them with new better metrics that meet the requirements of Prometheus and OpenTelemetry naming conventions and best practices, but that's one at a time. We do believe that we have moved to a "good" place. We have this tracked internally as part of a different initiative, but this particular metric won't change until a major version release as we don't want to just emit twice the metrics unnecessarily either.
Hope this helps explain where we are on it! Let us know if you have any other thoughts.
For what it's worth, in all the dashboards I've ever seen this on, I don't actually personally get a ton of value out of this metric versus just general request-based metrics that the router provides. That's particularly true since other infrastructure usually provides that insight in a more uniform way across a wider variety of products (e.g., Istio/Envoy/Linkerd/proxies usually cover this).