prometheus-haskell
prometheus-haskell copied to clipboard
Registry implementation allows for multiple distinct registrations of the same metric
Currently, the []
implementation of Registry
allows for multiple registrations of the same metric. I can't really think of a use-case where someone would want to have duplicate metrics (vs an idempotent register
function). In fact, this behaviour being allowed can have negative consequences. Consider the following stripped down variant of Example/main.hs:
-- This example demonstrates how to use the prometheus-haskell libraries to
-- instrument a simple web app that allows users to vote for their favorite
-- color.
{-# LANGUAGE OverloadedStrings #-}
module Main
where
import Network.HTTP.Types (status200)
import Network.HTTP.Types.Header (hContentType)
import Network.Wai.Handler.Warp (run)
import qualified Data.ByteString.Lazy as LBS
import qualified Network.Wai as Wai
import qualified Network.Wai.Middleware.Prometheus as P
import qualified Prometheus as P
{-# NOINLINE pageVisits #-}
pageVisits :: IO P.Counter
pageVisits = P.register
$ P.counter
-- Each metric provided by the base library takes an Info value that
-- gives the name of the metric and a help string that describes the
-- value that the metric represents.
$ P.Info "page_visits" "The number of visits to the index page."
main :: IO ()
main = do
let port = 3000
putStrLn $ "Listening at http://localhost:" ++ show port ++ "/"
-- Instrument the app with the prometheus middlware using the default
-- `PrometheusSettings`. This will cause the app to dump the metrics when
-- the /metrics endpoint is accessed.
run port (P.prometheus P.def app)
app :: Wai.Application
app request respond = do
response <- case Wai.pathInfo request of
[] -> doIndex
_ -> return $ mkResponse "404"
respond response
mkResponse :: LBS.ByteString -> Wai.Response
mkResponse = Wai.responseLBS status200 [(hContentType, "text/html")]
doIndex :: IO Wai.Response
doIndex = do
pageVisitsMetric <- pageVisits
P.incCounter pageVisitsMetric
return $ mkResponse $ LBS.concat [
"<a href='/metrics'>Metrics</a>"
, "<br><br><br>"
]
Since I've made the unobvious 'mistake' of registering metrics in my handler, which is called per-request, the list of metrics grows every time my server handles a request:
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP http_request_duration_seconds The HTTP request latencies in seconds.
# TYPE http_request_duration_seconds histogram
... http stuff
One consequence of this is that a request to /metrics
on the server may end up timing out because the metrics response size keeps on growing, so you lose visibility over the app until it restarts or the metrics are cleaned up. I think it would be valuable to make it difficult for consumers to do the 'wrong' thing inadvertently.
I can see two solutions to this, and I'm happy to work on a PR to implement either.
- Change the underlying data structure of
Registry
to use aSet
or aMap
, so thatregister
becomes idempotent. If desired, variants ofregister
can be provided so that it is either idempotent or crashes when a metric is 're-registered'. I'm inclined to provide just the former to be honest. - Change the implementation of
register
to do an 'insert if not exists' sort of thing. This then incurs a cost every timeregister
is called, and doesn't force other implementations ofregister
to not provide this behaviour.
Thoughts?
Yeah, I don't think there is a good reason to have duplicate metrics. I think using a set or map and ignoring subsequent calls makes sense. It would be nice if we could warn the caller that they are probably doing something wrong, but crashing seems a bit drastic.
I would be happy to take a PR that updates this.
Alright will take a look. Cheers