prometheus-haskell Registry implementation allows for multiple distinct registrations of the same metric

Currently, the [] implementation of Registry allows for multiple registrations of the same metric. I can't really think of a use-case where someone would want to have duplicate metrics (vs an idempotent register function). In fact, this behaviour being allowed can have negative consequences. Consider the following stripped down variant of Example/main.hs:

-- This example demonstrates how to use the prometheus-haskell libraries to
-- instrument a simple web app that allows users to vote for their favorite
-- color.
{-# LANGUAGE OverloadedStrings #-}
module Main
where

import           Network.HTTP.Types                (status200)
import           Network.HTTP.Types.Header         (hContentType)
import           Network.Wai.Handler.Warp          (run)

import qualified Data.ByteString.Lazy              as LBS
import qualified Network.Wai                       as Wai
import qualified Network.Wai.Middleware.Prometheus as P
import qualified Prometheus                        as P


{-# NOINLINE pageVisits #-}
pageVisits :: IO P.Counter
pageVisits = P.register
           $ P.counter
           -- Each metric provided by the base library takes an Info value that
           -- gives the name of the metric and a help string that describes the
           -- value that the metric represents.
           $ P.Info "page_visits" "The number of visits to the index page."

main :: IO ()
main = do
    let port = 3000
    putStrLn $ "Listening at http://localhost:" ++ show port ++ "/"
    -- Instrument the app with the prometheus middlware using the default
    -- `PrometheusSettings`. This will cause the app to dump the metrics when
    -- the /metrics endpoint is accessed.
    run port (P.prometheus P.def app)

app :: Wai.Application
app request respond = do
    response <- case Wai.pathInfo request of
        [] -> doIndex
        _  -> return $ mkResponse "404"
    respond response

mkResponse :: LBS.ByteString -> Wai.Response
mkResponse =  Wai.responseLBS status200 [(hContentType, "text/html")]

doIndex :: IO Wai.Response
doIndex = do
    pageVisitsMetric <- pageVisits
    P.incCounter pageVisitsMetric
    return $ mkResponse $ LBS.concat [
            "<a href='/metrics'>Metrics</a>"
        ,   "<br><br><br>"
        ]

Since I've made the unobvious 'mistake' of registering metrics in my handler, which is called per-request, the list of metrics grows every time my server handles a request:

# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP page_visits The number of visits to the index page.
# TYPE page_visits counter
page_visits 1.0
# HELP http_request_duration_seconds The HTTP request latencies in seconds.
# TYPE http_request_duration_seconds histogram
... http stuff

One consequence of this is that a request to /metrics on the server may end up timing out because the metrics response size keeps on growing, so you lose visibility over the app until it restarts or the metrics are cleaned up. I think it would be valuable to make it difficult for consumers to do the 'wrong' thing inadvertently.

I can see two solutions to this, and I'm happy to work on a PR to implement either.

Change the underlying data structure of Registry to use a Set or a Map, so that register becomes idempotent. If desired, variants of register can be provided so that it is either idempotent or crashes when a metric is 're-registered'. I'm inclined to provide just the former to be honest.
Change the implementation of register to do an 'insert if not exists' sort of thing. This then incurs a cost every time register is called, and doesn't force other implementations of register to not provide this behaviour.

Thoughts?

Apr 05 '20 14:04 MIJOTHY

Yeah, I don't think there is a good reason to have duplicate metrics. I think using a set or map and ignoring subsequent calls makes sense. It would be nice if we could warn the caller that they are probably doing something wrong, but crashing seems a bit drastic.

I would be happy to take a PR that updates this.

Apr 18 '20 17:04 fimad

Alright will take a look. Cheers

Apr 19 '20 15:04 MIJOTHY

prometheus-haskell prometheus-haskell copied to clipboard

Registry implementation allows for multiple distinct registrations of the same metric

prometheus-haskell
prometheus-haskell copied to clipboard