Error getting metrics from the Prometheus plugin after bumping to the 3.7.1 release
Is there an existing issue for this?
- [x] I have searched the existing issues
Kong version ($ kong version)
3.7.1 / 3.9.0
Current Behavior
I am having problems with metrics from the Prometheus plugin after bumping to the 3.7.1 release. (I have since upgraded Kong to 3.9.0 and the issue still persists.)
I have the following entry in my logs:
[lua] prometheus.lua:1020: log_error(): Error getting 'request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"}': nil, client: 10.145.40.1, server: kong_status, request: "GET /metrics HTTP/1.1", host: "10.145.12.54:8100"
Interesting facts:
- It is always the same service and route that show this error; in our case it is always the same two routes for the same bucket.
- When we revert back to 3.6.1 the problem goes away.
- After a few months we bumped Kong to version 3.9.0 and the problem started happening again after a couple of hours, for the same routes and buckets.
- It goes away with a pod rotation but comes back after a while.
I already tried:
- nginx_http_lua_shared_dict: 'prometheus_metrics 15m'. Memory usage now stands at roughly 20%.
One pod contains:
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="50"} 1
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="80"} 1
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="100"} 1
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="250"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="400"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="700"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="1000"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="2000"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="5000"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="10000"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="30000"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="60000"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="+Inf"} 2
While another pod is missing the le="80" bucket:
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="50"} 1
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="100"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="250"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="400"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="700"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="1000"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="2000"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="5000"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="10000"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="30000"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="60000"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="+Inf"} 3
We are running our Kong in AWS EKS, upgraded from 3.6.1
Expected Behavior
The bucket should not disappear, but if it does for any reason I would expect Kong to be able to recover from an inconsistent state. (maybe metric reset?)
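For illustration only, a "metric reset" of this kind could be approximated with the standard ngx.shared.DICT API from a custom debug endpoint (this is a sketch of the idea, not an existing Kong feature):

-- Hypothetical recovery handler: wipe the Prometheus shared dict so the
-- exporter starts from a clean state (all counters restart from zero).
-- Assumes it runs inside Kong/OpenResty, e.g. behind a custom /reset-metrics location.
local function reset_prometheus_shdict()
  local dict = ngx.shared.prometheus_metrics
  if not dict then
    return nil, "shared dict 'prometheus_metrics' not found"
  end
  dict:flush_all()      -- mark every stored key as expired
  dict:flush_expired()  -- free the memory held by expired keys
  return true
end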
Steps To Reproduce
No response
Anything else?
No response
@rodolfobrunner Does this issue happen while using the same deployment as #14144? I'm trying to reproduce it.
Hey @ProBrian, I am part of the same team as @rodolfobrunner. Yes, it's the same deployment.
Hello @ProBrian, some additional info on what we're seeing. At some point we added some debug instructions to figure out what was being stored. Something like:
-- Adapted from the prometheus metric_data function
local function collect()
  ngx.header["Content-Type"] = "text/plain; charset=UTF-8"
  ngx.header["Kong-NodeId"] = node_id
  local prometheus = exporter.get_prometheus()
  local write_fn = ngx.print
  local keys = prometheus.dict:get_keys(0) -- this is ngx.shared["prometheus_metrics"]
  local count = #keys
  table_sort(keys)

  local output = buffer.new(DATA_BUFFER_SIZE_HINT)
  local output_count = 0

  local function buffered_print(fmt, ...)
    if fmt then
      output_count = output_count + 1
      output:putf(fmt, ...)
    end

    if output_count >= 100 or not fmt then
      write_fn(output:get()) -- consume the whole buffer
      output_count = 0
    end
  end

  for i = 1, count do
    local key = keys[i]
    local value = prometheus.dict[key]
    buffered_print("%s: %s\n", key, value)
  end

  buffered_print(nil)
  output:free()
end
... which outputs (when the error occurs):
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} null <----------
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} null <----------
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} null <----------
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} null <----------
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} null <----------
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} null <----------
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} null <----------
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00050.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00100.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00250.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00400.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00700.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="01000.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="02000.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="05000.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="10000.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="30000.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="60000.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="Inf"} null
How can a dictionary support duplicate keys? Even if it's a shared dictionary?
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} null <----------
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} null <----------
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00050.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00100.0"} null
That's weird; from the print logs even the table_sort behaves abnormally: "00080.0" appears before "00050.0".
Hello @ProBrian, some more findings. I added more detail to the introspection and got this:
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} - local_metrics: nil shared_dict: nil <---- the important dict
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00100.0"} - local_metrics: nil shared_dict: 1
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00250.0"} - local_metrics: nil shared_dict: 1
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00400.0"} - local_metrics: nil shared_dict: 1
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00700.0"} - local_metrics: nil shared_dict: 1
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="01000.0"} - local_metrics: nil shared_dict: 1
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="02000.0"} - local_metrics: nil shared_dict: 1
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="05000.0"} - local_metrics: nil shared_dict: 1
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="10000.0"} - local_metrics: nil shared_dict: 1
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="30000.0"} - local_metrics: nil shared_dict: 1
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="60000.0"} - local_metrics: nil shared_dict: 1
Which explains the error described earlier (the value is indeed nil).
[lua] prometheus.lua:1020: log_error(): Error getting 'request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"}': nil, client: 10.145.40.1, server: kong_status, request: "GET /metrics HTTP/1.1", host: "10.145.12.54:8100"
The disturbing part here is the code that dumps the counters to the nginx shared dict:
local function sync(_, self)
  local err, _
  local ok = true
  for k, v in pairs(self.increments) do
    _, err, _ = self.dict:incr(k, v, 0)
    if err then
      ngx.log(ngx.WARN, "error increasing counter in shdict key: ", k, ", err: ", err)
      ok = false
    end
  end

  clear_tab(self.increments)
  return ok
end
The self.dict:incr(k, v, 0) call, according to the documentation:
Increments the (numerical) value for key in the shm-based dictionary ngx.shared.DICT by the step value value. Returns the new resulting number if the operation is successfully completed or nil and an error message otherwise.
When the key does not exist or has already expired in the shared dictionary:
- if the init argument is not specified or takes the value nil, this method will return nil and the error string "not found";
- if the init argument takes a number value, this method will create a new key with the value init + value.
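For reference, that init behavior can be exercised in isolation with a small sketch (assumption: run via the resty CLI with a throwaway shdict, not against Kong's actual prometheus_metrics dict):

-- Minimal illustration of ngx.shared.DICT:incr() init semantics.
-- Assumed invocation: resty --shdict 'demo 1m' incr_demo.lua
local dict = ngx.shared.demo

-- Key missing and no init given: incr returns nil plus the "not found" error.
local new_val, err = dict:incr("missing_key", 1)
ngx.say("no init: ", tostring(new_val), " / ", tostring(err))  -- nil / not found

-- Key missing but init = 0: the key is created with the value init + value.
new_val, err = dict:incr('le="00080.0"', 1, 0)
ngx.say("with init: ", new_val)  -- 1

-- Subsequent increments simply add to the stored number.
new_val, err = dict:incr('le="00080.0"', 2, 0)
ngx.say("again: ", new_val)  -- 3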
Under which scenarios will this operation write nil?
@jmadureira For ngx.shared.DICT.incr(k, v), v only takes a number, so there should be no way for incr to write a nil. I'm starting to doubt whether there is an issue in this API.
By the way, in your debug function the dict value lookup doesn't look correct. Could you fix it and also check whether an err message is returned when getting a nil value from the shared dict?
for i = 1, count do
  local key = keys[i]
- local value = prometheus.dict[key]
+ local value, err = prometheus.dict:get(key)
  -- if `err` is not nil, try to print out the err here
  buffered_print("%s: %s\n", key, value)
end
@jmadureira Could you do some more debugging to clarify:
- Are there warning logs indicating a write error when doing the `self.dict:incr(k, v, 0)` operation?
- Could you use `ngx.shared.DICT.free_space` to check if the shared dict still has free memory for the metrics? What I'm wondering is whether some key (like the `le=80` slot) has been evicted due to insufficient memory and the LRU logic. That is: `keys = get_keys()` --> `keys[i]` evicted due to LRU --> `get(keys[i])` returns nil. So we can try another get_keys right after the get-value iteration, like:
local keys = prometheus.dict:get_keys(0) -- this is ngx.shared["prometheus_metrics"]
...
-- the loop of get value
for i = 1, count do
  local key = keys[i]
  local value = prometheus.dict:get(key)
  buffered_print("%s: %s\n", key, value)
end

-- get keys again to verify the keys are identical; this helps us figure out
-- whether any keys were evicted during the loop of get value.
keys = prometheus.dict:get_keys(0)
@ProBrian you are probably on to something. I changed the sync logic a little bit:
local function sync(_, self)
  local err, _
  local ok = true
  local count = 0
  local counter_logs = {}
  for k, v in pairs(self.increments) do
    new_val, err, _ = self.dict:incr(k, v, 0)
    if err then
      ngx.log(ngx.WARN, "error increasing counter in shdict key: ", k, ", err: ", err)
      ok = false
    end
    count = count + 1
    -- Only log counter names that contain "request_latency_ms_bucket" along with their values
    if string.find(k, "request_latency_ms_bucket", 1, true) then
      -- also print the current value in the shared dictionary
      table.insert(counter_logs, k .. "=old_value+" .. v .. "=" .. new_val .. " current value=" .. self.dict:get(k))
    end
  end

  if count > 0 then
    if #counter_logs > 0 then
      ngx.log(ngx.INFO, "Synced ", count, " counters from worker ", self.id, ": ", table.concat(counter_logs, ", "))
    else
      ngx.log(ngx.INFO, "Synced ", count, " counters from worker ", self.id)
    end
  end

  clear_tab(self.increments)
  return ok
end
The error no longer shows up, most likely because the value gets read immediately, so it won't be affected by the LRU logic.
On the other hand, if the le=80 slot had been evicted due to memory constraints, the Prometheus exporter wouldn't pick it up and no error would be logged. The metric_data function in prometheus.lua loops through all keys in the shared dict.
The error no longer shows up, most likely because the value gets read immediately, so it won't be affected by the LRU logic.
I think incr and get both refresh a key, so if we have already incremented a specific key "le=xx" in the current sync, it should not be evicted (meaning the get(le=xx) immediately afterwards is not the reason the key isn't evicted). I feel it's more likely to be a concurrency + timing issue like:
coroutine1(metric_data): get_keys got [le=80, le=100,...] -- looping keys : get(le=**) ok, ..., get(le=100) ok, get(le=80) nil ----
coroutine2(sync): ----------------------------- incr value may evict "le=80" which is old ---------
So when the eviction of key le=80 happens right between the get_keys and the get(le=80) in metric_data, you get a nil. When you add something in sync, that might just change the timing, e.g. the eviction now happens before get_keys, so you don't even see the "le=80" key in keys, and therefore no get(le=80) is invoked in the loop over keys and no error happens.
That's why I suggested adding another get_keys right after the dict loop in metric_data and checking whether any keys are missing compared to the get_keys before the loop; that would help us confirm whether this is the problem.
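If it helps, a small hypothetical helper (plain Lua, not part of the patch above) could diff the two get_keys() snapshots so the evicted keys are named rather than just counted:

-- Hypothetical helper: diff two get_keys() snapshots to spot evicted keys.
-- `before` and `after` are arrays as returned by dict:get_keys(0).
local function missing_keys(before, after)
  local still_there = {}
  for _, k in ipairs(after) do
    still_there[k] = true
  end
  local missing = {}
  for _, k in ipairs(before) do
    if not still_there[k] then
      missing[#missing + 1] = k
    end
  end
  return missing
end

-- Usage sketch inside metric_data():
--   local before = self.dict:get_keys(0)
--   ... loop over the keys and get() each value ...
--   local evicted = missing_keys(before, self.dict:get_keys(0))
--   if #evicted > 0 then
--     ngx.log(ngx.WARN, "keys evicted during export: ", table.concat(evicted, ", "))
--   end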
@ProBrian another round of debugging produced the following messages:
2025/02/20 13:56:56 ... counter.lua:65: Synced 46 counters from worker nil: request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"}=old_value+1=1 current value=nil, ...
2025/02/20 13:57:04 ... prometheus.lua:937: metric_data(): Going to export 5559 shared metrics, ... request: "GET /metrics ...
2025/02/20 13:57:04 ... prometheus.lua:1027: log_error(): Error getting 'request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"}': nil, ... request: "GET /metrics ...
2025/02/20 13:57:04 ... prometheus.lua:1008: metric_data(): Expected to have exported 5559 shared metrics, ... request: "GET /metrics ...
The current_value logged on the first line is supposed to be the current value in the shared dict right after changing it, so I wouldn't expect it to be nil. Instead it should be 1, since that is the value returned by self.dict:incr(k, v, 0).
The rest is your suggestion to check the number of keys before and after the dict loop in the metric_data function, which remains the same.
Since this is a custom log message, I'm including the code as well.
In counter.lua I have:
local function sync(_, self)
  local err, _
  local ok = true
  local count = 0
  local counter_logs = {}
  for k, v in pairs(self.increments) do
    new_val, err, _ = self.dict:incr(k, v, 0)
    if err then
      ngx.log(ngx.WARN, "error increasing counter in shdict key: ", k, ", err: ", err)
      ok = false
    end
    count = count + 1
    -- Only log counter names that contain "request_latency_ms_bucket" along with their values
    if string.find(k, "request_latency_ms_bucket", 1, true) then
      local k_str = k or "nil"
      local v_str = v or "nil"
      local new_val_str = new_val or "nil"
      local current_value = self.dict:get(k) or "nil"
      table.insert(counter_logs, k_str .. "=old_value+" .. v_str .. "=" .. new_val_str .. " current value=" .. current_value)
    end
  end

  if count > 0 then
    if #counter_logs > 0 then
      ngx.log(ngx.INFO, "Synced ", count, " counters from worker ", self.id, ": ", table.concat(counter_logs, ", "))
    else
      ngx.log(ngx.INFO, "Synced ", count, " counters from worker ", self.id)
    end
  end

  clear_tab(self.increments)
  return ok
end
In prometheus.lua I made the following changes:
function Prometheus:metric_data(write_fn, local_only)
  -- ...
  local count = #keys
  ngx_log(ngx.INFO, "Going to export " .. count .. " shared metrics")

  for k, v in pairs(self.local_metrics) do
    keys[count+1] = k
    count = count + 1
  end
  -- ...
  for i = 1, count do
    yield()
    -- ...
  end

  keys = self.dict:get_keys(0)
  local count = #keys
  ngx_log(ngx.INFO, "Expected to have exported " .. count .. " shared metrics")

  buffered_print(nil)
  output:free()
end
2025/02/20 13:56:56 ... counter.lua:65: Synced 46 counters from worker nil: request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"}=old_value+1=1 current value=nil, ...
Hmm... that makes me suspicious of the ngx.shared.DICT.incr API. I think if that API works correctly, a get should not return nil right after the incr. Could you also record the err of the get, if there is one? From the OpenResty documentation:
Retrieving the value in the dictionary ngx.shared.DICT for the key key. If the key does not exist or has expired, then nil will be returned. In case of errors, nil and a string describing the error will be returned.
So for the nil get, we could confirm whether the newly incremented key is evicted again, or whether something else goes wrong in the get function. The change could be something like:
local new_val_str = new_val or "nil"
local current_value, err = self.dict:get(k)
table.insert(counter_logs, k_str .. "=old_value+" .. v_str .. "=" .. new_val_str .. " current value=" .. tostring(current_value) .. " err: " .. tostring(err))
By the way, what memory size did you set for the shm? As @rodolfobrunner mentioned:
I already tried:
nginx_http_lua_shared_dict: 'prometheus_metrics 15m' Memory now stands at +-20%
Are you still using that memory size config in current test?
@ProBrian another round of tests
2025/02/25 16:23:41 ... counter.lua:71: Worker: nil, initial_free_memory: 13557760, final_free_memory: 13557760 bytes, counters_synced: 93, counters:
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"}=old_value+1=1 current value=nil err=nil,
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00100.0"}=old_value+1=1 current value=1 err=nil,...
2025/02/25 16:23:43 ... prometheus.lua:937: metric_data(): Going to export 7686 shared metrics, ..., request: "GET /metrics HTTP/1.1", ...
2025/02/25 16:23:43 ... prometheus.lua:1027: log_error(): Error getting 'request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"}': nil, ..., request: "GET /metrics HTTP/1.1", ...
2025/02/25 16:23:43 ... prometheus.lua:1008: metric_data(): Expected to have exported 7686 shared metrics, ..., request: "GET /metrics HTTP/1.1", ..."
No changes seem to occur on the shared dict size.
The code that produced this entry:
local function sync(_, self)
  local err, _
  local ok = true
  local count = 0
  local counter_logs = {}
  local initial_free_memory = self.dict:free_space()
  for k, v in pairs(self.increments) do
    local new_val, err, _ = self.dict:incr(k, v, 0)
    if err then
      ngx.log(ngx.WARN, "error increasing counter in shdict key: ", k, ", err: ", err)
      ok = false
    end
    count = count + 1
    local current_value, current_err
    -- Only log counter names that contain "request_latency_ms_bucket" along with their values
    if string.find(k, "request_latency_ms_bucket", 1, true) then
      local k_str = k or "nil"
      local v_str = v or "nil"
      local new_val_str = new_val or "nil"
      current_value, current_err = self.dict:get(k)
      current_value = current_value or "nil"
      current_err = current_err or "nil"
      table.insert(counter_logs, k_str .. "=old_value+" .. v_str .. "=" .. new_val_str .. " current value=" .. current_value .. " err=" .. current_err)
    end
  end

  if count > 0 then
    local worker_id = self.id
    local available_memory = self.dict:free_space()
    if #counter_logs > 0 then
      ngx.log(ngx.INFO, "Worker: ", worker_id, ", initial_free_memory: ", initial_free_memory, ", final_free_memory: ", available_memory, " bytes, counters_synced: ", count, ", counters: ", table.concat(counter_logs, ", "))
    else
      ngx.log(ngx.INFO, "Worker: ", worker_id, ", initial_free_memory: ", initial_free_memory, ", final_free_memory: ", available_memory, " bytes, counters_synced: ", count)
    end
  end

  clear_tab(self.increments)
  return ok
end
Are you still using that memory size config in current test?
Yes, they remain the same.
That really confuses me. As I see from your debugging results:
- `dict.incr` succeeds and returns the value it stored, which is not `nil`.
- `dict.get` returns `nil` even though the same key has just been successfully increased, without an `err` message.
We could suspect LRU eviction of the problematic key, but the key has just been updated and dict:free_space shows there is still enough capacity to avoid key eviction.
We could suspect a race condition, but between the incr and the get there is no yield or sleep call, and since Lua/OpenResty does not preemptively schedule light-weight threads, there should be no thread (coroutine) level execution switch there. At the process level, we only have incr operations for the histogram metrics, no del or reset, which means no other worker should be removing that key or setting it to nil between the incr and get operations in this worker.
I have no idea what is happening so far, and I have also tried multiple times in my environment and could not reproduce this.
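A minimal single-worker sketch of this incr-then-get check could look like the following (assumption: run via the resty CLI with a local shdict; it does not exercise Kong's multi-worker timing, so a negative result here proves little):

-- Hypothetical single-worker repro: incr a key and read it back immediately.
-- Assumed invocation: resty --shdict 'prometheus_metrics 15m' repro.lua
local dict = ngx.shared.prometheus_metrics
local key = 'request_latency_ms_bucket{service="s",route="r",workspace="default",le="00080.0"}'

for i = 1, 1000000 do
  local new_val, err = dict:incr(key, 1, 0)
  if not new_val then
    ngx.say("incr failed at iteration ", i, ": ", tostring(err))
    return
  end

  local got, get_err = dict:get(key)
  if got == nil then
    ngx.say("reproduced at iteration ", i, ": get() returned nil right after incr, err=", tostring(get_err))
    return
  end
end

ngx.say("no nil observed after 1,000,000 incr/get pairs")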
@ProBrian yes, it is weird behaviour, but it is consistent across our environments. When I deployed to another environment I started seeing duplicate metrics when calling the /metrics endpoint:
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="50"} 47
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="80"} 53
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="80"} 53
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="80"} 53
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="80"} 53
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="80"} 53
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="80"} 53
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="100"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="250"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="400"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="700"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="1000"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="2000"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="5000"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="10000"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="30000"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="60000"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="+Inf"} 58
It's the same 80ms bucket.
And it still puzzles me how a shared dict can hold duplicate keys. To make matters a little bit worse, Prometheus has started complaining about duplicate metrics.
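One way to narrow this down could be a debug check like the sketch below (hypothetical, run from a debug endpoint inside Kong), which would tell us whether the duplicates exist in the shdict key list itself or only show up in the exporter's output:

-- Hypothetical duplicate-key check against the Prometheus shared dict.
local dict = ngx.shared.prometheus_metrics
local seen, dupes = {}, {}

for _, k in ipairs(dict:get_keys(0)) do
  if seen[k] then
    dupes[#dupes + 1] = k
  else
    seen[k] = true
  end
end

ngx.say("duplicate keys reported by get_keys(0): ", #dupes)
if #dupes > 0 then
  ngx.say(table.concat(dupes, "\n"))
end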