Error getting metrics from the Prometheus plugin after bumping to the 3.7.1 release
Is there an existing issue for this?
- [x] I have searched the existing issues
Kong version ($ kong version)
3.7.1 / 3.9.0
Current Behavior
I am having problems with metrics from the Prometheus plugin after bumping to the 3.7.1 release. (I have since upgraded Kong to 3.9.0 and the issue still persists.)
I have the following entry in my logs:
[lua] prometheus.lua:1020: log_error(): Error getting 'request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"}': nil, client: 10.145.40.1, server: kong_status, request: "GET /metrics HTTP/1.1", host: "10.145.12.54:8100"
Interesting facts:
- It is always the same service and route that show this error; in our case it is always the same two routes for the same bucket.
- When we revert back to 3.6.1 the problem goes away.
- After a few months we bumped Kong to version 3.9.0 and the problem started happening again after a couple of hours, for the same routes and buckets.
- It goes away with a pod rotation but comes back after a while.
I already tried:
- nginx_http_lua_shared_dict: 'prometheus_metrics 15m'. Memory usage now stands at roughly 20%.
One pod contains:
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="50"} 1
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="80"} 1
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="100"} 1
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="250"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="400"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="700"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="1000"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="2000"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="5000"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="10000"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="30000"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="60000"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="+Inf"} 2
While another pod is missing the le="80" bucket:
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="50"} 1
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="100"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="250"} 2
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="400"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="700"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="1000"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="2000"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="5000"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="10000"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="30000"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="60000"} 3
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="+Inf"} 3
We are running our Kong in AWS EKS, upgraded from 3.6.1
Expected Behavior
The bucket should not disappear, but if it does for any reason I would expect Kong to be able to recover from an inconsistent state. (maybe metric reset?)
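For illustration only, a "metric reset" of this kind could be approximated with the standard ngx.shared.DICT API from a custom debug endpoint (this is a sketch of the idea, not an existing Kong feature):

-- Hypothetical recovery handler: wipe the Prometheus shared dict so the
-- exporter starts from a clean state (all counters restart from zero).
-- Assumes it runs inside Kong/OpenResty, e.g. behind a custom /reset-metrics location.
local function reset_prometheus_shdict()
  local dict = ngx.shared.prometheus_metrics
  if not dict then
    return nil, "shared dict 'prometheus_metrics' not found"
  end
  dict:flush_all()      -- mark every stored key as expired
  dict:flush_expired()  -- free the memory held by expired keys
  return true
end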
Steps To Reproduce
No response
Anything else?
No response
@rodolfobrunner Does this issue happen while using the same deployment as #14144? I'm trying to reproduce it.
Hey @ProBrian, I am part of the same team as @rodolfobrunner. Yes, it's the same deployment.
Hello @ProBrian, some additional info on what we're seeing. At some point we added some debug instructions to figure out what was being stored. Something like:
-- Adapted from the prometheus metric_data function
local function collect()
  ngx.header["Content-Type"] = "text/plain; charset=UTF-8"
  ngx.header["Kong-NodeId"] = node_id
  local prometheus = exporter.get_prometheus()
  local write_fn = ngx.print
  local keys = prometheus.dict:get_keys(0) -- this is ngx.shared["prometheus_metrics"]
  local count = #keys
  table_sort(keys)

  local output = buffer.new(DATA_BUFFER_SIZE_HINT)
  local output_count = 0

  local function buffered_print(fmt, ...)
    if fmt then
      output_count = output_count + 1
      output:putf(fmt, ...)
    end

    if output_count >= 100 or not fmt then
      write_fn(output:get()) -- consume the whole buffer
      output_count = 0
    end
  end

  for i = 1, count do
    local key = keys[i]
    local value = prometheus.dict[key]
    buffered_print("%s: %s\n", key, value)
  end

  buffered_print(nil)
  output:free()
end
... which outputs (when the error occurs):
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} null <----------
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} null <----------
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} null <----------
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} null <----------
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} null <----------
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} null <----------
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} null <----------
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00050.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00100.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00250.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00400.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00700.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="01000.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="02000.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="05000.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="10000.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="30000.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="60000.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="Inf"} null
How can a dictionary support duplicate keys? Even if it's a shared dictionary?
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} null <----------
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} null <----------
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00050.0"} null
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00100.0"} null
That's weird; from the print logs even the table_sort behaves abnormally: "00080.0" appears before "00050.0".
Hello @ProBrian, some more findings. I added more detail to the introspection and got this:
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"} - local_metrics: nil shared_dict: nil <---- the important dict
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00100.0"} - local_metrics: nil shared_dict: 1
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00250.0"} - local_metrics: nil shared_dict: 1
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00400.0"} - local_metrics: nil shared_dict: 1
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00700.0"} - local_metrics: nil shared_dict: 1
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="01000.0"} - local_metrics: nil shared_dict: 1
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="02000.0"} - local_metrics: nil shared_dict: 1
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="05000.0"} - local_metrics: nil shared_dict: 1
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="10000.0"} - local_metrics: nil shared_dict: 1
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="30000.0"} - local_metrics: nil shared_dict: 1
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="60000.0"} - local_metrics: nil shared_dict: 1
Which explains the error described earlier (the value is indeed nil).
[lua] prometheus.lua:1020: log_error(): Error getting 'request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"}': nil, client: 10.145.40.1, server: kong_status, request: "GET /metrics HTTP/1.1", host: "10.145.12.54:8100"
The disturbing part here is the code that dumps the counters to the nginx shared dict:
local function sync(_, self)
  local err, _
  local ok = true
  for k, v in pairs(self.increments) do
    _, err, _ = self.dict:incr(k, v, 0)
    if err then
      ngx.log(ngx.WARN, "error increasing counter in shdict key: ", k, ", err: ", err)
      ok = false
    end
  end

  clear_tab(self.increments)
  return ok
end
The self.dict:incr(k, v, 0) call, according to the documentation:
Increments the (numerical) value for key in the shm-based dictionary ngx.shared.DICT by the step value value. Returns the new resulting number if the operation is successfully completed or nil and an error message otherwise.
When the key does not exist or has already expired in the shared dictionary:
- if the init argument is not specified or takes the value nil, this method will return nil and the error string "not found";
- if the init argument takes a number value, this method will create a new key with the value init + value.
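For reference, that init behavior can be exercised in isolation with a small sketch (assumption: run via the resty CLI with a throwaway shdict, not against Kong's actual prometheus_metrics dict):

-- Minimal illustration of ngx.shared.DICT:incr() init semantics.
-- Assumed invocation: resty --shdict 'demo 1m' incr_demo.lua
local dict = ngx.shared.demo

-- Key missing and no init given: incr returns nil plus the "not found" error.
local new_val, err = dict:incr("missing_key", 1)
ngx.say("no init: ", tostring(new_val), " / ", tostring(err))  -- nil / not found

-- Key missing but init = 0: the key is created with the value init + value.
new_val, err = dict:incr('le="00080.0"', 1, 0)
ngx.say("with init: ", new_val)  -- 1

-- Subsequent increments simply add to the stored number.
new_val, err = dict:incr('le="00080.0"', 2, 0)
ngx.say("again: ", new_val)  -- 3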
Under which scenarios will this operation write nil?
@jmadureira For ngx.shared.DICT.incr(k, v), v only takes a number, so there should be no way for incr to write a nil. I'm starting to doubt whether there is an issue in this API.
By the way, in your debug function the dict value lookup doesn't look correct. Could you fix it and also check whether an err message is returned when getting a nil value from the shared dict?
for i = 1, count do
  local key = keys[i]
- local value = prometheus.dict[key]
+ local value, err = prometheus.dict:get(key)
  -- if `err` is not nil, try to print out the err here
  buffered_print("%s: %s\n", key, value)
end
@jmadureira Could you do some more debugging to clarify:
- Are there warning logs indicating a write error when doing the `self.dict:incr(k, v, 0)` operation?
- Could you use `ngx.shared.DICT.free_space` to check if the shared dict still has free memory for the metrics? What I'm wondering is whether some key (like the `le=80` slot) has been evicted due to insufficient memory and the LRU logic. That is: `keys = get_keys()` --> `keys[i]` evicted due to LRU --> `get(keys[i])` returns nil. So we can try another get_keys right after the get-value iteration, like:
local keys = prometheus.dict:get_keys(0) -- this is ngx.shared["prometheus_metrics"]
...
-- the loop of get value
for i = 1, count do
  local key = keys[i]
  local value = prometheus.dict:get(key)
  buffered_print("%s: %s\n", key, value)
end

-- get keys again to verify the keys are identical; this helps us figure out
-- whether any keys were evicted during the loop of get value.
keys = prometheus.dict:get_keys(0)
@ProBrian you are probably on to something. I changed the sync logic a little bit:
local function sync(_, self)
  local err, _
  local ok = true
  local count = 0
  local counter_logs = {}
  for k, v in pairs(self.increments) do
    new_val, err, _ = self.dict:incr(k, v, 0)
    if err then
      ngx.log(ngx.WARN, "error increasing counter in shdict key: ", k, ", err: ", err)
      ok = false
    end
    count = count + 1
    -- Only log counter names that contain "request_latency_ms_bucket" along with their values
    if string.find(k, "request_latency_ms_bucket", 1, true) then
      -- also print the current value in the shared dictionary
      table.insert(counter_logs, k .. "=old_value+" .. v .. "=" .. new_val .. " current value=" .. self.dict:get(k))
    end
  end

  if count > 0 then
    if #counter_logs > 0 then
      ngx.log(ngx.INFO, "Synced ", count, " counters from worker ", self.id, ": ", table.concat(counter_logs, ", "))
    else
      ngx.log(ngx.INFO, "Synced ", count, " counters from worker ", self.id)
    end
  end

  clear_tab(self.increments)
  return ok
end
The error no longer shows up, most likely because the value gets read immediately, so it won't be affected by the LRU logic.
On the other hand, if the le=80 slot had been evicted due to memory constraints, the Prometheus exporter wouldn't pick it up and no error would be logged. The metric_data function in prometheus.lua loops through all keys in the shared dict.
The error no longer shows up, most likely because the value gets read immediately, so it won't be affected by the LRU logic.
I think incr and get both refresh a key, so if we have already incremented a specific key "le=xx" in the current sync, it should not be evicted (meaning the get(le=xx) immediately afterwards is not the reason the key isn't evicted). I feel it's more likely to be a concurrency + timing issue like:
coroutine1(metric_data): get_keys got [le=80, le=100,...] -- looping keys : get(le=**) ok, ..., get(le=100) ok, get(le=80) nil ----
coroutine2(sync): ----------------------------- incr value may evict "le=80" which is old ---------
So when the eviction of key le=80 happens right between the get_keys and the get(le=80) in metric_data, you get a nil. When you add something in sync, that might just change the timing, e.g. the eviction now happens before get_keys, so you don't even see the "le=80" key in keys, and therefore no get(le=80) is invoked in the loop over keys and no error happens.
That's why I suggested adding another get_keys right after the dict loop in metric_data and checking whether any keys are missing compared to the get_keys before the loop; that would help us confirm whether this is the problem.
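If it helps, a small hypothetical helper (plain Lua, not part of the patch above) could diff the two get_keys() snapshots so the evicted keys are named rather than just counted:

-- Hypothetical helper: diff two get_keys() snapshots to spot evicted keys.
-- `before` and `after` are arrays as returned by dict:get_keys(0).
local function missing_keys(before, after)
  local still_there = {}
  for _, k in ipairs(after) do
    still_there[k] = true
  end
  local missing = {}
  for _, k in ipairs(before) do
    if not still_there[k] then
      missing[#missing + 1] = k
    end
  end
  return missing
end

-- Usage sketch inside metric_data():
--   local before = self.dict:get_keys(0)
--   ... loop over the keys and get() each value ...
--   local evicted = missing_keys(before, self.dict:get_keys(0))
--   if #evicted > 0 then
--     ngx.log(ngx.WARN, "keys evicted during export: ", table.concat(evicted, ", "))
--   end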
@ProBrian another round of debugging produced the following messages:
2025/02/20 13:56:56 ... counter.lua:65: Synced 46 counters from worker nil: request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"}=old_value+1=1 current value=nil, ...
2025/02/20 13:57:04 ... prometheus.lua:937: metric_data(): Going to export 5559 shared metrics, ... request: "GET /metrics ...
2025/02/20 13:57:04 ... prometheus.lua:1027: log_error(): Error getting 'request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"}': nil, ... request: "GET /metrics ...
2025/02/20 13:57:04 ... prometheus.lua:1008: metric_data(): Expected to have exported 5559 shared metrics, ... request: "GET /metrics ...
The current_value logged on the first line is supposed to be the current value in the shared dict right after changing it, so I wouldn't expect it to be nil. Instead it should be 1, since that is the value returned by self.dict:incr(k, v, 0).
The rest is your suggestion to check the number of keys before and after the dict loop in the metric_data function, which remains the same.
Since this is a custom log message, I'm including the code as well.
In counter.lua I have:
local function sync(_, self)
  local err, _
  local ok = true
  local count = 0
  local counter_logs = {}
  for k, v in pairs(self.increments) do
    new_val, err, _ = self.dict:incr(k, v, 0)
    if err then
      ngx.log(ngx.WARN, "error increasing counter in shdict key: ", k, ", err: ", err)
      ok = false
    end
    count = count + 1
    -- Only log counter names that contain "request_latency_ms_bucket" along with their values
    if string.find(k, "request_latency_ms_bucket", 1, true) then
      local k_str = k or "nil"
      local v_str = v or "nil"
      local new_val_str = new_val or "nil"
      local current_value = self.dict:get(k) or "nil"
      table.insert(counter_logs, k_str .. "=old_value+" .. v_str .. "=" .. new_val_str .. " current value=" .. current_value)
    end
  end

  if count > 0 then
    if #counter_logs > 0 then
      ngx.log(ngx.INFO, "Synced ", count, " counters from worker ", self.id, ": ", table.concat(counter_logs, ", "))
    else
      ngx.log(ngx.INFO, "Synced ", count, " counters from worker ", self.id)
    end
  end

  clear_tab(self.increments)
  return ok
end
In prometheus.lua I made the following changes:
function Prometheus:metric_data(write_fn, local_only)
  -- ...
  local count = #keys
  ngx_log(ngx.INFO, "Going to export " .. count .. " shared metrics")

  for k, v in pairs(self.local_metrics) do
    keys[count+1] = k
    count = count + 1
  end
  -- ...
  for i = 1, count do
    yield()
    -- ...
  end

  keys = self.dict:get_keys(0)
  local count = #keys
  ngx_log(ngx.INFO, "Expected to have exported " .. count .. " shared metrics")

  buffered_print(nil)
  output:free()
end
2025/02/20 13:56:56 ... counter.lua:65: Synced 46 counters from worker nil: request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"}=old_value+1=1 current value=nil, ...
Hmm... that makes me suspicious of the ngx.shared.DICT.incr API. I think if that API works correctly, a get should not return nil right after the incr. Could you also record the err of the get, if there is one? From the OpenResty documentation:
Retrieving the value in the dictionary ngx.shared.DICT for the key key. If the key does not exist or has expired, then nil will be returned. In case of errors, nil and a string describing the error will be returned.
So for the nil get, we could confirm whether the newly incremented key is evicted again, or whether something else goes wrong in the get function. The change could be something like:
local new_val_str = new_val or "nil"
local current_value, err = self.dict:get(k)
table.insert(counter_logs, k_str .. "=old_value+" .. v_str .. "=" .. new_val_str .. " current value=" .. tostring(current_value) .. " err: " .. tostring(err))
By the way, what memory size did you set for the shm? As @rodolfobrunner mentioned:
I already tried:
nginx_http_lua_shared_dict: 'prometheus_metrics 15m' Memory now stands at +-20%
Are you still using that memory size config in current test?
@ProBrian another round of tests
2025/02/25 16:23:41 ... counter.lua:71: Worker: nil, initial_free_memory: 13557760, final_free_memory: 13557760 bytes, counters_synced: 93, counters:
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"}=old_value+1=1 current value=nil err=nil,
request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00100.0"}=old_value+1=1 current value=1 err=nil,...
2025/02/25 16:23:43 ... prometheus.lua:937: metric_data(): Going to export 7686 shared metrics, ..., request: "GET /metrics HTTP/1.1", ...
2025/02/25 16:23:43 ... prometheus.lua:1027: log_error(): Error getting 'request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="00080.0"}': nil, ..., request: "GET /metrics HTTP/1.1", ...
2025/02/25 16:23:43 ... prometheus.lua:1008: metric_data(): Expected to have exported 7686 shared metrics, ..., request: "GET /metrics HTTP/1.1", ..."
No changes seem to occur on the shared dict size.
The code that produced this entry:
local function sync(_, self)
  local err, _
  local ok = true
  local count = 0
  local counter_logs = {}
  local initial_free_memory = self.dict:free_space()
  for k, v in pairs(self.increments) do
    local new_val, err, _ = self.dict:incr(k, v, 0)
    if err then
      ngx.log(ngx.WARN, "error increasing counter in shdict key: ", k, ", err: ", err)
      ok = false
    end
    count = count + 1
    local current_value, current_err
    -- Only log counter names that contain "request_latency_ms_bucket" along with their values
    if string.find(k, "request_latency_ms_bucket", 1, true) then
      local k_str = k or "nil"
      local v_str = v or "nil"
      local new_val_str = new_val or "nil"
      current_value, current_err = self.dict:get(k)
      current_value = current_value or "nil"
      current_err = current_err or "nil"
      table.insert(counter_logs, k_str .. "=old_value+" .. v_str .. "=" .. new_val_str .. " current value=" .. current_value .. " err=" .. current_err)
    end
  end

  if count > 0 then
    local worker_id = self.id
    local available_memory = self.dict:free_space()
    if #counter_logs > 0 then
      ngx.log(ngx.INFO, "Worker: ", worker_id, ", initial_free_memory: ", initial_free_memory, ", final_free_memory: ", available_memory, " bytes, counters_synced: ", count, ", counters: ", table.concat(counter_logs, ", "))
    else
      ngx.log(ngx.INFO, "Worker: ", worker_id, ", initial_free_memory: ", initial_free_memory, ", final_free_memory: ", available_memory, " bytes, counters_synced: ", count)
    end
  end

  clear_tab(self.increments)
  return ok
end
Are you still using that memory size config in current test?
Yes, they remain the same.
That really confuses me. As I see from your debugging results:
- `dict.incr` succeeds and returns the value it stored, which is not `nil`.
- `dict.get` returns `nil` even though the same key has just been successfully increased, without an `err` message.
We could suspect LRU eviction of the problematic key, but the key has just been updated and dict:free_space shows there is still enough capacity to avoid key eviction.
We could suspect a race condition, but between the incr and the get there is no yield or sleep call, and since Lua/OpenResty does not preemptively schedule light-weight threads, there should be no thread (coroutine) level execution switch there. At the process level, we only have incr operations for the histogram metrics, no del or reset, which means no other worker should be removing that key or setting it to nil between the incr and get operations in this worker.
I have no idea what is happening so far, and I have also tried multiple times in my environment and could not reproduce this.
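A minimal single-worker sketch of this incr-then-get check could look like the following (assumption: run via the resty CLI with a local shdict; it does not exercise Kong's multi-worker timing, so a negative result here proves little):

-- Hypothetical single-worker repro: incr a key and read it back immediately.
-- Assumed invocation: resty --shdict 'prometheus_metrics 15m' repro.lua
local dict = ngx.shared.prometheus_metrics
local key = 'request_latency_ms_bucket{service="s",route="r",workspace="default",le="00080.0"}'

for i = 1, 1000000 do
  local new_val, err = dict:incr(key, 1, 0)
  if not new_val then
    ngx.say("incr failed at iteration ", i, ": ", tostring(err))
    return
  end

  local got, get_err = dict:get(key)
  if got == nil then
    ngx.say("reproduced at iteration ", i, ": get() returned nil right after incr, err=", tostring(get_err))
    return
  end
end

ngx.say("no nil observed after 1,000,000 incr/get pairs")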
@ProBrian yes, it is weird behaviour, but it is consistent across our environments. When I deployed to another environment I started seeing duplicate metrics when calling the /metrics endpoint:
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="50"} 47
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="80"} 53
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="80"} 53
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="80"} 53
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="80"} 53
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="80"} 53
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="80"} 53
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="100"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="250"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="400"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="700"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="1000"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="2000"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="5000"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="10000"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="30000"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="60000"} 58
kong_request_latency_ms_bucket{service="customer-support",route="customer-support_getcards",workspace="default",le="+Inf"} 58
It's the same 80ms bucket.
And it still puzzles me how a shared dict can hold duplicate keys. To make matters a little bit worse, Prometheus has started complaining about duplicate metrics.
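One way to narrow this down could be a debug check like the sketch below (hypothetical, run from a debug endpoint inside Kong), which would tell us whether the duplicates exist in the shdict key list itself or only show up in the exporter's output:

-- Hypothetical duplicate-key check against the Prometheus shared dict.
local dict = ngx.shared.prometheus_metrics
local seen, dupes = {}, {}

for _, k in ipairs(dict:get_keys(0)) do
  if seen[k] then
    dupes[#dupes + 1] = k
  else
    seen[k] = true
  end
end

ngx.say("duplicate keys reported by get_keys(0): ", #dupes)
if #dupes > 0 then
  ngx.say(table.concat(dupes, "\n"))
end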