carbonapi
[BUG] asPercent(#A,#B) / divideSeries(#A,#B) compute incorrectly when #B is based on sumSeries
Version
Tested with carbonapi-0.12.5-1 and the latest carbonapi-0.14.1-1
Description
We are trying to display the percentage of memory used on a server in Grafana,
so we create 3 queries in Grafana:
#A memory used: myserver.collectd.memory.memory-used
#B total memory: sumSeries(myserver.collectd.memory.*)
#C percent or ratio: asPercent(#A,#B) or divideSeries(#A, #B)
When we display all 3 series (i.e. #A is requested before #C) => it works.
When we display/request only #C (i.e. #A is NOT requested before #C)
=> the result is wrong, because when carbonapi computes the sum #B, the metric "memory-used" is not included in the sum.
=> in effect, carbonapi computes asPercent(#A, #B - #A)
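To make the difference concrete, here is a minimal sketch with invented values (the numbers are illustrative only, not measured on the server):

```go
package main

import "fmt"

func main() {
	// Hypothetical per-interval values in bytes (not taken from the real server).
	used := 4.0e9
	others := []float64{1.0e9, 2.0e9, 0.5e9, 0.3e9, 0.2e9} // buffered, cached, free, slab_*

	total := used
	for _, v := range others {
		total += v
	}

	// Expected: asPercent(#A, #B) with memory-used included in sumSeries(memory.*).
	expected := used / total * 100

	// Observed when only #C is requested: memory-used seems to be dropped from the sum,
	// which is effectively asPercent(#A, #B - #A).
	observed := used / (total - used) * 100

	fmt.Printf("expected: %.1f%%  observed: %.1f%%\n", expected, observed)
}
```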
Logs
When it doesn't work (only #C is displayed/requested):
2020-10-01T17:33:02.217+0200 INFO access request served {"data": {"handler":"render","carbonapi_uuid":"06091928-92b0-494a-8963-8317d12e891c","url":"/render","peer_ip":"192.168.210.1","host":"localhost","format":"json","use_cache":true,"targets":["asPercent(me1.appli.inme11ldh01noi.collectd.memory.memory-used,sumSeries(me1.appli.inme11ldh01noi.collectd.memory.*))"],"cache_timeout":60,"metrics":["asPercent(me1.appli.inme11ldh01noi.collectd.memory.memory-used,sumSeries(me1.appli.inme11ldh01noi.collectd.memory.*))"],"runtime":0.039076725,"http_code":200,"carbonzipper_response_size_bytes":4555,"carbonapi_response_size_bytes":17171,"from":1601519455,"until":1601551985,"max_data_points":1768,"from_raw":"1601519455","until_raw":"1601551985","uri":"/render","from_cache":false,"used_backend_cache":false,"request_headers":{"X-Dashboard-Id":"265","X-Grafana-Org-Id":"1","X-Panel-Id":"2"}}}
When it works (all series are displayed):
2020-10-01T18:25:19.979+0200 INFO access request served {"data": {"handler":"render","carbonapi_uuid":"15f01432-463c-4ab5-9811-7c6f54be6c94","url":"/render","peer_ip":"192.168.210.1","host":"localhost","format":"json","use_cache":true,"targets":["me1.appli.inme11ldh01noi.collectd.memory.memory-used","asPercent(me1.appli.inme11ldh01noi.collectd.memory.memory-used,sumSeries(me1.appli.inme11ldh01noi.collectd.memory.*))"],"cache_timeout":60,"metrics":["me1.appli.inme11ldh01noi.collectd.memory.memory-used","asPercent(me1.appli.inme11ldh01noi.collectd.memory.memory-used,sumSeries(me1.appli.inme11ldh01noi.collectd.memory.*))"],"runtime":0.030990212,"http_code":200,"carbonzipper_response_size_bytes":9045,"carbonapi_response_size_bytes":30351,"from":1601519455,"until":1601551985,"max_data_points":1768,"from_raw":"1601519455","until_raw":"1601551985","uri":"/render","from_cache":false,"used_backend_cache":false,"request_headers":{"X-Dashboard-Id":"265","X-Grafana-Org-Id":"1","X-Panel-Id":"2"}}}
CarbonAPI Configuration:
# Controls headers that would be passed to the backend
headersToPass:
  - "X-Dashboard-Id"
  - "X-Grafana-Org-Id"
  - "X-Panel-Id"
headersToLog:
  - "X-Dashboard-Id"
  - "X-Grafana-Org-Id"
  - "X-Panel-Id"
# Max concurrent requests to CarbonZipper
concurency: 1000
cache:
  # Type of caching. Valid: "mem", "memcache", "null"
  type: "memcache"
  # Cache limit in megabytes
  size_mb: 512
  # Default cache timeout value. Identical to DEFAULT_CACHE_DURATION in graphite-web.
  defaultTimeoutSec: 60
  # Only used by memcache type of cache. List of memcache servers.
  memcachedServers:
    - "front1:2181"
    - "front2:2181"
upstreams:
  # Number of 100ms buckets to track request distribution in. Used to build
  # 'carbon.zipper.hostname.requests_in_0ms_to_100ms' metric and friends.
  # Requests beyond the last bucket are logged as slow (default of 10 implies
  # "slow" is >1 second).
  # The last bucket is _not_ called 'requests_in_Xms_to_inf' on purpose, so
  # we can change our minds about how many buckets we want to have and have
  # their names remain consistent.
  buckets: 10
  # Enable compatibility with graphite-web 0.9
  # This will affect graphite-web 1.0+ with multiple cluster_servers
  # Default: disabled
  graphite09compat: false
  timeouts:
    # Maximum backend request time for find requests.
    find: "10s"
    # Maximum backend request time for render requests. This is the total time and doesn't take in-flight requests into account.
    render: "60s"
    # Timeout to connect to the server
    connect: "500ms"
  # Number of concurrent requests to any given backend - default is no limit.
  # If set, you likely want >= MaxIdleConnsPerHost
  concurrencyLimit: 0
  # Configures how often keep alive packets will be sent out
  keepAliveInterval: "30s"
  # Control http.MaxIdleConnsPerHost. Large values can lead to more idle
  # connections on the backend servers which may bump into limits; tune with care.
  maxIdleConnsPerHost: 1000
  backendsv2:
    backends:
      -
        groupName: "site1-render"
        # supported:
        #   carbonapi_v2_pb - carbonapi 0.11 or earlier version of the protocol
        #   carbonapi_v3_pb - new protocol, http interface (native)
        #   carbonapi_v3_grpc - new protocol, gRPC interface (native)
        #   protobuf, pb, pb3 - same as carbonapi_v2_pb
        #   msgpack - protocol used by graphite-web 1.1 and metrictank
        #   auto - carbonapi will do its best to guess if it's carbonapi_v3_pb or carbonapi_v2_pb
        #
        # non-native protocols will be internally converted to the new protocol, which will increase memory consumption
        protocol: "carbonapi_v2_pb"
        # supported:
        #   "broadcast" - send request to all backends in the group and merge responses. This was the default behavior for carbonapi 0.11 or earlier.
        #   "roundrobin" - send request to one backend.
        #   "all" - same as "broadcast"
        #   "rr" - same as "roundrobin"
        lbMethod: "broadcast"
        # number of retries in case of an unsuccessful request
        maxTries: 1
        # number of metrics per fetch request. Default: 0 - unlimited. If not specified, the global value will be used.
        maxBatchSize: 10
        servers:
          - "http://back1:8090"
          - "http://back2:8090"
  carbonsearchv2:
    # carbonsearch prefix to reserve/register
    prefix: "*"
    # Carbonsearch instances. Follows the same syntax as backendsv2
    backends:
      -
        groupName: "site1-search"
        protocol: "carbonapi_v2_pb"
        lbMethod: "broadcast"
        servers:
          - "http://back1:8090"
          - "http://back2:8090"
functionsConfig:
  graphiteWeb: /etc/carbonapi/graphiteWeb.yaml
Simplified query (if applicable)
DOES NOT WORK: asPercent(myserver.collectd.memory.memory-used,sumSeries(myserver.collectd.memory.*))
WORKS: ["myserver.collectd.memory.memory-used","asPercent(myserver.collectd.memory.memory-used,sumSeries(myserver.collectd.memory.*))"]
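For reference, both cases can also be reproduced outside Grafana with plain /render requests. A sketch below; it assumes carbonapi listens on localhost:8081, so adjust the host/port to your deployment:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// render issues a /render request with the given targets and prints the response size.
func render(targets ...string) {
	// Assumes carbonapi listens on localhost:8081; adjust to your deployment.
	q := url.Values{"format": {"json"}}
	for _, t := range targets {
		q.Add("target", t)
	}
	resp, err := http.Get("http://localhost:8081/render?" + q.Encode())
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, len(body), "bytes")
}

func main() {
	// Failing case: only the asPercent() expression is requested.
	render("asPercent(myserver.collectd.memory.memory-used,sumSeries(myserver.collectd.memory.*))")

	// Working case: the raw metric is requested alongside it, as Grafana does when all rows are visible.
	render(
		"myserver.collectd.memory.memory-used",
		"asPercent(myserver.collectd.memory.memory-used,sumSeries(myserver.collectd.memory.*))",
	)
}
```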
Backend metric retention and aggregation schemas
No cross retention period
Backend response (if possible)
DOES NOT WORK:
2020-10-01T18:49:04.018+0200 DEBUG zipper got some fetch responses {"type": "broadcastGroup", "groupName": "site1-render", "type": "fetch", "request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-used"], "backends_count": 2, "response_count": 2, "have_errors": false, "errors": null, "response_count": 1}
2020-10-01T18:49:04.040+0200 DEBUG zipper got some fetch responses {"type": "broadcastGroup", "groupName": "site1-render", "type": "fetch", "request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-buffered", "me1.appli.inme11ldh01noi.collectd.memory.memory-cached", "me1.appli.inme11ldh01noi.collectd.memory.memory-free", "me1.appli.inme11ldh01noi.collectd.memory.memory-slab_recl", "me1.appli.inme11ldh01noi.collectd.memory.memory-slab_unrecl", "me1.appli.inme11ldh01noi.collectd.memory.memory-used"], "backends_count": 2, "response_count": 2, "have_errors": false, "errors": null, "response_count": 6}
2020-10-01T18:49:04.040+0200 DEBUG zipper got some fetch responses {"type": "broadcastGroup", "groupName": "root", "type": "fetch", "request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-used", "me1.appli.inme11ldh01noi.collectd.memory.*"], "backends_count": 1, "response_count": 1, "have_errors": false, "errors": null, "response_count": 6}
2020-10-01T18:49:04.041+0200 INFO access request served {"data": {"handler":"render","carbonapi_uuid":"4b75db72-0abb-4bad-a89c-615907912691","url":"/render","peer_ip":"192.168.210.1","host":"localhost","format":"json","use_cache":true,"targets":["asPercent(me1.appli.inme11ldh01noi.collectd.memory.memory-used,sumSeries(me1.appli.inme11ldh01noi.collectd.memory.*))"],"cache_timeout":60,"metrics":["asPercent(me1.appli.inme11ldh01noi.collectd.memory.memory-used,sumSeries(me1.appli.inme11ldh01noi.collectd.memory.*))"],"runtime":0.028383404,"http_code":200,"carbonzipper_response_size_bytes":4555,"carbonapi_response_size_bytes":17171,"from":1601519455,"until":1601551985,"max_data_points":1768,"from_raw":"1601519455","until_raw":"1601551985","uri":"/render","from_cache":false,"used_backend_cache":false,"request_headers":{"X-Dashboard-Id":"265","X-Grafana-Org-Id":"1","X-Panel-Id":"2"}}}
WORKS:
2020-10-01T18:47:14.546+0200 DEBUG zipper got some fetch responses {"type": "broadcastGroup", "groupName": "site1-render", "type": "fetch", "request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-used"], "backends_count": 2, "response_count": 2, "have_errors": false, "errors": null, "response_count": 1}
2020-10-01T18:47:14.546+0200 DEBUG zipper got some fetch responses {"type": "broadcastGroup", "groupName": "root", "type": "fetch", "request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-used"], "backends_count": 1, "response_count": 1, "have_errors": false, "errors": null, "response_count": 1}
2020-10-01T18:47:14.573+0200 DEBUG zipper got some fetch responses {"type": "broadcastGroup", "groupName": "site1-render", "type": "fetch", "request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-buffered", "me1.appli.inme11ldh01noi.collectd.memory.memory-cached", "me1.appli.inme11ldh01noi.collectd.memory.memory-free", "me1.appli.inme11ldh01noi.collectd.memory.memory-slab_recl", "me1.appli.inme11ldh01noi.collectd.memory.memory-slab_unrecl", "me1.appli.inme11ldh01noi.collectd.memory.memory-used"], "backends_count": 2, "response_count": 2, "have_errors": false, "errors": null, "response_count": 6}
2020-10-01T18:47:14.573+0200 DEBUG zipper got some fetch responses {"type": "broadcastGroup", "groupName": "root", "type": "fetch", "request": ["me1.appli.inme11ldh01noi.collectd.memory.*"], "backends_count": 1, "response_count": 1, "have_errors": false, "errors": null, "response_count": 6}
2020-10-01T18:47:14.574+0200 INFO access request served {"data": {"handler":"render","carbonapi_uuid":"cbcd12dd-2b61-473f-97db-2af4d66067a1","url":"/render","peer_ip":"192.168.210.1","host":"localhost","format":"json","use_cache":true,"targets":["me1.appli.inme11ldh01noi.collectd.memory.memory-used","asPercent(me1.appli.inme11ldh01noi.collectd.memory.memory-used,sumSeries(me1.appli.inme11ldh01noi.collectd.memory.*))"],"cache_timeout":60,"metrics":["me1.appli.inme11ldh01noi.collectd.memory.memory-used","asPercent(me1.appli.inme11ldh01noi.collectd.memory.memory-used,sumSeries(me1.appli.inme11ldh01noi.collectd.memory.*))"],"runtime":0.032647471,"http_code":200,"carbonzipper_response_size_bytes":9045,"carbonapi_response_size_bytes":30351,"from":1601519455,"until":1601551985,"max_data_points":1768,"from_raw":"1601519455","until_raw":"1601551985","uri":"/render","from_cache":false,"used_backend_cache":false,"request_headers":{"X-Dashboard-Id":"265","X-Grafana-Org-Id":"1","X-Panel-Id":"2"}}}
A similar issue: https://github.com/go-graphite/carbonapi/issues/487#issuecomment-674523340
Could you check, please, if #529 fixes it?
Update: actually, after a careful reading, I've realized that I don't see an issue here.
If you take a look, here is the target from the "problematic" request: "targets":["asPercent(me1.appli.inme11ldh01noi.collectd.memory.memory-used,sumSeries(me1.appli.inme11ldh01noi.collectd.memory.*))"]
The me1.appli.inme11ldh01noi.collectd.memory.*
wildcard should clearly include everything. I'm not sure what's wrong there. Could you maybe do a few tests with the cache disabled?
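One quick way to take the response cache out of the picture is to repeat the failing request with the cache bypassed; a sketch, assuming carbonapi honours the graphite-web-style noCache parameter and listens on localhost:8081:

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

func main() {
	// Same failing target as above, but with the response cache bypassed.
	// "noCache=1" is the graphite-web-style parameter; the host/port is an assumption.
	target := "asPercent(me1.appli.inme11ldh01noi.collectd.memory.memory-used," +
		"sumSeries(me1.appli.inme11ldh01noi.collectd.memory.*))"
	resp, err := http.Get("http://localhost:8081/render?format=json&noCache=1&target=" +
		url.QueryEscape(target))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```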
Update 2:
2020-10-01T18:49:04.018+0200 DEBUG zipper got some fetch responses {"type": "broadcastGroup", "groupName": "site1-render", "type": "fetch", "request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-used"], "backends_count": 2, "response_count": 2, "have_errors": false, "errors": null, "response_count": 1}
2020-10-01T18:49:04.040+0200 DEBUG zipper got some fetch responses {"type": "broadcastGroup", "groupName": "site1-render", "type": "fetch", "request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-buffered", "me1.appli.inme11ldh01noi.collectd.memory.memory-cached", "me1.appli.inme11ldh01noi.collectd.memory.memory-free", "me1.appli.inme11ldh01noi.collectd.memory.memory-slab_recl", "me1.appli.inme11ldh01noi.collectd.memory.memory-slab_unrecl", "me1.appli.inme11ldh01noi.collectd.memory.memory-used"], "backends_count": 2, "response_count": 2, "have_errors": false, "errors": null, "response_count": 6}
->
"request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-used"]
"request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-buffered", "me1.appli.inme11ldh01noi.collectd.memory.memory-cached", "me1.appli.inme11ldh01noi.collectd.memory.memory-free", "me1.appli.inme11ldh01noi.collectd.memory.memory-slab_recl", "me1.appli.inme11ldh01noi.collectd.memory.memory-slab_unrecl", "me1.appli.inme11ldh01noi.collectd.memory.memory-used"]
me1.appli.inme11ldh01noi.collectd.memory.memory-used
is queried in both requests from the log
Hey @sylvain-beugin, any follow-up here?
Hi,
Yes, just after the test I sent you, I also disabled the cache and the bug remains. I had the same thought as you: the logs are OK and I get 6 metrics, so perhaps the value disappears in the cache layer...
Sorry, I have not checked #529 yet; I'm short on time...
FWIW, I'm running 0.16-patch2 and came here to submit a similar bug report.
I'm seeing something very similar where carbonapi reports 3 or 4 times the value graphite-web reports when using a pattern like:
#A = sumSeries(...)
#B = timeShift(#A, '7d')
#C = divideSeries(#A,#B)
If I flip my datasource to "Graphite Web" and refresh the query, I see something like 0.97; flip over to using carbonapi as the datasource and the same point returns 3.88.
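For what it's worth, a ~4x factor would be consistent with series being dropped from one of the inner sumSeries() calls, as in the memory example above; a purely speculative sketch with invented numbers (not measured values):

```go
package main

import "fmt"

func main() {
	// Invented example: four similar-sized series summed in #A, and their values a week earlier for #B.
	week := []float64{100, 105, 95, 110}     // sumSeries(...) now
	lastWeek := []float64{103, 108, 98, 112} // the same series a week ago

	sum := func(xs []float64) (s float64) {
		for _, x := range xs {
			s += x
		}
		return
	}

	// Expected: divideSeries(#A, timeShift(#A, '7d')) comes out close to 1, e.g. ~0.97.
	fmt.Printf("expected ratio: %.2f\n", sum(week)/sum(lastWeek))

	// If three of the four series were dropped from the denominator sum,
	// the same point would come out roughly 4x too large.
	fmt.Printf("ratio with dropped series: %.2f\n", sum(week)/lastWeek[0])
}
```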