
[BUG] asPercent(#A,#B) and divideSeries(#A, #B) compute wrong results when #B is based on sumSeries

Open sylvain-beugin opened this issue 3 years ago • 5 comments

Version

Tested with carbonapi-0.12.5-1 and the latest carbonapi-0.14.1-1

Description

We are trying to display the percentage of memory used on a server in Grafana.

So we created 3 queries in Grafana:

#A Memory used: myserver.collectd.memory.memory-used

#B Total memory: sumSeries(myserver.collectd.memory.*)

#C Percent or ratio: asPercent(#A,#B) or divideSeries(#A, #B)

When we display all 3 series (in fact, when #A is requested before #C), it works.

When we display/request only #C (i.e. #A is NOT requested before #C), the result is wrong: when carbonapi computes the sum #B, the metric "memory-used" is not included in it, so carbonapi effectively computes asPercent(#A,#B-#A).
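A quick way to see why the two results differ: asPercent(#A,#B) is 100 * #A / #B point by point (divideSeries(#A, #B) is the same ratio without the factor 100), so leaving memory-used out of the sum always inflates the result. The sketch below compares the intended total with the observed #B-#A. The six series names come from the logs, but the values are made up for illustration, the helper names are hypothetical, and missing points (None) are not handled.

```python
# Hypothetical per-point values (bytes) for the six collectd memory series.
# Values are invented for illustration; only the names come from the logs.
series = {
    "memory-buffered": [100, 100],
    "memory-cached": [300, 300],
    "memory-free": [400, 300],
    "memory-slab_recl": [50, 50],
    "memory-slab_unrecl": [50, 50],
    "memory-used": [100, 200],
}


def sum_series(names):
    """Point-wise sum of the named series, like sumSeries()."""
    return [sum(vals) for vals in zip(*(series[n] for n in names))]


def as_percent(a, b):
    """Point-wise 100 * a / b, like asPercent(a, b); no None handling."""
    return [100.0 * x / y for x, y in zip(a, b)]


used = series["memory-used"]                                     # #A
total = sum_series(series)                                       # #B as intended
total_without_used = sum_series(n for n in series if n != "memory-used")  # #B - #A

print(as_percent(used, total))                                   # [10.0, 20.0]
print([round(p, 2) for p in as_percent(used, total_without_used)])  # [11.11, 25.0]
```

The error grows with the share of memory-used: the observed value is #A / (#B - #A) instead of #A / #B.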

Logs

When it doesn't work (only #C is displayed/requested):

2020-10-01T17:33:02.217+0200 INFO access request served {"data": {"handler":"render","carbonapi_uuid":"06091928-92b0-494a-8963-8317d12e891c","url":"/render","peer_ip":"192.168.210.1","host":"localhost","format":"json","use_cache":true,"targets":["asPercent(me1.appli.inme11ldh01noi.collectd.memory.memory-used,sumSeries(me1.appli.inme11ldh01noi.collectd.memory.*))"],"cache_timeout":60,"metrics":["asPercent(me1.appli.inme11ldh01noi.collectd.memory.memory-used,sumSeries(me1.appli.inme11ldh01noi.collectd.memory.*))"],"runtime":0.039076725,"http_code":200,"carbonzipper_response_size_bytes":4555,"carbonapi_response_size_bytes":17171,"from":1601519455,"until":1601551985,"max_data_points":1768,"from_raw":"1601519455","until_raw":"1601551985","uri":"/render","from_cache":false,"used_backend_cache":false,"request_headers":{"X-Dashboard-Id":"265","X-Grafana-Org-Id":"1","X-Panel-Id":"2"}}}

When it works (all series displayed):

2020-10-01T18:25:19.979+0200 INFO access request served {"data": {"handler":"render","carbonapi_uuid":"15f01432-463c-4ab5-9811-7c6f54be6c94","url":"/render","peer_ip":"192.168.210.1","host":"localhost","format":"json","use_cache":true,"targets":["me1.appli.inme11ldh01noi.collectd.memory.memory-used","asPercent(me1.appli.inme11ldh01noi.collectd.memory.memory-used,sumSeries(me1.appli.inme11ldh01noi.collectd.memory.*))"],"cache_timeout":60,"metrics":["me1.appli.inme11ldh01noi.collectd.memory.memory-used","asPercent(me1.appli.inme11ldh01noi.collectd.memory.memory-used,sumSeries(me1.appli.inme11ldh01noi.collectd.memory.*))"],"runtime":0.030990212,"http_code":200,"carbonzipper_response_size_bytes":9045,"carbonapi_response_size_bytes":30351,"from":1601519455,"until":1601551985,"max_data_points":1768,"from_raw":"1601519455","until_raw":"1601551985","uri":"/render","from_cache":false,"used_backend_cache":false,"request_headers":{"X-Dashboard-Id":"265","X-Grafana-Org-Id":"1","X-Panel-Id":"2"}}}

CarbonAPI Configuration:


# Controls headers that would be passed to the backend
headersToPass:
  - "X-Dashboard-Id"
  - "X-Grafana-Org-Id"
  - "X-Panel-Id"
headersToLog:
  - "X-Dashboard-Id"
  - "X-Grafana-Org-Id"
  - "X-Panel-Id"
# Max concurrent requests to CarbonZipper
concurency: 1000
cache:
   # Type of caching. Valid: "mem", "memcache", "null"
   type: "memcache"
   # Cache limit in megabytes
   size_mb: 512
   # Default cache timeout value. Identical to DEFAULT_CACHE_DURATION in graphite-web.
   defaultTimeoutSec: 60
   # Only used by memcache type of cache. List of memcache servers.
   memcachedServers:
      - "front1:2181"
      - "front2:2181"

upstreams:
    # Number of 100ms buckets to track request distribution in. Used to build
    # 'carbon.zipper.hostname.requests_in_0ms_to_100ms' metric and friends.
    # Requests beyond the last bucket are logged as slow (default of 10 implies
    # "slow" is >1 second).
    # The last bucket is _not_ called 'requests_in_Xms_to_inf' on purpose, so
    # we can change our minds about how many buckets we want to have and have
    # their names remain consistent.
    buckets: 10

    # Enable compatibility with graphite-web 0.9
    # This will affect graphite-web 1.0+ with multiple cluster_servers
    # Default: disabled
    graphite09compat: false

    timeouts:
        # Maximum backend request time for find requests.
        find: "10s"
        # Maximum backend request time for render requests. This is the total time and doesn't take in-flight requests into account
        render: "60s"
        # Timeout to connect to the server
        connect: "500ms"

    # Number of concurrent requests to any given backend - default is no limit.
    # If set, you likely want >= MaxIdleConnsPerHost
    concurrencyLimit: 0

    # Configures how often keep alive packets will be sent out
    keepAliveInterval: "30s"

    # Control http.MaxIdleConnsPerHost. Large values can lead to more idle
    # connections on the backend servers which may bump into limits; tune with care.
    maxIdleConnsPerHost: 1000


    backendsv2:
        backends:
          -
            groupName: "site1-render"
            # supported:
            #    carbonapi_v2_pb - carbonapi 0.11 or earlier version of protocol.
            #    carbonapi_v3_pb - new protocol, http interface (native)
            #    carbonapi_v3_grpc - new protocol, gRPC interface (native)
            #    protobuf, pb, pb3 - same as carbonapi_v2_pb
            #    msgpack - protocol used by graphite-web 1.1 and metrictank
            #    auto - carbonapi will do its best to guess if it's carbonapi_v3_pb or carbonapi_v2_pb
            #
            #  non-native protocols will be internally converted to new protocol, which will increase memory consumption
            protocol: "carbonapi_v2_pb"
            # supported:
            #    "broadcast" - send request to all backends in group and merge responses. This was default behavior for carbonapi 0.11 or earlier
            #    "roundrobin" - send request to one backend.
            #    "all" - same as "broadcast"
            #    "rr" - same as "roundrobin"
            lbMethod: "broadcast"
            # number of retries in case of unsuccessful request
            maxTries: 1
            # number of metrics per fetch request. Default: 0 - unlimited. If not specified, global will be used
            maxBatchSize: 10
            servers:
                - "http://back1:8090"
                - "http://back2:8090"

    carbonsearchv2:
        # carbonsearch prefix to reserve/register
        prefix: "*"

        # Carbonsearch instances. Follows the same syntax as backendsv2
        backends:
          -
            groupName: "site1-search"
            protocol: "carbonapi_v2_pb"
            lbMethod: "broadcast"
            servers:
                - "http://back1:8090"
                - "http://back2:8090"


functionsConfig:
    graphiteWeb: /etc/carbonapi/graphiteWeb.yaml
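The backendsv2 settings above shape the fetches seen in the logs: with lbMethod: "broadcast" each request goes to both back1 and back2 and the responses are merged, and with maxBatchSize: 10 the six metrics matched by memory.* fit into a single fetch. The sketch below models that under stated assumptions; it is not carbonapi's code, the function names are hypothetical, and the merge rule (a missing point from one backend is filled from another backend's response) is an assumption about how broadcast responses are combined.

```python
from typing import Dict, List, Optional

# Hypothetical model of the backendsv2 settings above; not carbonapi's code.
Points = List[Optional[float]]


def make_batches(metrics: List[str], max_batch_size: int) -> List[List[str]]:
    """Split metric names into fetch requests; 0 means unlimited (one request)."""
    if max_batch_size <= 0:
        return [list(metrics)]
    return [metrics[i:i + max_batch_size]
            for i in range(0, len(metrics), max_batch_size)]


def merge_broadcast(responses: List[Dict[str, Points]]) -> Dict[str, Points]:
    """Combine the responses of all backends in a "broadcast" group by series name.

    Assumption: when several backends return the same series (same number of
    points), a missing point (None) in one response is filled from another.
    """
    merged: Dict[str, Points] = {}
    for response in responses:
        for name, points in response.items():
            if name not in merged:
                merged[name] = list(points)  # copy, so inputs are not mutated
                continue
            current = merged[name]
            for i, value in enumerate(points[:len(current)]):
                if current[i] is None:
                    current[i] = value
    return merged


prefix = "me1.appli.inme11ldh01noi.collectd.memory.memory-"
names = [prefix + m for m in
         ("buffered", "cached", "free", "slab_recl", "slab_unrecl", "used")]
print(len(make_batches(names, 10)))  # 1: six metrics fit in one batch of 10

# Two backends each holding part of a series (short names for brevity).
back1 = {"used": [1.0, None]}
back2 = {"used": [None, 2.0], "free": [5.0, 5.0]}
print(merge_broadcast([back1, back2]))  # {'used': [1.0, 2.0], 'free': [5.0, 5.0]}
```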



Simplified query (if applicable)

Does not work: asPercent(myserver.collectd.memory.memory-used,sumSeries(myserver.collectd.memory.*))

Works: ["myserver.collectd.memory.memory-used","asPercent(myserver.collectd.memory.memory-used,sumSeries(myserver.collectd.memory.*))"]

Backend metric retention and aggregation schemas

The requested time range does not cross a retention period boundary.

Backend response (if possible)

Does not work:

2020-10-01T18:49:04.018+0200    DEBUG   zipper  got some fetch responses        {"type": "broadcastGroup", "groupName": "site1-render", "type": "fetch", "request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-used"], "backends_count": 2, "response_count": 2, "have_errors": false, "errors": null, "response_count": 1}
2020-10-01T18:49:04.040+0200    DEBUG   zipper  got some fetch responses        {"type": "broadcastGroup", "groupName": "site1-render", "type": "fetch", "request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-buffered", "me1.appli.inme11ldh01noi.collectd.memory.memory-cached", "me1.appli.inme11ldh01noi.collectd.memory.memory-free", "me1.appli.inme11ldh01noi.collectd.memory.memory-slab_recl", "me1.appli.inme11ldh01noi.collectd.memory.memory-slab_unrecl", "me1.appli.inme11ldh01noi.collectd.memory.memory-used"], "backends_count": 2, "response_count": 2, "have_errors": false, "errors": null, "response_count": 6}
2020-10-01T18:49:04.040+0200    DEBUG   zipper  got some fetch responses        {"type": "broadcastGroup", "groupName": "root", "type": "fetch", "request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-used", "me1.appli.inme11ldh01noi.collectd.memory.*"], "backends_count": 1, "response_count": 1, "have_errors": false, "errors": null, "response_count": 6}
2020-10-01T18:49:04.041+0200    INFO    access  request served  {"data": {"handler":"render","carbonapi_uuid":"4b75db72-0abb-4bad-a89c-615907912691","url":"/render","peer_ip":"192.168.210.1","host":"localhost","format":"json","use_cache":true,"targets":["asPercent(me1.appli.inme11ldh01noi.collectd.memory.memory-used,sumSeries(me1.appli.inme11ldh01noi.collectd.memory.*))"],"cache_timeout":60,"metrics":["asPercent(me1.appli.inme11ldh01noi.collectd.memory.memory-used,sumSeries(me1.appli.inme11ldh01noi.collectd.memory.*))"],"runtime":0.028383404,"http_code":200,"carbonzipper_response_size_bytes":4555,"carbonapi_response_size_bytes":17171,"from":1601519455,"until":1601551985,"max_data_points":1768,"from_raw":"1601519455","until_raw":"1601551985","uri":"/render","from_cache":false,"used_backend_cache":false,"request_headers":{"X-Dashboard-Id":"265","X-Grafana-Org-Id":"1","X-Panel-Id":"2"}}}

Works:

2020-10-01T18:47:14.546+0200    DEBUG   zipper  got some fetch responses        {"type": "broadcastGroup", "groupName": "site1-render", "type": "fetch", "request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-used"], "backends_count": 2, "response_count": 2, "have_errors": false, "errors": null, "response_count": 1}
2020-10-01T18:47:14.546+0200    DEBUG   zipper  got some fetch responses        {"type": "broadcastGroup", "groupName": "root", "type": "fetch", "request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-used"], "backends_count": 1, "response_count": 1, "have_errors": false, "errors": null, "response_count": 1}
2020-10-01T18:47:14.573+0200    DEBUG   zipper  got some fetch responses        {"type": "broadcastGroup", "groupName": "site1-render", "type": "fetch", "request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-buffered", "me1.appli.inme11ldh01noi.collectd.memory.memory-cached", "me1.appli.inme11ldh01noi.collectd.memory.memory-free", "me1.appli.inme11ldh01noi.collectd.memory.memory-slab_recl", "me1.appli.inme11ldh01noi.collectd.memory.memory-slab_unrecl", "me1.appli.inme11ldh01noi.collectd.memory.memory-used"], "backends_count": 2, "response_count": 2, "have_errors": false, "errors": null, "response_count": 6}
2020-10-01T18:47:14.573+0200    DEBUG   zipper  got some fetch responses        {"type": "broadcastGroup", "groupName": "root", "type": "fetch", "request": ["me1.appli.inme11ldh01noi.collectd.memory.*"], "backends_count": 1, "response_count": 1, "have_errors": false, "errors": null, "response_count": 6}
2020-10-01T18:47:14.574+0200    INFO    access  request served  {"data": {"handler":"render","carbonapi_uuid":"cbcd12dd-2b61-473f-97db-2af4d66067a1","url":"/render","peer_ip":"192.168.210.1","host":"localhost","format":"json","use_cache":true,"targets":["me1.appli.inme11ldh01noi.collectd.memory.memory-used","asPercent(me1.appli.inme11ldh01noi.collectd.memory.memory-used,sumSeries(me1.appli.inme11ldh01noi.collectd.memory.*))"],"cache_timeout":60,"metrics":["me1.appli.inme11ldh01noi.collectd.memory.memory-used","asPercent(me1.appli.inme11ldh01noi.collectd.memory.memory-used,sumSeries(me1.appli.inme11ldh01noi.collectd.memory.*))"],"runtime":0.032647471,"http_code":200,"carbonzipper_response_size_bytes":9045,"carbonapi_response_size_bytes":30351,"from":1601519455,"until":1601551985,"max_data_points":1768,"from_raw":"1601519455","until_raw":"1601551985","uri":"/render","from_cache":false,"used_backend_cache":false,"request_headers":{"X-Dashboard-Id":"265","X-Grafana-Org-Id":"1","X-Panel-Id":"2"}}}

sylvain-beugin, Oct 01 '20 17:10

A similar issue: https://github.com/go-graphite/carbonapi/issues/487#issuecomment-674523340

sylvain-beugin, Oct 01 '20 17:10

Could you please check whether #529 fixes it?

Update: actually, after reading more carefully, I don't see an issue here.

Take a look at the target from the "problematic" request: "targets":["asPercent(me1.appli.inme11ldh01noi.collectd.memory.memory-used,sumSeries(me1.appli.inme11ldh01noi.collectd.memory.*))"]

The pattern me1.appli.inme11ldh01noi.collectd.memory.* clearly should include everything. I'm not sure what's wrong there. Could you maybe run a few tests with the cache disabled?

Update 2:

2020-10-01T18:49:04.018+0200    DEBUG   zipper  got some fetch responses        {"type": "broadcastGroup", "groupName": "site1-render", "type": "fetch", "request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-used"], "backends_count": 2, "response_count": 2, "have_errors": false, "errors": null, "response_count": 1}
2020-10-01T18:49:04.040+0200    DEBUG   zipper  got some fetch responses        {"type": "broadcastGroup", "groupName": "site1-render", "type": "fetch", "request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-buffered", "me1.appli.inme11ldh01noi.collectd.memory.memory-cached", "me1.appli.inme11ldh01noi.collectd.memory.memory-free", "me1.appli.inme11ldh01noi.collectd.memory.memory-slab_recl", "me1.appli.inme11ldh01noi.collectd.memory.memory-slab_unrecl", "me1.appli.inme11ldh01noi.collectd.memory.memory-used"], "backends_count": 2, "response_count": 2, "have_errors": false, "errors": null, "response_count": 6}

-> "request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-used"]
-> "request": ["me1.appli.inme11ldh01noi.collectd.memory.memory-buffered", "me1.appli.inme11ldh01noi.collectd.memory.memory-cached", "me1.appli.inme11ldh01noi.collectd.memory.memory-free", "me1.appli.inme11ldh01noi.collectd.memory.memory-slab_recl", "me1.appli.inme11ldh01noi.collectd.memory.memory-slab_unrecl", "me1.appli.inme11ldh01noi.collectd.memory.memory-used"]

me1.appli.inme11ldh01noi.collectd.memory.memory-used is requested in both fetches shown in the log.
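This can be checked against Graphite glob semantics: * matches within a single path node, so memory.* matches all six names, memory-used included. In the failing log, the root group fetched both targets in one request and got 6 series back, so those 6 series must be handed to two targets: #A should get 1 and sumSeries(memory.*) should get all 6. The sketch below contrasts that with a hypothetical "each series goes only to the first matching target" assignment, which would reproduce the reported #B-#A symptom. It illustrates one failure mode consistent with the logs, not a diagnosis of carbonapi's code; the function names are hypothetical and the matcher handles fnmatch-style wildcards per node but not Graphite's {a,b} alternatives.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def graphite_match(pattern: str, name: str) -> bool:
    """Node-wise glob match: '*' never crosses a '.' separator.

    Simplified: fnmatch wildcards (*, ?, [...]) per node, no {a,b} braces.
    """
    p_nodes, n_nodes = pattern.split("."), name.split(".")
    return len(p_nodes) == len(n_nodes) and all(
        fnmatchcase(n, p) for p, n in zip(p_nodes, n_nodes)
    )


def assign_all(targets: List[str], fetched: List[str]) -> Dict[str, List[str]]:
    """Expected: every fetched series goes to every target it matches."""
    return {t: [s for s in fetched if graphite_match(t, s)] for t in targets}


def assign_first_match(targets: List[str], fetched: List[str]) -> Dict[str, List[str]]:
    """Hypothetical buggy variant: each series goes to its first matching target only."""
    result: Dict[str, List[str]] = {t: [] for t in targets}
    for s in fetched:
        for t in targets:
            if graphite_match(t, s):
                result[t].append(s)
                break
    return result


prefix = "me1.appli.inme11ldh01noi.collectd.memory."
fetched = [prefix + "memory-" + m for m in
           ("buffered", "cached", "free", "slab_recl", "slab_unrecl", "used")]
targets = [prefix + "memory-used", prefix + "*"]  # #A and the sumSeries argument

print([len(v) for v in assign_all(targets, fetched).values()])          # [1, 6]
print([len(v) for v in assign_first_match(targets, fetched).values()])  # [1, 5]
```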

Felixoid, Oct 13 '20 16:10

Hey @sylvain-beugin, any follow-up here?

Felixoid, Oct 30 '20 07:10

Hi,

Yes, right after the test I sent you, I also disabled the cache and the bug remains. I came to the same conclusion as you: the logs look OK and all 6 metrics are fetched, so perhaps the value disappears somewhere in the cache system...

Sorry, I have not checked #529 yet; I haven't had the time...

sylvain-beugin, Nov 14 '20 23:11

FWIW, I'm running 0.16-patch2 and came here to submit a similar bug report.

I'm seeing something very similar where carbonapi reports 3 or 4 times the value graphite-web reports when using a pattern like:

#A = sumSeries(...)
#B = timeShift(#A, '7d')
#C = divideSeries(#A,#B)

If I flip my datasource to "Graphite Web" and refresh the query, I see something like 0.97; if I flip over to carbonapi as the datasource, the same point returns 3.88.
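For reference, a minimal sketch of what this pattern should compute: timeShift(#A, '7d') makes the value at time t equal to #A's value at t - 7 days, so divideSeries(#A,#B) is, point by point, the current sum divided by the sum one week earlier, and a value near 1 (like graphite-web's 0.97) means "about the same as last week". The series values are hypothetical, series are modeled as {timestamp: value} dicts, the function names are hypothetical, and the "None when a value is missing or the divisor is zero" rule mirrors graphite-web's safe division as far as we know.

```python
from typing import Dict, Iterable, Optional

WEEK = 7 * 24 * 3600  # '7d' expressed in seconds

Series = Dict[int, Optional[float]]  # timestamp -> value (None = no data)


def time_shift(series: Series, shift: int, timestamps: Iterable[int]) -> Series:
    """Like timeShift(series, '7d'): the value at t is the original value at t - shift."""
    return {t: series.get(t - shift) for t in timestamps}


def safe_div(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """None when either value is missing or the divisor is zero (assumed semantics)."""
    if a is None or b is None or b == 0:
        return None
    return a / b


def divide_series(a: Series, b: Series) -> Series:
    """Like divideSeries(a, b): point-wise a / b on a's timestamps."""
    return {t: safe_div(a.get(t), b.get(t)) for t in a}


# Hypothetical hourly data: the sum was 100 a week ago and is 97 now.
now = [1_600_000_000 + i * 3600 for i in range(3)]
series_a = {t - WEEK: 100.0 for t in now}
series_a.update({t: 97.0 for t in now})       # #A = sumSeries(...)

window_a = {t: series_a[t] for t in now}      # #A over the displayed range
series_b = time_shift(series_a, WEEK, now)    # #B = timeShift(#A, '7d')
series_c = divide_series(window_a, series_b)  # #C = divideSeries(#A,#B)
print(list(series_c.values()))                # [0.97, 0.97, 0.97]
```

At the point quoted above, 3.88 is exactly 4 times 0.97.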

reyjrar, Jun 22 '23 16:06