go-carbon
go-carbon copied to clipboard
carbonserver: fix a cache hit bug
original issue
I was originally writing issue on carbonapi github Issue
after some debug and code reading, i found out issue is with the carbonserver->render->cache function
the PR provided fix would expand cache key by 2 times, i dont like it, but i think you guys want to keep this key so its human readable?
the preferable way is run md5sum on strings.Join(targetKeys, "&")
before line 318 so the key would be way less in length but it wont be human readable
let me describe the issue here again
when you generate two queries similarly with the same from&until in the same request
-
metrics.FRP[PPP]-ADE*-IAD[23].*_*.impression_log_DEBUG_FIELD_VPS_COUNT
-
metrics.FRP[PP]-ADE*-IAD[23].*_*.impression_log_DEBUG_FIELD_VPS_COUNT
curl would be like this
FROM=1661708057;curl -s "http://carbonapi/render?target=metrics.FRP\[PPP\]-ADE*-IAD\[23\].*_*.impression_log_DEBUG_FIELD_VPS_COUNT&target=metrics.FRP\[PP\]-ADE*-IAD\[23\].*_*.impression_log_DEBUG_FIELD_VPS_COUNT&from=${FROM}&until=-180s&format=json"
Both should return exactly same result (but in our test when cache enabled sometime it wouldnt) and we have carbonapi->go-carbon through pbv3
the problem
so the problem is when you have carbonserver render cache enabled.
it would take the targets->targets map[timeRange][]target
and generate a key based on names&from&until
so in my example, the first query would generate bunch of cache that has key->value which would be used (cache hit) in second query since they should be returning exactly same data (assuming they are not expired)
so when
First query caches the key, it stores the value of the key, which is response
type which contains PathExpression
of first query's PathExpression
then
Second query would hit some of the cache (First query sets), and it gets the response
from cache with First querys PathExpression
,
as a result
when this gets returned to carbonapi, carbonapi would treat it as wrong request target, the data would not be merged to second query's exprssion cause Second query has different PathExpression
than Frist query
i added some debug code to carbonapi and generate this
wrong
Sep 05 14:49:56 carbonapi-server carbonapi[12688]: expr.go line: 73 873
Sep 05 14:49:56 carbonapi-server carbonapi[12688]: expr.go line: 84 key: {metrics.FRP[PPP]-ADE*-IAD[23].*_*.impression_log_DEBUG_FIELD_VPS_COUNT 1661708059 1662414416} value: 873
Sep 05 14:49:56 carbonapi-server carbonapi[12688]: expr.go line: 95 key: {metrics.FRP[PPP]-ADE*-IAD[23].*_*.impression_log_DEBUG_FIELD_VPS_COUNT 1661708059 1662414416} value: 873
Sep 05 14:49:56 carbonapi-server carbonapi[12688]: render_handler.go target: metrics.FRP[PPP]-ADE*-IAD[23].*_*.impression_log_DEBUG_FIELD_VPS_COUNT, count: 873
Sep 05 14:49:57 carbonapi-server carbonapi[12688]: expr.go line: 73 873
Sep 05 14:49:57 carbonapi-server carbonapi[12688]: expr.go line: 84 key: {metrics.FRP[PPP]-ADE*-IAD[23].*_*.impression_log_DEBUG_FIELD_VPS_COUNT 1661708059 1662414416} value: 873
Sep 05 14:49:57 carbonapi-server carbonapi[12688]: expr.go line: 84 key: {metrics.FRP[PP]-ADE*-IAD[23].*_*.impression_log_DEBUG_FIELD_VPS_COUNT 1661708059 1662414416} value: 773
Sep 05 14:49:57 carbonapi-server carbonapi[12688]: expr.go line: 84 key: { 1661708059 1662414416} value: 100
Sep 05 14:49:57 carbonapi-server carbonapi[12688]: expr.go line: 95 key: {metrics.FRP[PP]-ADE*-IAD[23].*_*.impression_log_DEBUG_FIELD_VPS_COUNT 1661708059 1662414416} value: 773
Sep 05 14:49:57 carbonapi-server carbonapi[12688]: render_handler.go target: metrics.FRP[PP]-ADE*-IAD[23].*_*.impression_log_DEBUG_FIELD_VPS_COUNT, count: 773
Sep 05 14:49:57 carbonapi-server carbonapi[12688]: render_handler.go number: 1646
right
Sep 05 14:56:26 carbonapi-server carbonapi[12688]: expr.go line: 73 873
Sep 05 14:56:26 carbonapi-server carbonapi[12688]: expr.go line: 84 key: {metrics.FRP[PPP]-ADE*-IAD[23].*_*.impression_log_DEBUG_FIELD_VPS_COUNT 1661708059 1662414805} value: 873
Sep 05 14:56:26 carbonapi-server carbonapi[12688]: expr.go line: 95 key: {metrics.FRP[PPP]-ADE*-IAD[23].*_*.impression_log_DEBUG_FIELD_VPS_COUNT 1661708059 1662414805} value: 873
Sep 05 14:56:26 carbonapi-server carbonapi[12688]: render_handler.go target: metrics.FRP[PPP]-ADE*-IAD[23].*_*.impression_log_DEBUG_FIELD_VPS_COUNT, count: 873
Sep 05 14:56:27 carbonapi-server carbonapi[12688]: expr.go line: 73 873
Sep 05 14:56:27 carbonapi-server carbonapi[12688]: expr.go line: 84 key: {metrics.FRP[PPP]-ADE*-IAD[23].*_*.impression_log_DEBUG_FIELD_VPS_COUNT 1661708059 1662414805} value: 873
Sep 05 14:56:27 carbonapi-server carbonapi[12688]: expr.go line: 84 key: {metrics.FRP[PP]-ADE*-IAD[23].*_*.impression_log_DEBUG_FIELD_VPS_COUNT 1661708059 1662414805} value: 873
Sep 05 14:56:27 carbonapi-server carbonapi[12688]: expr.go line: 95 key: {metrics.FRP[PP]-ADE*-IAD[23].*_*.impression_log_DEBUG_FIELD_VPS_COUNT 1661708059 1662414805} value: 873
Sep 05 14:56:27 carbonapi-server carbonapi[12688]: render_handler.go target: metrics.FRP[PP]-ADE*-IAD[23].*_*.impression_log_DEBUG_FIELD_VPS_COUNT, count: 873
Sep 05 14:56:27 carbonapi-server carbonapi[12688]: render_handler.go number: 1746
as you can see some of the data gets the key as { 1661708059 1662414416}
because they contian first query's PathExpression
due to cache hit
if you would not care about key being too long, the PR is enough to fix the problem
but if you want shorter key, i can change md5 before key generation so it would just be a md5sum + format
easy work around
an easy work around is turn off the render cache
btw i didnt really test the code, i just write it as my golang memory allow, i can add test case or if we wanna go other direction