graphite-api not returning data with multiple retentions
Hello,
I'm really scratching my head here. We've been running a Grafana/graphite-api/carbon/whisper stack for a while now and it's generally working OK. However, I've noticed that if I drill into the data in Grafana, once I get to a certain level of detail the chart is blank.

Here is some config. Our storage schema looks like this: store at a 10-second interval for 7 days, then at a 1-minute interval for 2 years.
```
[Web_Prod]
priority = 90
pattern = ^Production..web..WebServer.*
retentions = 10s:7d,1m:2y
```
I can verify this in the whisper files themselves, like this:
```
/usr/local/src/whisper/bin/whisper-dump.py /opt/graphite/storage/whisper/Production/Live/web/web2-vm/WebServer/Customer/HPS.wsp | less

Meta data:
  aggregation method: average
  max retention: 63072000
  xFilesFactor: 0

Archive 0 info:
  offset: 40
  seconds per point: 10
  points: 60480
  retention: 604800
  size: 725760

Archive 1 info:
  offset: 725800
  seconds per point: 60
  points: 1051200
  retention: 63072000
  size: 12614400
```
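As a quick sanity check (my own, not from the dump), the numbers above follow directly from the retention config, assuming whisper's on-disk layout of a 16-byte metadata header, 12 bytes of archive info per archive, and 12 bytes per point (4-byte timestamp plus 8-byte double):

```python
# Recompute the whisper-dump numbers from the 10s:7d,1m:2y retention config.
# Assumes whisper's on-disk layout: 16-byte metadata header, 12 bytes of
# archive info per archive, 12 bytes per point (4-byte timestamp + 8-byte double).
METADATA_SIZE = 16
ARCHIVE_INFO_SIZE = 12
POINT_SIZE = 12

archives = [(10, 7 * 24 * 3600),        # 10s:7d
            (60, 2 * 365 * 24 * 3600)]  # 1m:2y (whisper treats 1y as 365d)

offset = METADATA_SIZE + ARCHIVE_INFO_SIZE * len(archives)  # 40, as in the dump
for i, (seconds_per_point, retention) in enumerate(archives):
    points = retention // seconds_per_point
    size = points * POINT_SIZE
    print("Archive %d: offset=%d points=%d retention=%d size=%d"
          % (i, offset, points, retention, size))
    offset += size
```

This reproduces the dump exactly (offsets 40 and 725800, sizes 725760 and 12614400), so the files themselves are laid out as configured.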
I've noticed the problem only happens when querying data older than 7 days, i.e. after it's been averaged down to a 60-second interval. If I pick a time period older than 7 days, spanning a three-minute interval, and look directly inside the whisper file, it all looks good:
```
/usr/local/src/whisper/bin/whisper-fetch.py --from 1454230700 --until 1454230880 /opt/graphite/storage/whisper/Production/Live/web/web2-vm/WebServer/Customer/HPS.wsp

1454230740	8.000000
1454230800	8.700000
1454230860	8.233333
```
However, if I query through graphite-api, it returns data at a 10-second interval (the wrong retention period, because I'm querying older than 7 days), and all the points (even the ones that match the timestamps above) are null.
```
http://www.dashboard.com/render?target=Production.Live.web.web2-vm.WebServer.Customer.HPS&from=1454230700&until=1454230880&format=json&maxDataPoints=1000

[{"target": "Production.Live.web.571854-web2-vm.WebServer.Customer.HPS", "datapoints": [[null, 1454230710], [null, 1454230720], [null, 1454230730], [null, 1454230740], [null, 1454230750], [null, 1454230760], [null, 1454230770], [null, 1454230780], [null, 1454230790], [null, 1454230800], [null, 1454230810], [null, 1454230820], [null, 1454230830], [null, 1454230840], [null, 1454230850], [null, 1454230860], [null, 1454230870], [null, 1454230880]]}]
```
If I go for a wider time span, I start to get data back, but some points are null and some are populated. What am I doing wrong?!
Thanks, Glen.
I can confirm the issue. While whisper itself returns data as expected, according to the configured retentions, graphite-api only works correctly within the interval of the first retention. Besides the "zooming" described above, it also seems to affect functions like timeShift().
Yes, likewise. I've seen other issues now, and I think this is a general bug in graphite-api. I'll see if there is a way of raising a bug with the project.
My team have finally found the cause of this and fixed it in the source, so you can now zoom in on old data. It was a bug in one copy of the whisper code we have:

```
/usr/share/python/graphite/lib/python2.7/site-packages/graphite_api/_vendor/whisper.py
```
The call to read the data from the file had:
```python
diff = untilTime - fromTime
for archive in header['archives']:
  if archive['retention'] >= diff:
    break
```
This should be:
```python
diff = now - fromTime
for archive in header['archives']:
  if archive['retention'] >= diff:
    break
```
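To make the difference concrete, here's a minimal sketch of that selection loop applied to the archives and query from this thread (`pick_archive` is my own wrapper name, not whisper's):

```python
# Minimal sketch: archive selection with the buggy vs. fixed formula.
# The archives and timestamps come from the dump and query above.
archives = [{'retention': 604800},    # 10s:7d
            {'retention': 63072000}]  # 1m:2y

fromTime, untilTime = 1454230700, 1454230880  # the 3-minute window above
now = 1454900000  # "now" is more than 7 days after fromTime

def pick_archive(diff):
    """Return the index of the first archive whose retention covers diff."""
    for i, archive in enumerate(archives):
        if archive['retention'] >= diff:
            return i

# Buggy formula: a 180-second window "fits" archive 0, but archive 0 only
# holds the most recent 7 days, so every point in this old window is null.
print(pick_archive(untilTime - fromTime))  # -> 0

# Fixed formula: fromTime is more than 7 days in the past, so archive 0
# cannot cover it and the 1-minute archive is correctly selected.
print(pick_archive(now - fromTime))        # -> 1
```

That presumably also explains the mixed results on wider spans: with the buggy formula, archive 0 keeps being selected, and only the points that actually fall within its 7-day window come back populated.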
The other copies of whisper.py on the server are all OK. Interestingly, the incorrect one is a later version; the bug seems to have been introduced as a 'fix' here: https://github.com/graphite-project/whisper/commit/ccd0c89204f2266fa2fc20bad7e49739568086fa , but with no explanation as to why the change was made.
If anyone could shed any light, that would be cool!
Here's a summary of the attempted "fix": https://github.com/graphite-project/whisper/pull/139
I did indeed port it, then reverted it, but there has been no release since the revert.
I have just pushed 1.1.3, which should fix the regression. Let me know how it works for you.
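(Assuming a standard pip-based install, picking up the fix should just be a matter of upgrading the package:)

```
pip install --upgrade graphite-api
```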
@brutasse Could you update your docker image as well?