naarad
naarad copied to clipboard
Aggregation Incorrect for QPS
If aggr_metrics=none
is specified, the qps count is not correct.
For instance, for the below example: It reports qps = 1 when it should be 4 and 1.
Also it only seems to plot second-level data points such that it only plots the last data points.
data.csv
1433409051000,30
1433409051200,10
1433409051300,30
1433409051400,16
1433409061000,30
naarad.conf
[Latency]
infile=data.csv
sep=,
columns=test
aggr_metrics=none
[GRAPH]
graphing_library=matplotlib
Attached an email I sent earlier:
Hi,
After I explore a little bit, I don’t think the combine of “aggr_metrics=none" and qps reported correctly >could be solved easily. If we want to disable sub-second aggregation, naarad will generate one record >per timestamp into csv file. Then the qps generation logic reads the file and consider each entry as >separate record. In this case this qps is no longer qps any more but query per subsecond which does not >have much meanings.
Either rewrite the qps logic which will take some time or just forget about using qps when you don’t want >to use second aggregation.
-Tao
CSV file is generated in correctly. We should fix the qps issue , otherwise it is not usable. :) I can put it as my next sprint story if nobody picks up ( a good excise for new hires though).
@zhenyun I am still not clear what's incorrect here. if aggr_metrics=none, then qps cannot be calculated (since we are not aggregating anything)
In my case, I did not specify anything. The metric is custom metric (not sar output). QPS (the csv file) is wrong.
On Fri, Aug 28, 2015 at 9:31 AM, Ritesh Maheshwari <[email protected]
wrote:
@zhenyun https://github.com/zhenyun I am still not clear what's incorrect here. if aggr_metrics=none, then qps cannot be calculated (since we are not aggregating anything)
— Reply to this email directly or view it on GitHub https://github.com/linkedin/naarad/issues/337#issuecomment-135825976.
Zen (Zhenyun Zhuang, Performance Team)
Okay, feel free to pick up the investigation/story since you have the full context.
On Fri, Aug 28, 2015 at 9:40 AM, zhenyun [email protected] wrote:
In my case, I did not specify anything. The metric is custom metric (not sar output). QPS (the csv file) is wrong.
On Fri, Aug 28, 2015 at 9:31 AM, Ritesh Maheshwari < [email protected]
wrote:
@zhenyun https://github.com/zhenyun I am still not clear what's incorrect here. if aggr_metrics=none, then qps cannot be calculated (since we are not aggregating anything)
— Reply to this email directly or view it on GitHub https://github.com/linkedin/naarad/issues/337#issuecomment-135825976.
Zen (Zhenyun Zhuang, Performance Team)
— Reply to this email directly or view it on GitHub https://github.com/linkedin/naarad/issues/337#issuecomment-135827922.
Sample data: cat q.csv 2015-08-25 00:00:00.0035,1 2015-08-25 00:00:01.0178,1 2015-08-25 00:00:01.0285,1 2015-08-25 00:00:01.0370,1 2015-08-25 00:00:01.0478,1 2015-08-25 00:00:01.0585,1 2015-08-25 00:00:01.0670,1 2015-08-25 00:00:01.0678,1 2015-08-25 00:00:02.0494,1 2015-08-25 00:00:02.0503,1 Config: naarad.q.conf [Query-QPS] infile=q.csv columns=query sep=,
[GRAPH] graphing_library=matplotlib Generated results: out/resources/Query-QPS.qps.csv 1440460800003,1.0 1440460801017,1.0 1440460801028,1.0 1440460801037,1.0 1440460801047,1.0 1440460801058,1.0 1440460801067,2.0
I could take a look at this bug next week if no one pick it up.
@zhenyun Can you take a look at this again, please install from the master branch as I used your CSV file and configuration and have this as my Query-QPS.qps.csv
file:
1440460800000,1.0
1440460801000,7.0
1440460802000,2.0
I'm honestly confused how I was able to get the above output as any subsequent tries gave me an error:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "/Users/rhsu/Downloads/Naarad/naarad/env/lib/python2.7/site-packages/naarad-1.0.15-py2.7.egg/naarad/utils.py", line 809, in parse_and_plot_single_metri
cs
if metric.parse():
File "/Users/rhsu/Downloads/Naarad/naarad/env/lib/python2.7/site-packages/naarad-1.0.15-py2.7.egg/naarad/metrics/metric.py", line 309, in parse
aggregate_timestamp, averaging_factor = self.get_aggregation_timestamp(ts, self.aggregation_granularity)
File "/Users/rhsu/Downloads/Naarad/naarad/env/lib/python2.7/site-packages/naarad-1.0.15-py2.7.egg/naarad/metrics/metric.py", line 193, in get_aggregation_ti
mestamp
if granularity.lower() == 'none':
AttributeError: 'NoneType' object has no attribute 'lower'
I then went to add quickly to metric.py
:
if granularity is None or granularity.lower() == 'none':
return int(timestamp), 1
...
And then got the results that @zhenyun has.
Thus we want if aggr_metrics=none
is set then it just uses the timestamps to calculate QPS so if same timestamps then it aggregates. If we put aggr_metrics=second
then it calculates correctly.
Questions:
- Does this mean if
aggr_metrics
is NOT defined then it should be based on the timestamps given OR based on seconds granularity? - As @feng-tao has mentioned, if
aggr_metrics=none
then QPS will be incorrect. Do we want to assure that QPS is always granularity calculated by seconds? Or ifaggr_metrics
is set then it calculates an average of the QPS's based on aggregation? Or does it calculate as queries per minute ifaggr_metrics=minute
etc.
Feel the default should be 'second', after all, qps stands for query per second. That is, if no "aggr_metrics" line, use second;
@feng-tao We've found out that actually aggr_metrics
is being used incorrectly.
At some point we pushed a change in which aggregation_granularity = aggr_metrics
and this is incorrect since granularity specifies second, minute, etc whereas aggr_metrics
defines a set of metrics that should be aggregated.
If you could look at this sometime, that'd be great, it is based on this merge commit from your pull request: https://github.com/linkedin/naarad/commit/239c936b0140af9072abdad3439b06cc48168966.
The aggr_metrics
should not be affecting the aggregation_granularity
and it should be that you set aggregation_granularity=none
if you don't want to aggregate (it'll aggregate by the timestamps given in the file) otherwise defaults to second
for the standard aggregation across seconds.
@richardhsu Thanks. I will submit a pull request to fix this. In the mean time, I think by setting aggr_metrics=second is temp work around.