naarad icon indicating copy to clipboard operation
naarad copied to clipboard

Aggregation Incorrect for QPS

Open richardhsu opened this issue 9 years ago • 12 comments

If aggr_metrics=none is specified, the qps count is not correct. For instance, for the below example: It reports qps = 1 when it should be 4 and 1.

Also it only seems to plot second-level data points such that it only plots the last data points.

data.csv

1433409051000,30
1433409051200,10
1433409051300,30
1433409051400,16
1433409061000,30

naarad.conf

[Latency]
infile=data.csv
sep=,
columns=test
aggr_metrics=none


[GRAPH]
graphing_library=matplotlib

richardhsu avatar Jun 05 '15 21:06 richardhsu

Attached an email I sent earlier:

Hi,

After I explore a little bit, I don’t think the combine of “aggr_metrics=none" and qps reported correctly >could be solved easily. If we want to disable sub-second aggregation, naarad will generate one record >per timestamp into csv file. Then the qps generation logic reads the file and consider each entry as >separate record. In this case this qps is no longer qps any more but query per subsecond which does not >have much meanings.

Either rewrite the qps logic which will take some time or just forget about using qps when you don’t want >to use second aggregation.

-Tao

feng-tao avatar Jun 06 '15 16:06 feng-tao

CSV file is generated in correctly. We should fix the qps issue , otherwise it is not usable. :) I can put it as my next sprint story if nobody picks up ( a good excise for new hires though).

zhenyun avatar Aug 27 '15 23:08 zhenyun

@zhenyun I am still not clear what's incorrect here. if aggr_metrics=none, then qps cannot be calculated (since we are not aggregating anything)

RiteshMaheshwari avatar Aug 28 '15 16:08 RiteshMaheshwari

In my case, I did not specify anything. The metric is custom metric (not sar output). QPS (the csv file) is wrong.

On Fri, Aug 28, 2015 at 9:31 AM, Ritesh Maheshwari <[email protected]

wrote:

@zhenyun https://github.com/zhenyun I am still not clear what's incorrect here. if aggr_metrics=none, then qps cannot be calculated (since we are not aggregating anything)

— Reply to this email directly or view it on GitHub https://github.com/linkedin/naarad/issues/337#issuecomment-135825976.

Zen (Zhenyun Zhuang, Performance Team)

zhenyun avatar Aug 28 '15 16:08 zhenyun

Okay, feel free to pick up the investigation/story since you have the full context.

On Fri, Aug 28, 2015 at 9:40 AM, zhenyun [email protected] wrote:

In my case, I did not specify anything. The metric is custom metric (not sar output). QPS (the csv file) is wrong.

On Fri, Aug 28, 2015 at 9:31 AM, Ritesh Maheshwari < [email protected]

wrote:

@zhenyun https://github.com/zhenyun I am still not clear what's incorrect here. if aggr_metrics=none, then qps cannot be calculated (since we are not aggregating anything)

— Reply to this email directly or view it on GitHub https://github.com/linkedin/naarad/issues/337#issuecomment-135825976.

Zen (Zhenyun Zhuang, Performance Team)

— Reply to this email directly or view it on GitHub https://github.com/linkedin/naarad/issues/337#issuecomment-135827922.

RiteshMaheshwari avatar Aug 28 '15 16:08 RiteshMaheshwari

Sample data: cat q.csv 2015-08-25 00:00:00.0035,1 2015-08-25 00:00:01.0178,1 2015-08-25 00:00:01.0285,1 2015-08-25 00:00:01.0370,1 2015-08-25 00:00:01.0478,1 2015-08-25 00:00:01.0585,1 2015-08-25 00:00:01.0670,1 2015-08-25 00:00:01.0678,1 2015-08-25 00:00:02.0494,1 2015-08-25 00:00:02.0503,1 Config: naarad.q.conf [Query-QPS] infile=q.csv columns=query sep=,

[GRAPH] graphing_library=matplotlib Generated results: out/resources/Query-QPS.qps.csv 1440460800003,1.0 1440460801017,1.0 1440460801028,1.0 1440460801037,1.0 1440460801047,1.0 1440460801058,1.0 1440460801067,2.0

zhenyun avatar Aug 28 '15 17:08 zhenyun

I could take a look at this bug next week if no one pick it up.

feng-tao avatar Aug 28 '15 17:08 feng-tao

@zhenyun Can you take a look at this again, please install from the master branch as I used your CSV file and configuration and have this as my Query-QPS.qps.csv file:

1440460800000,1.0
1440460801000,7.0
1440460802000,2.0

richardhsu avatar Aug 28 '15 17:08 richardhsu

I'm honestly confused how I was able to get the above output as any subsequent tries gave me an error:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/Users/rhsu/Downloads/Naarad/naarad/env/lib/python2.7/site-packages/naarad-1.0.15-py2.7.egg/naarad/utils.py", line 809, in parse_and_plot_single_metri
cs
    if metric.parse():
  File "/Users/rhsu/Downloads/Naarad/naarad/env/lib/python2.7/site-packages/naarad-1.0.15-py2.7.egg/naarad/metrics/metric.py", line 309, in parse
    aggregate_timestamp, averaging_factor = self.get_aggregation_timestamp(ts, self.aggregation_granularity)
  File "/Users/rhsu/Downloads/Naarad/naarad/env/lib/python2.7/site-packages/naarad-1.0.15-py2.7.egg/naarad/metrics/metric.py", line 193, in get_aggregation_ti
mestamp
    if granularity.lower() == 'none':
AttributeError: 'NoneType' object has no attribute 'lower'

I then went to add quickly to metric.py:

if granularity is None or granularity.lower() == 'none':
  return int(timestamp), 1
...

And then got the results that @zhenyun has. Thus we want if aggr_metrics=none is set then it just uses the timestamps to calculate QPS so if same timestamps then it aggregates. If we put aggr_metrics=second then it calculates correctly.

Questions:

  1. Does this mean if aggr_metrics is NOT defined then it should be based on the timestamps given OR based on seconds granularity?
  2. As @feng-tao has mentioned, if aggr_metrics=none then QPS will be incorrect. Do we want to assure that QPS is always granularity calculated by seconds? Or if aggr_metrics is set then it calculates an average of the QPS's based on aggregation? Or does it calculate as queries per minute if aggr_metrics=minute etc.

richardhsu avatar Aug 28 '15 18:08 richardhsu

Feel the default should be 'second', after all, qps stands for query per second. That is, if no "aggr_metrics" line, use second;

zhenyun avatar Aug 28 '15 19:08 zhenyun

@feng-tao We've found out that actually aggr_metrics is being used incorrectly. At some point we pushed a change in which aggregation_granularity = aggr_metrics and this is incorrect since granularity specifies second, minute, etc whereas aggr_metrics defines a set of metrics that should be aggregated.

If you could look at this sometime, that'd be great, it is based on this merge commit from your pull request: https://github.com/linkedin/naarad/commit/239c936b0140af9072abdad3439b06cc48168966.

The aggr_metrics should not be affecting the aggregation_granularity and it should be that you set aggregation_granularity=none if you don't want to aggregate (it'll aggregate by the timestamps given in the file) otherwise defaults to second for the standard aggregation across seconds.

richardhsu avatar Aug 28 '15 19:08 richardhsu

@richardhsu Thanks. I will submit a pull request to fix this. In the mean time, I think by setting aggr_metrics=second is temp work around.

feng-tao avatar Aug 28 '15 20:08 feng-tao