Anemometer icon indicating copy to clipboard operation
Anemometer copied to clipboard

Incorrect query results and problems with FULL GROUP BY

Open solomonty opened this issue 7 years ago • 5 comments

I noticed that I was getting an invalid mapping of hostname_max and db_max after adding them to the query.

It appears that the GROUP BY is not properly grouping the data.

If I turn on FULL GROUP BY then I get an error message from MySQL.

Expression #2 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'slow_query_log.dimension.sample' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by (1055)

solomonty avatar Jan 24 '17 02:01 solomonty

These two columns get added to the query

  MAX(hostname_max) AS hostname_max,
  MAX(db_max) AS db_max,

solomonty avatar Jan 24 '17 02:01 solomonty

Here is the query generated by Anemometer

SELECT checksum AS `checksum`,
  LEFT(dimension.sample,20) AS `snippet`,
  ROUND(SUM(Rows_examined_sum)/SUM(rows_sent_sum),2) AS `index_ratio`,
  SUM(Query_time_sum) / SUM(ts_cnt) AS `query_time_avg`,
  ROUND(SUM(Rows_sent_sum)/SUM(ts_cnt),0) AS `rows_sent_avg`,
  MAX(hostname_max) AS `hostname_max`,
  MAX(db_max) AS `db_max`,
  SUM(ts_cnt) AS `ts_cnt`,
  SUM(Query_time_sum) AS `Query_time_sum`,
  SUM(Lock_time_sum) AS `Lock_time_sum`,
  SUM(Rows_sent_sum) AS `Rows_sent_sum`,
  SUM(Rows_examined_sum) AS `Rows_examined_sum`,
  SUM(Full_scan_sum) AS `Full_scan_sum`,
  SUM(Tmp_table_sum) AS `Tmp_table_sum`,
  SUM(Filesort_sum) AS `Filesort_sum`
 FROM `global_query_review` AS `fact`
 JOIN `global_query_review_history` AS `dimension` USING (`checksum`)
 WHERE dimension.ts_min >= "2017-01-23 02:39:13"
  AND dimension.ts_min <= "2017-01-24 02:39:13"
 GROUP BY checksum
 ORDER BY Query_time_sum DESC
 LIMIT 20

solomonty avatar Jan 24 '17 02:01 solomonty

The issue is that

MAX(hostname_max) and MAX(db_max) don't produce values that are from the same row.

solomonty avatar Jan 24 '17 03:01 solomonty

Adding snippet to the GROUP BY eliminates the FULL GROUP BY error but doesn't fix the query result.

GROUP BY checksum, snippet

solomonty avatar Jan 24 '17 03:01 solomonty

Removing the MAX() for hostname and db and adding them to the GROUP BY appears to return correct results and works with FULL GROUP BY.

 SELECT checksum AS `checksum`,
  LEFT(dimension.sample,20) AS `snippet`,
  ROUND(SUM(Rows_examined_sum)/SUM(rows_sent_sum),2) AS `index_ratio`,
  SUM(Query_time_sum) / SUM(ts_cnt) AS `query_time_avg`,
  ROUND(SUM(Rows_sent_sum)/SUM(ts_cnt),0) AS `rows_sent_avg`,
  hostname_max AS `hostname_max`,
  db_max AS `db_max`,
  SUM(ts_cnt) AS `ts_cnt`,
  SUM(Query_time_sum) AS `Query_time_sum`,
  SUM(Lock_time_sum) AS `Lock_time_sum`,
  SUM(Rows_sent_sum) AS `Rows_sent_sum`,
  SUM(Rows_examined_sum) AS `Rows_examined_sum`,
  SUM(Full_scan_sum) AS `Full_scan_sum`,
  SUM(Tmp_table_sum) AS `Tmp_table_sum`,
  SUM(Filesort_sum) AS `Filesort_sum`
 FROM `global_query_review` AS `fact`
 JOIN `global_query_review_history` AS `dimension` USING (`checksum`)
 WHERE dimension.ts_min >= "2017-01-23 02:39:13"
  AND dimension.ts_min <= "2017-01-24 02:39:13"
 GROUP BY checksum, snippet, hostname_max, db_max
 ORDER BY Query_time_sum DESC
 LIMIT 20

solomonty avatar Jan 24 '17 03:01 solomonty