datafusion
datafusion copied to clipboard
Wrong metric when using EXPLAIN ANALYZE
Describe the bug submit a query
❯ select sum("30"), "29" from test1 group by "29";
198683 rows in set. Query took 0.436 seconds.
then use explain analyze
explain analyze select sum("30"), "29" from test1 group by "29";
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | CoalescePartitionsExec, metrics=[output_rows=198683, elapsed_compute=47.871µs, spill_count=0, spilled_bytes=0, mem_used=0] |
| | ProjectionExec: expr=[SUM(test1.30)@1 as SUM(test1.30), 29@0 as 29], metrics=[output_rows=198683, elapsed_compute=103.299µs, spill_count=0, spilled_bytes=0, mem_used=0] |
| | AggregateExec: mode=FinalPartitioned, gby=[29@0 as 29], aggr=[SUM(test1.30)], metrics=[output_rows=198683, elapsed_compute=514.502772ms, spill_count=0, spilled_bytes=0, mem_used=0] |
| | CoalesceBatchesExec: target_batch_size=4096, metrics=[output_rows=972749, elapsed_compute=91.069241ms, spill_count=0, spilled_bytes=0, mem_used=0] |
| | RepartitionExec: partitioning=Hash([Column { name: "29", index: 0 }], 100), metrics=[fetch_time{inputPartition=0}=5.544007548s, repart_time{inputPartition=0}=75.322775ms, send_time{inputPartition=0}=14.473058ms] |
| | AggregateExec: mode=Partial, gby=[29@1 as 29], aggr=[SUM(test1.30)], metrics=[output_rows=972749, elapsed_compute=1.915277419s, spill_count=0, spilled_bytes=0, mem_used=0] |
| | RepartitionExec: partitioning=RoundRobinBatch(100), metrics=[send_time{inputPartition=0}=243.407µs, fetch_time{inputPartition=0}=43.401471ms, repart_time{inputPartition=0}=1ns] |
| | ParquetExec: limit=None, projection=[30, 29], metrics=[output_rows=1000202, elapsed_compute=13.253088ms, spill_count=0, spilled_bytes=0, mem_used=0, num_predicate_creation_errors=0, row_groups_pruned{filename=kylin/rno_native_dev/kylin_metadata_test/ssb/parquet/kylin_udaf_cube/19900101000000_20000101000000_ACG/17179869183/part-00005-84c1f179-eda5-4947-86ca-3d2497b485c4-c000.snappy.parquet}=0, bytes_scanned{filename=kylin/rno_native_dev/kylin_metadata_test/ssb/parquet/kylin_udaf_cube/19900101000000_20000101000000_ACG/17179869183/part-00005-84c1f179-eda5-4947-86ca-3d2497b485c4-c000.snappy.parquet}=5959033, predicate_evaluation_errors{filename=kylin/rno_native_dev/kylin_metadata_test/ssb/parquet/kylin_udaf_cube/19900101000000_20000101000000_ACG/17179869183/part-00005-84c1f179-eda5-4947-86ca-3d2497b485c4-c000.snappy.parquet}=0] |
| | |
+-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set. Query took 0.093 seconds.
the first elapsed_compute=1.915277419s in AGG is even longer than the query cose 😂
To Reproduce Steps to reproduce the behavior:
Expected behavior A clear and concise description of what you expected to happen.
Additional context Add any other context about the problem here.
I think elapsed_compute is meant to capture how much cpu time was taken -- so if the query ran on 16 cores for 1 second of wall clock time, then I would expect that elapsed_compute would be reported as 16 sec
That being said, it is hard to imagine how 0.093 seconds query time could use 1.9 sec of compute 🤔