"parallel hash aggregation panicked" with spill data to disk
Bug Report
1. Minimal reproduce step (Required)
2. What did you expect to see? (Required)
3. What did you see instead (Required)
github.com/pingcap/tidb/pkg/executor/aggregate.recoveryHashAgg
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggregate/agg_util.go:93
github.com/pingcap/tidb/pkg/executor/aggregate.(*HashAggPartialWorker).spillDataToDiskImpl.func1
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggregate/agg_hash_partial_worker.go:330
runtime.gopanic
/go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:785
runtime.panicmem
/go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:262
runtime.sigpanic
/go/pkg/mod/golang.org/[email protected]/src/runtime/signal_unix.go:917
bytes.(*Buffer).Bytes
/go/pkg/mod/golang.org/[email protected]/src/bytes/buffer.go:54
github.com/pingcap/tidb/pkg/util/serialization.SerializeBytesBuffer
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/util/serialization/serialization_util.go:171
github.com/pingcap/tidb/pkg/executor/aggfuncs.(*SerializeHelper).serializeBasePartialResult4GroupConcat
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggfuncs/spill_serialize_helper.go:135
github.com/pingcap/tidb/pkg/executor/aggfuncs.(*SerializeHelper).serializePartialResult4GroupConcat
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggfuncs/spill_serialize_helper.go:140
github.com/pingcap/tidb/pkg/executor/aggfuncs.(*groupConcat).SerializePartialResult
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggfuncs/func_group_concat.go:180
github.com/pingcap/tidb/pkg/executor/aggregate.(*HashAggPartialWorker).spillDataToDiskImpl
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggregate/agg_hash_partial_worker.go:362
github.com/pingcap/tidb/pkg/executor/aggregate.(*HashAggPartialWorker).spillDataToDisk
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggregate/agg_hash_partial_worker.go:316
github.com/pingcap/tidb/pkg/executor/aggregate.(*HashAggExec).spill.func1
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggregate/agg_hash_executor.go:544
Original message, but with the SQL statement redacted
[2025/06/13 13:01:16.438 +02:00] [ERROR] [agg_util.go:93] ["parallel hash aggregation panicked"] [error="runtime error: invalid memory address or nil pointer dereference"] [stack="github.com/pingcap/tidb/pkg/executor/aggregate.recoveryHashAgg\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggregate/agg_util.go:93\ngithub.com/pingcap/tidb/pkg/executor/aggregate.(*HashAggPartialWorker).spillDataToDiskImpl.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggregate/agg_hash_partial_worker.go:330\nruntime.gopanic\n\t/go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:785\nruntime.panicmem\n\t/go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:262\nruntime.sigpanic\n\t/go/pkg/mod/golang.org/[email protected]/src/runtime/signal_unix.go:917\nbytes.(*Buffer).Bytes\n\t/go/pkg/mod/golang.org/[email protected]/src/bytes/buffer.go:54\ngithub.com/pingcap/tidb/pkg/util/serialization.SerializeBytesBuffer\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/util/serialization/serialization_util.go:171\ngithub.com/pingcap/tidb/pkg/executor/aggfuncs.(*SerializeHelper).serializeBasePartialResult4GroupConcat\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggfuncs/spill_serialize_helper.go:135\ngithub.com/pingcap/tidb/pkg/executor/aggfuncs.(*SerializeHelper).serializePartialResult4GroupConcat\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggfuncs/spill_serialize_helper.go:140\ngithub.com/pingcap/tidb/pkg/executor/aggfuncs.(*groupConcat).SerializePartialResult\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggfuncs/func_group_concat.go:180\ngithub.com/pingcap/tidb/pkg/executor/aggregate.(*HashAggPartialWorker).spillDataToDiskImpl\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggregate/agg_hash_partial_worker.go:362\ngithub.com/pingcap/tidb/pkg/executor/aggregate.(*HashAggPartialWorker).spillDataToDisk\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggregate/agg_hash_partial_worker.go:316\ngithub.com/pingcap/tidb/pkg/executor/aggregate.(*HashAggExec).spill.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggregate/agg_hash_executor.go:544"]
[2025/06/13 13:01:17.151 +02:00] [INFO] [conn.go:1184] ["command dispatched failed"] [conn=3038773452] [session_alias=] [connInfo="id:3038773452, addr:10.10.62.11:9425 status:10, collation:utf8_general_ci, user:ibe"] [command=Query] [status="inTxn:0, autocommit:1"] [sql="explain analyze SELECT ..."] [txn_mode=PESSIMISTIC] [timestamp=458702705537056790] [err="runtime error: invalid memory address or nil pointer dereference\ngithub.com/pingcap/errors.Trace\n\t/go/pkg/mod/github.com/pingcap/[email protected]/juju_adaptor.go:15\ngithub.com/pingcap/tidb/pkg/util.GetRecoverError\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/util/util.go:288\ngithub.com/pingcap/tidb/pkg/executor/aggregate.recoveryHashAgg\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggregate/agg_util.go:91\ngithub.com/pingcap/tidb/pkg/executor/aggregate.(*HashAggPartialWorker).spillDataToDiskImpl.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggregate/agg_hash_partial_worker.go:330\nruntime.gopanic\n\t/go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:785\nruntime.panicmem\n\t/go/pkg/mod/golang.org/[email protected]/src/runtime/panic.go:262\nruntime.sigpanic\n\t/go/pkg/mod/golang.org/[email protected]/src/runtime/signal_unix.go:917\nbytes.(*Buffer).Bytes\n\t/go/pkg/mod/golang.org/[email protected]/src/bytes/buffer.go:54\ngithub.com/pingcap/tidb/pkg/util/serialization.SerializeBytesBuffer\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/util/serialization/serialization_util.go:171\ngithub.com/pingcap/tidb/pkg/executor/aggfuncs.(*SerializeHelper).serializeBasePartialResult4GroupConcat\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggfuncs/spill_serialize_helper.go:135\ngithub.com/pingcap/tidb/pkg/executor/aggfuncs.(*SerializeHelper).serializePartialResult4GroupConcat\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggfuncs/spill_serialize_helper.go:140\ngithub.com/pingcap/tidb/pkg/executor/aggfuncs.(*groupConcat).SerializePartialResult\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggfuncs/func_group_concat.go:180\ngithub.com/pingcap/tidb/pkg/executor/aggregate.(*HashAggPartialWorker).spillDataToDiskImpl\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggregate/agg_hash_partial_worker.go:362\ngithub.com/pingcap/tidb/pkg/executor/aggregate.(*HashAggPartialWorker).spillDataToDisk\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggregate/agg_hash_partial_worker.go:316\ngithub.com/pingcap/tidb/pkg/executor/aggregate.(*HashAggExec).spill.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/pkg/executor/aggregate/agg_hash_executor.go:544\nruntime.goexit\n\t/go/pkg/mod/golang.org/[email protected]/src/runtime/asm_amd64.s:1700"]
4. What is your TiDB version? (Required)
v8.5.2
@xzhangxian1008 Is there a known issue related to this one ?
Could you provide Minimal reproduce step ?
@dveeden Can you provide procedures to reproduce it ?
- create table t (col0 int, col1 int)
- insert many data into t and ensure that all col1 are null
- execute
select group_concat(col1) from t group by col0and ensure that spill is triggered
@xzhangxian1008 I tried to reproduce this with a tiup playground to update which versions are affected, but I'm struggling to get it reproduced. It might somehow not use disk spill. Do you have some more detailed steps?
@xzhangxian1008 I tried to reproduce this with a
tiup playgroundto update which versions are affected, but I'm struggling to get it reproduced. It might somehow not use disk spill. Do you have some more detailed steps?
CREATE TABLE `agg_test` (
`col_key` int DEFAULT NULL,
`col0` int DEFAULT NULL,
`col1` decimal(20,2) DEFAULT NULL,
`col2` varchar(50) DEFAULT NULL,
`col3` datetime DEFAULT NULL,
`col4` int DEFAULT NULL
);
- insert 5 million rows and ensure that all col0 are null
- execute the following sql
explain analyze select group_concat(col0), group_concat(col1), group_concat(col2), group_concat(col3), group_concat(col4) from agg_test group by col_key;and ensure that spill is triggered
/remove-label may-affects-6.5
/remove-label may-affects-7.1
/remove-label may-affects-7.5
/label affects-7.5
... and ensure that spill is triggered
What's needed for that with tiup playground?
... and ensure that spill is triggered
What's needed for that with
tiup playground?
You can check if disk usage is 0 in explain analyze information to know if spill has been triggered. If it's 0, you can lower the system variable tidb_mem_quota_query to make it easier to be triggered. However, I think you can successfully trigger the spill with tidb_mem_quota_query's default value in my detailed steps.
It doesn't seem to work for me:
mysql-8.0.11-TiDB-v8.5.2> explain analyze select group_concat(col0), group_concat(col1), group_concat(col2), group_concat(col3), group_concat(col4) from agg_test group by col_key;
+---------------------------+------------+---------+-----------+----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+---------------------------+------------+---------+-----------+----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
| HashAgg_5 | 4955443.20 | 1 | root | | time:1.54s, loops:2, RU:10215.368133 | group by:Column#18, funcs:group_concat(Column#13 separator ",")->Column#8, funcs:group_concat(Column#14 separator ",")->Column#9, funcs:group_concat(Column#15 separator ",")->Column#10, funcs:group_concat(Column#16 separator ",")->Column#11, funcs:group_concat(Column#17 separator ",")->Column#12 | 170.1 KB | 0 Bytes |
| └─Projection_12 | 6194304.00 | 5194304 | root | | time:132ms, loops:5083, Concurrency:5 | cast(test.agg_test.col0, var_string(20))->Column#13, cast(cast(test.agg_test.col1, decimal(20,2)), var_string(23))->Column#14, test.agg_test.col2->Column#15, cast(test.agg_test.col3, var_string(19))->Column#16, cast(test.agg_test.col4, var_string(20))->Column#17, test.agg_test.col_key->Column#18 | 1.41 MB | N/A |
| └─TableReader_9 | 6194304.00 | 5194304 | root | | time:151.9ms, loops:5083, cop_task: {num: 214, max: 43.3ms, min: 138.5µs, avg: 18.3ms, p95: 40.7ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 3.41s, tot_wait: 13ms, copr_cache_hit_ratio: 0.19, build_task_duration: 25.8µs, max_distsql_concurrency: 14}, rpc_info:{Cop:{num_rpc:214, total_time:3.92s}} | data:TableFullScan_8 | 74.4 MB | N/A |
| └─TableFullScan_8 | 6194304.00 | 5194304 | cop[tikv] | table:agg_test | tikv_task:{proc max:34ms, min:0s, avg: 12.9ms, p80:25ms, p95:28ms, iters:5914, tasks:214}, scan_detail: {total_process_keys: 5116256, total_process_keys_size: 588369440, total_keys: 5116429, get_snapshot_time: 7.49ms, rocksdb: {delete_skipped_count: 29130, key_skipped_count: 5145386, block: {cache_hit_count: 19894}}}, time_detail: {total_process_time: 3.41s, total_suspend_time: 6.08ms, total_wait_time: 13ms, total_kv_read_wall_time: 2.66s, tikv_wall_time: 3.46s} | keep order:false, stats:pseudo | N/A | N/A |
+---------------------------+------------+---------+-----------+----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
4 rows in set, 5 warnings (1.545 sec)
mysql-8.0.11-TiDB-v8.5.2> SET SESSION tidb_mem_quota_query=1<<26;
Query OK, 0 rows affected (0.001 sec)
mysql-8.0.11-TiDB-v8.5.2> explain analyze select group_concat(col0), group_concat(col1), group_concat(col2), group_concat(col3), group_concat(col4) from agg_test group by col_key;
ERROR 8175 (HY000): Your query has been cancelled due to exceeding the allowed memory limit for a single SQL query. Please try narrowing your query scope or increase the tidb_mem_quota_query limit and try again.[conn=2980052998]
mysql-8.0.11-TiDB-v8.5.2>
I assume that disk spill is somehow not working and this not related to this issue.
/remove-label affects-7.5
It doesn't seem to work for me:
mysql-8.0.11-TiDB-v8.5.2> explain analyze select group_concat(col0), group_concat(col1), group_concat(col2), group_concat(col3), group_concat(col4) from agg_test group by col_key; +---------------------------+------------+---------+-----------+----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+ | id | estRows | actRows | task | access object | execution info | operator info | memory | disk | +---------------------------+------------+---------+-----------+----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+ | HashAgg_5 | 4955443.20 | 1 | root | | time:1.54s, loops:2, RU:10215.368133 | group by:Column#18, funcs:group_concat(Column#13 separator ",")->Column#8, funcs:group_concat(Column#14 separator ",")->Column#9, funcs:group_concat(Column#15 separator ",")->Column#10, funcs:group_concat(Column#16 separator ",")->Column#11, funcs:group_concat(Column#17 separator ",")->Column#12 | 170.1 KB | 0 Bytes | | └─Projection_12 | 6194304.00 | 5194304 | root | | time:132ms, loops:5083, Concurrency:5 | cast(test.agg_test.col0, var_string(20))->Column#13, cast(cast(test.agg_test.col1, decimal(20,2)), var_string(23))->Column#14, test.agg_test.col2->Column#15, cast(test.agg_test.col3, var_string(19))->Column#16, cast(test.agg_test.col4, var_string(20))->Column#17, test.agg_test.col_key->Column#18 | 1.41 MB | N/A | | └─TableReader_9 | 6194304.00 | 5194304 | root | | time:151.9ms, loops:5083, cop_task: {num: 214, max: 43.3ms, min: 138.5µs, avg: 18.3ms, p95: 40.7ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 3.41s, tot_wait: 13ms, copr_cache_hit_ratio: 0.19, build_task_duration: 25.8µs, max_distsql_concurrency: 14}, rpc_info:{Cop:{num_rpc:214, total_time:3.92s}} | data:TableFullScan_8 | 74.4 MB | N/A | | └─TableFullScan_8 | 6194304.00 | 5194304 | cop[tikv] | table:agg_test | tikv_task:{proc max:34ms, min:0s, avg: 12.9ms, p80:25ms, p95:28ms, iters:5914, tasks:214}, scan_detail: {total_process_keys: 5116256, total_process_keys_size: 588369440, total_keys: 5116429, get_snapshot_time: 7.49ms, rocksdb: {delete_skipped_count: 29130, key_skipped_count: 5145386, block: {cache_hit_count: 19894}}}, time_detail: {total_process_time: 3.41s, total_suspend_time: 6.08ms, total_wait_time: 13ms, total_kv_read_wall_time: 2.66s, tikv_wall_time: 3.46s} | keep order:false, stats:pseudo | N/A | N/A | +---------------------------+------------+---------+-----------+----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+ 4 rows in set, 5 warnings (1.545 sec) mysql-8.0.11-TiDB-v8.5.2> SET SESSION tidb_mem_quota_query=1<<26; Query OK, 0 rows affected (0.001 sec) mysql-8.0.11-TiDB-v8.5.2> explain analyze select group_concat(col0), group_concat(col1), group_concat(col2), group_concat(col3), group_concat(col4) from agg_test group by col_key; ERROR 8175 (HY000): Your query has been cancelled due to exceeding the allowed memory limit for a single SQL query. Please try narrowing your query scope or increase the tidb_mem_quota_query limit and try again.[conn=2980052998] mysql-8.0.11-TiDB-v8.5.2>I assume that disk spill is somehow not working and this not related to this issue.
It seems that all your coly_keys have same value, make them different.
@xzhangxian1008 that together with increasing the total dataset to ~10M rows did the trick. Thanks!