starrocks
Segfault in array aggregation with large memory usage
Steps to reproduce the behavior (Required)
CREATE TABLE `some_table` (
`created_at` date NOT NULL COMMENT "",
...
`some_array` array<int(11)> NOT NULL COMMENT "",
`some_int` int(11) NOT NULL COMMENT "",
...
) ENGINE=OLAP
DUPLICATE KEY(`created_at`)
COMMENT "OLAP"
PARTITION BY date_trunc('day', created_at)
...
PROPERTIES (
"replication_num" = "3",
"datacache.enable" = "true",
"enable_async_write_back" = "false",
"enable_persistent_index" = "false",
"compression" = "LZ4"
);
- This query has omitted columns and conditions, but the real query requires a double GROUP BY and a CTE:
with query as (
select
created_at,
array_length(array_remove(array_unique_agg(case when some_int > 0 then some_array end), null)) as some_array_length
from some_table
group by created_at
order by created_at
)
select
created_at,
array_agg(some_array_length)
from query
group by created_at
- The SQL client returned:
SQL Error [1064] [42000]: Internal error: vector::_M_default_append
Expected behavior (Required)
The query returns data, or StarRocks aborts the query safely.
Real behavior (Required)
The CN restarts every time this query is executed.
StarRocks compute node (CN) logs:
2024-05-03 23:29:48.266
/opt/starrocks/cn_entrypoint.sh: line 185: 27 Segmentation fault (core dumped) $STARROCKS_HOME/bin/start_cn.sh $addition_args
2024-05-03 23:29:48.272
[Fri May 3 23:29:48 MSK 2024] Receives signal to exit ...
2024-05-03 23:29:48.274
[Fri May 3 23:29:48 MSK 2024] Can't find /opt/starrocks/cn/bin/cn.pid!
2024-05-03 23:29:48.277
[Fri May 3 23:29:48 MSK 2024] Process conf file cn.conf ...
2024-05-03 23:29:48.279
[Fri May 3 23:29:48 MSK 2024] try to drop myself(starrocks-cn-1.starrocks-cn-search.starrocks.svc.cluster.local) from FE ...
2024-05-03 23:29:48.304
[Fri May 3 23:29:48 MSK 2024] run start_cn.sh
2024-05-03 23:30:00.869
I0503 23:29:19.205855 305 plan_fragment_executor.cpp:369] cancel(): fragment_instance_id=4246fc79-0e77-4544-750b-6da44db86fbb
2024-05-03 23:30:00.869
I0503 23:29:19.205862 305 fragment_mgr.cpp:574] FragmentMgr cancel worker going to cancel timeout fragment 4246fc79-0e77-4544-750b-6da44db86fbb
2024-05-03 23:30:00.869
I0503 23:29:19.205873 305 plan_fragment_executor.cpp:369] cancel(): fragment_instance_id=2242d13b-5dd0-8b9e-86a4-56560d1889a4
2024-05-03 23:30:00.869
I0503 23:29:19.205881 305 fragment_mgr.cpp:574] FragmentMgr cancel worker going to cancel timeout fragment 2242d13b-5dd0-8b9e-86a4-56560d1889a4
2024-05-03 23:30:00.869
I0503 23:29:19.205891 305 plan_fragment_executor.cpp:369] cancel(): fragment_instance_id=4c47aa88-4d19-a2a3-9a2b-243b4e5b9ab3
2024-05-03 23:30:00.869
I0503 23:29:19.205900 305 fragment_mgr.cpp:574] FragmentMgr cancel worker going to cancel timeout fragment 4c47aa88-4d19-a2a3-9a2b-243b4e5b9ab3
2024-05-03 23:30:00.869
I0503 23:29:19.205909 305 plan_fragment_executor.cpp:369] cancel(): fragment_instance_id=004c35c3-9c88-056d-dd19-bbb965e2ed8e
2024-05-03 23:30:00.869
I0503 23:29:19.205916 305 fragment_mgr.cpp:574] FragmentMgr cancel worker going to cancel timeout fragment 004c35c3-9c88-056d-dd19-bbb965e2ed8e
2024-05-03 23:30:00.869
I0503 23:29:19.283994 459 load_channel_mgr.cpp:249] Memory consumption(bytes) limit=4174708211 current=0 peak=1469134464
2024-05-03 23:30:00.869
W0503 23:29:19.192533 394 mem_hook.cpp:249] large memory alloc, query_id:cf7bddaa-098b-11ef-844d-f6857d1d0c72 instance: cf7bddaa-098b-11ef-844d-f6857d1d0cba acquire:1827184060 bytes, stack:
2024-05-03 23:30:00.869
@ 0x69abbf2 malloc
2024-05-03 23:30:00.869
@ 0xa531e8c operator new()
2024-05-03 23:30:00.869
@ 0x3da1bee starrocks::FixedLengthColumnBase<>::append()
2024-05-03 23:30:00.869
@ 0x3f625b2 starrocks::ArrayColumn::append()
2024-05-03 23:30:00.869
@ 0x611f440 starrocks::Aggregator::output_chunk_by_streaming()
2024-05-03 23:30:00.869
@ 0x624aa7a starrocks::pipeline::AggregateStreamingSinkOperator::_push_chunk_by_force_streaming()
2024-05-03 23:30:00.869
@ 0x6253e2e starrocks::pipeline::AggregateStreamingSinkOperator::_push_chunk_by_limited_memory()
2024-05-03 23:30:00.869
@ 0x62541c1 starrocks::pipeline::AggregateStreamingSinkOperator::push_chunk()
2024-05-03 23:30:00.869
@ 0x5f0a70f starrocks::pipeline::PipelineDriver::process()
2024-05-03 23:30:00.869
@ 0x67a745e starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
2024-05-03 23:30:00.869
@ 0x6b36c33 starrocks::ThreadPool::dispatch_thread()
2024-05-03 23:30:00.869
@ 0x6b30f0a starrocks::Thread::supervise_thread()
2024-05-03 23:30:00.869
@ 0x7f8591d3dac3 (unknown)
2024-05-03 23:30:00.869
@ 0x7f8591dcf850 (unknown)
2024-05-03 23:30:00.869
@ (nil) (unknown)
2024-05-03 23:30:00.869
*** Aborted at 1714768159 (unix time) try "date -d @1714768159" if you are using GNU date ***
2024-05-03 23:30:00.869
PC: @ 0x7f8591e49d50 (unknown)
2024-05-03 23:30:00.869
*** SIGSEGV (@0x7f8428585000) received by PID 27 (TID 0x7f852f6c8640) from PID 676876288; stack trace: ***
2024-05-03 23:30:00.869
@ 0x7f567da google::(anonymous namespace)::FailureSignalHandler()
2024-05-03 23:30:00.869
@ 0x7f8591ceb520 (unknown)
2024-05-03 23:30:00.869
@ 0x7f8591e49d50 (unknown)
2024-05-03 23:30:00.869
@ 0x3da1b9a starrocks::FixedLengthColumnBase<>::append()
2024-05-03 23:30:00.869
@ 0x3f625b2 starrocks::ArrayColumn::append()
2024-05-03 23:30:00.869
@ 0x611f440 starrocks::Aggregator::output_chunk_by_streaming()
2024-05-03 23:30:00.869
@ 0x624aa7a starrocks::pipeline::AggregateStreamingSinkOperator::_push_chunk_by_force_streaming()
2024-05-03 23:30:00.869
@ 0x6253e2e starrocks::pipeline::AggregateStreamingSinkOperator::_push_chunk_by_limited_memory()
2024-05-03 23:30:00.869
@ 0x62541c1 starrocks::pipeline::AggregateStreamingSinkOperator::push_chunk()
2024-05-03 23:30:00.869
@ 0x5f0a70f starrocks::pipeline::PipelineDriver::process()
2024-05-03 23:30:00.870
@ 0x67a745e starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
2024-05-03 23:30:00.870
@ 0x6b36c33 starrocks::ThreadPool::dispatch_thread()
2024-05-03 23:30:00.870
@ 0x6b30f0a starrocks::Thread::supervise_thread()
2024-05-03 23:30:00.870
@ 0x7f8591d3dac3 (unknown)
2024-05-03 23:30:00.870
@ 0x7f8591dcf850 (unknown)
2024-05-03 23:30:00.870
@ 0x0 (unknown)
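To put the numbers in the log above into perspective, here is a minimal Python sketch that pulls the allocation size out of the `large memory alloc` warning and converts the glog `Aborted at` unix time to the MSK zone used by the entrypoint messages. The helper names `alloc_bytes` and `abort_time` are hypothetical, and the fixed UTC+3 offset for MSK is an assumption of this sketch.

```python
import re
from datetime import datetime, timedelta, timezone

GIB = 1024 ** 3
# Assumption: "MSK" in the entrypoint messages means Moscow time, a fixed UTC+3 offset.
MSK = timezone(timedelta(hours=3))


def alloc_bytes(line):
    """Return the byte count from a mem_hook 'large memory alloc' line, or None."""
    m = re.search(r"acquire:(\d+) bytes", line)
    return int(m.group(1)) if m else None


def abort_time(line, tz=MSK):
    """Return the glog '*** Aborted at <unix time>' moment as an aware datetime, or None."""
    m = re.search(r"Aborted at (\d+) \(unix time\)", line)
    return datetime.fromtimestamp(int(m.group(1)), tz) if m else None


# Both lines are copied verbatim from the CN log above.
warn = ("W0503 23:29:19.192533 394 mem_hook.cpp:249] large memory alloc, "
        "query_id:cf7bddaa-098b-11ef-844d-f6857d1d0c72 "
        "instance: cf7bddaa-098b-11ef-844d-f6857d1d0cba "
        "acquire:1827184060 bytes, stack:")
abort = ('*** Aborted at 1714768159 (unix time) try "date -d @1714768159" '
         'if you are using GNU date ***')

size = alloc_bytes(warn)
print(f"{size / GIB:.2f} GiB requested in a single allocation")
print(abort_time(abort).isoformat())
```

For the lines quoted here, the single allocation comes to roughly 1.70 GiB, and the abort time resolves to 23:29:19 MSK, the same second as the `large memory alloc` warning.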
StarRocks version (Required)
- 3.2.3-a40e2f8
- 3.2.4 and 3.2.6 also segfault
Thanks. We will check and fix it.
I have a similar issue with `_M_range_insert()`:
W0517 13:22:21.327330 1683661 mem_hook.cpp:249] large memory alloc, query_id:462f2a3b-1450-11ef-9347-024271588a32 instance: 462f2a3b-1450-11ef-9347-024271588a34 acquire:2104349994 bytes, stack:
@ 0x69eef12 malloc
@ 0xa59596c operator new()
@ 0x3deb8a6 std::vector<>::_M_range_insert<>()
@ 0x3def076 starrocks::BinaryColumnBase<>::append()
@ 0x3f892a6 starrocks::NullableColumn::append()
@ 0x60155e6 starrocks::JoinHashTable::append_chunk()
@ 0x600b351 starrocks::HashJoinBuilder::append_chunk()
@ 0x5fffe04 starrocks::HashJoiner::append_chunk_to_ht()
@ 0x634c3cc starrocks::pipeline::HashJoinBuildOperator::push_chunk()
@ 0x5f4b642 starrocks::pipeline::PipelineDriver::process()
@ 0x67eb3be starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
@ 0x6b7b90c starrocks::ThreadPool::dispatch_thread()
@ 0x6b74e1a starrocks::Thread::supervise_thread()
@ 0x7f0d3af5fac3 (unknown)
@ 0x7f0d3aff1850 (unknown)
@ (nil) (unknown)
This is on version 3.2.6, but I don't know which query is causing the crash.
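The `large memory alloc` warning itself carries a `query_id`, which may help narrow down the offending statement. The following is a minimal Python sketch that extracts the query ID, instance ID, and allocation size from such lines. The function name `large_allocs` is hypothetical, and the regular expression assumes the exact line layout shown in the two warnings quoted in this thread.

```python
import re

# Assumption: mem_hook.cpp warnings always use the layout seen in this thread:
# "query_id:<uuid> instance: <uuid> acquire:<n> bytes"
ALLOC_RE = re.compile(
    r"query_id:([0-9a-f-]{36}) instance: ([0-9a-f-]{36}) acquire:(\d+) bytes"
)


def large_allocs(lines):
    """Yield (query_id, instance_id, bytes) for every 'large memory alloc' line."""
    for line in lines:
        m = ALLOC_RE.search(line)
        if m:
            yield m.group(1), m.group(2), int(m.group(3))


# Line copied verbatim from the warning above, followed by a stack frame that must be skipped.
sample = [
    "W0517 13:22:21.327330 1683661 mem_hook.cpp:249] large memory alloc, "
    "query_id:462f2a3b-1450-11ef-9347-024271588a32 "
    "instance: 462f2a3b-1450-11ef-9347-024271588a34 "
    "acquire:2104349994 bytes, stack:",
    "@ 0x69eef12 malloc",
]

for query_id, instance_id, size in large_allocs(sample):
    print(query_id, instance_id, size)
```

If FE audit logging is enabled, searching the FE audit log for the extracted query ID should, as far as I know, show the statement text. Treat that lookup as an assumption to verify on your deployment.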
We have marked this issue as stale because it has been inactive for 6 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to StarRocks!