Segfault in array aggregation with large memory usage

Open sergeyshaykhullin opened this issue 1 year ago • 2 comments

Steps to reproduce the behavior (Required)

CREATE TABLE `some_table` (
  `created_at` date NOT NULL COMMENT "",
  ...
  `some_array` array<int(11)> NOT NULL COMMENT "",
  `some_int` int(11) NOT NULL COMMENT "",
  ...
) ENGINE=OLAP 
DUPLICATE KEY(`created_at`)
COMMENT "OLAP"
PARTITION BY date_trunc('day', created_at)
...
PROPERTIES (
    "replication_num" = "3",
    "datacache.enable" = "true",
    "enable_async_write_back" = "false",
    "enable_persistent_index" = "false",
    "compression" = "LZ4"
);
  1. This query omits some columns and conditions, but like the real query it requires the double GROUP BY and the CTE:

with query as (
  select
    created_at,
    array_length(array_remove(array_unique_agg(case when some_int > 0 then some_array end), null)) as some_array_length
  from some_table
  group by created_at
  order by created_at
)
select
  created_at,
  array_agg(some_array_length)
from query 
group by created_at
  2. The SQL client returns:
SQL Error [1064] [42000]: Internal error: vector::_M_default_append

Expected behavior (Required)

The query returns data, or StarRocks aborts the query safely.

Real behavior (Required)

StarRocks compute node logs

The CN restarts every time this SQL is executed.

2024-05-03 23:29:48.266	
/opt/starrocks/cn_entrypoint.sh: line 185:    27 Segmentation fault      (core dumped) $STARROCKS_HOME/bin/start_cn.sh $addition_args
2024-05-03 23:29:48.272	
[Fri May  3 23:29:48 MSK 2024] Receives signal to exit ...
2024-05-03 23:29:48.274	
[Fri May  3 23:29:48 MSK 2024] Can't find /opt/starrocks/cn/bin/cn.pid!
2024-05-03 23:29:48.277	
[Fri May  3 23:29:48 MSK 2024] Process conf file cn.conf ...
2024-05-03 23:29:48.279	
[Fri May  3 23:29:48 MSK 2024] try to drop myself(starrocks-cn-1.starrocks-cn-search.starrocks.svc.cluster.local) from FE ...
2024-05-03 23:29:48.304	
[Fri May  3 23:29:48 MSK 2024] run start_cn.sh
2024-05-03 23:30:00.869	
I0503 23:29:19.205855   305 plan_fragment_executor.cpp:369] cancel(): fragment_instance_id=4246fc79-0e77-4544-750b-6da44db86fbb
2024-05-03 23:30:00.869	
I0503 23:29:19.205862   305 fragment_mgr.cpp:574] FragmentMgr cancel worker going to cancel timeout fragment 4246fc79-0e77-4544-750b-6da44db86fbb
2024-05-03 23:30:00.869	
I0503 23:29:19.205873   305 plan_fragment_executor.cpp:369] cancel(): fragment_instance_id=2242d13b-5dd0-8b9e-86a4-56560d1889a4
2024-05-03 23:30:00.869	
I0503 23:29:19.205881   305 fragment_mgr.cpp:574] FragmentMgr cancel worker going to cancel timeout fragment 2242d13b-5dd0-8b9e-86a4-56560d1889a4
2024-05-03 23:30:00.869	
I0503 23:29:19.205891   305 plan_fragment_executor.cpp:369] cancel(): fragment_instance_id=4c47aa88-4d19-a2a3-9a2b-243b4e5b9ab3
2024-05-03 23:30:00.869	
I0503 23:29:19.205900   305 fragment_mgr.cpp:574] FragmentMgr cancel worker going to cancel timeout fragment 4c47aa88-4d19-a2a3-9a2b-243b4e5b9ab3
2024-05-03 23:30:00.869	
I0503 23:29:19.205909   305 plan_fragment_executor.cpp:369] cancel(): fragment_instance_id=004c35c3-9c88-056d-dd19-bbb965e2ed8e
2024-05-03 23:30:00.869	
I0503 23:29:19.205916   305 fragment_mgr.cpp:574] FragmentMgr cancel worker going to cancel timeout fragment 004c35c3-9c88-056d-dd19-bbb965e2ed8e
2024-05-03 23:30:00.869	
I0503 23:29:19.283994   459 load_channel_mgr.cpp:249] Memory consumption(bytes) limit=4174708211 current=0 peak=1469134464
2024-05-03 23:30:00.869	
W0503 23:29:19.192533   394 mem_hook.cpp:249] large memory alloc, query_id:cf7bddaa-098b-11ef-844d-f6857d1d0c72 instance: cf7bddaa-098b-11ef-844d-f6857d1d0cba acquire:1827184060 bytes, stack:
2024-05-03 23:30:00.869	
    @          0x69abbf2  malloc
2024-05-03 23:30:00.869	
    @          0xa531e8c  operator new()
2024-05-03 23:30:00.869	
    @          0x3da1bee  starrocks::FixedLengthColumnBase<>::append()
2024-05-03 23:30:00.869	
    @          0x3f625b2  starrocks::ArrayColumn::append()
2024-05-03 23:30:00.869	
    @          0x611f440  starrocks::Aggregator::output_chunk_by_streaming()
2024-05-03 23:30:00.869	
    @          0x624aa7a  starrocks::pipeline::AggregateStreamingSinkOperator::_push_chunk_by_force_streaming()
2024-05-03 23:30:00.869	
    @          0x6253e2e  starrocks::pipeline::AggregateStreamingSinkOperator::_push_chunk_by_limited_memory()
2024-05-03 23:30:00.869	
    @          0x62541c1  starrocks::pipeline::AggregateStreamingSinkOperator::push_chunk()
2024-05-03 23:30:00.869	
    @          0x5f0a70f  starrocks::pipeline::PipelineDriver::process()
2024-05-03 23:30:00.869	
    @          0x67a745e  starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
2024-05-03 23:30:00.869	
    @          0x6b36c33  starrocks::ThreadPool::dispatch_thread()
2024-05-03 23:30:00.869	
    @          0x6b30f0a  starrocks::Thread::supervise_thread()
2024-05-03 23:30:00.869	
    @     0x7f8591d3dac3  (unknown)
2024-05-03 23:30:00.869	
    @     0x7f8591dcf850  (unknown)
2024-05-03 23:30:00.869	
    @              (nil)  (unknown)
2024-05-03 23:30:00.869	
*** Aborted at 1714768159 (unix time) try "date -d @1714768159" if you are using GNU date ***
2024-05-03 23:30:00.869	
PC: @     0x7f8591e49d50 (unknown)
2024-05-03 23:30:00.869	
*** SIGSEGV (@0x7f8428585000) received by PID 27 (TID 0x7f852f6c8640) from PID 676876288; stack trace: ***
2024-05-03 23:30:00.869	
    @          0x7f567da google::(anonymous namespace)::FailureSignalHandler()
2024-05-03 23:30:00.869	
    @     0x7f8591ceb520 (unknown)
2024-05-03 23:30:00.869	
    @     0x7f8591e49d50 (unknown)
2024-05-03 23:30:00.869	
    @          0x3da1b9a starrocks::FixedLengthColumnBase<>::append()
2024-05-03 23:30:00.869	
    @          0x3f625b2 starrocks::ArrayColumn::append()
2024-05-03 23:30:00.869	
    @          0x611f440 starrocks::Aggregator::output_chunk_by_streaming()
2024-05-03 23:30:00.869	
    @          0x624aa7a starrocks::pipeline::AggregateStreamingSinkOperator::_push_chunk_by_force_streaming()
2024-05-03 23:30:00.869	
    @          0x6253e2e starrocks::pipeline::AggregateStreamingSinkOperator::_push_chunk_by_limited_memory()
2024-05-03 23:30:00.869	
    @          0x62541c1 starrocks::pipeline::AggregateStreamingSinkOperator::push_chunk()
2024-05-03 23:30:00.869	
    @          0x5f0a70f starrocks::pipeline::PipelineDriver::process()
2024-05-03 23:30:00.870	
    @          0x67a745e starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
2024-05-03 23:30:00.870	
    @          0x6b36c33 starrocks::ThreadPool::dispatch_thread()
2024-05-03 23:30:00.870	
    @          0x6b30f0a starrocks::Thread::supervise_thread()
2024-05-03 23:30:00.870	
    @     0x7f8591d3dac3 (unknown)
2024-05-03 23:30:00.870	
    @     0x7f8591dcf850 (unknown)
2024-05-03 23:30:00.870	
    @                0x0 (unknown)

StarRocks version (Required)

  • 3.2.3-a40e2f8
  • 3.2.4 and 3.2.6 also segfault

sergeyshaykhullin, May 03 '24 20:05

Thanks. We will check and fix it.

kangkaisen, May 06 '24 06:05

I have a similar issue with _M_range_insert():

W0517 13:22:21.327330 1683661 mem_hook.cpp:249] large memory alloc, query_id:462f2a3b-1450-11ef-9347-024271588a32 instance: 462f2a3b-1450-11ef-9347-024271588a34 acquire:2104349994 bytes, stack:
    @          0x69eef12  malloc
    @          0xa59596c  operator new()
    @          0x3deb8a6  std::vector<>::_M_range_insert<>()
    @          0x3def076  starrocks::BinaryColumnBase<>::append()
    @          0x3f892a6  starrocks::NullableColumn::append()
    @          0x60155e6  starrocks::JoinHashTable::append_chunk()
    @          0x600b351  starrocks::HashJoinBuilder::append_chunk()
    @          0x5fffe04  starrocks::HashJoiner::append_chunk_to_ht()
    @          0x634c3cc  starrocks::pipeline::HashJoinBuildOperator::push_chunk()
    @          0x5f4b642  starrocks::pipeline::PipelineDriver::process()
    @          0x67eb3be  starrocks::pipeline::GlobalDriverExecutor::_worker_thread()
    @          0x6b7b90c  starrocks::ThreadPool::dispatch_thread()
    @          0x6b74e1a  starrocks::Thread::supervise_thread()
    @     0x7f0d3af5fac3  (unknown)
    @     0x7f0d3aff1850  (unknown)
    @              (nil)  (unknown)

This is on version 3.2.6, but I don't know which query is causing the crash.

renaudk, May 17 '24 15:05

We have marked this issue as stale because it has been inactive for 6 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to StarRocks!

github-actions[bot], Nov 18 '24 11:11