# Abnormal memory consumption in private cluster
## Problem
While running a private cluster with 2 nodes and one client on master, I discovered the following memory trend:
![Screenshot 2022-08-29 at 16 10 02](https://user-images.githubusercontent.com/687962/188140994-a365488f-389b-4683-9f65-80e4b96fc472.png)
What happened at 18:20: the bench-tps client was restarted, because it is configured to run for 7500s in the net scripts.
The two plots correspond to the two nodes.
There are 3 problems with this memory plot:
- Memory consumption grows over time. (For some reason this growth was triggered by relaunching bench-tps.)
- Memory consumption is spiky, which signals that a lot of allocations/deallocations are happening.
- The width of the spikes is growing.

Problems (2) and (3) are present on both nodes, while problem (1) is present only on the blue one.
## Memory profiling
I profiled with the dhat tool (special thanks to @alessandrod for showing me this tool).
Profiling only worked on a single-node cluster, and it also had severe performance consequences.
Note that for the run shown below I replaced `into_par_iter` with `into_iter` here to get a more informative call stack in the profile output; a minimal sketch of this kind of change is shown below.
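For context, here is a minimal sketch of the kind of change used for that profiling run (illustrative code, not the actual accounts_db scan): switching from rayon's `into_par_iter` to `into_iter` keeps the scan on the calling thread, so the heap profiler records one attributable call stack instead of stacks rooted in rayon worker threads.

```rust
use rayon::prelude::*;

// Parallel scan as in production: heap-profile call stacks end up rooted in
// rayon worker threads, which makes them hard to attribute.
fn scan_chunks_par(chunks: Vec<Vec<u8>>) -> usize {
    chunks.into_par_iter().map(|c| c.len()).sum()
}

// Sequential variant used only for the profiling run: the whole scan stays on
// the calling thread, so the profiler shows one informative call stack.
fn scan_chunks_seq(chunks: Vec<Vec<u8>>) -> usize {
    chunks.into_iter().map(|c| c.len()).sum()
}

fn main() {
    let chunks = vec![vec![0u8; 16]; 4];
    assert_eq!(scan_chunks_par(chunks.clone()), scan_chunks_seq(chunks));
}
```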
For this run with memory profiling we observe the following:
![Screenshot 2022-09-02 at 14 37 38](https://user-images.githubusercontent.com/687962/188146329-b68338ce-11c2-4c82-89f0-d59f11347af3.png)
One important observation is that the spike event is at ~13:40.
In the profile output, the leaf call is the following:
![Screenshot 2022-09-02 at 09 51 30](https://user-images.githubusercontent.com/687962/188148570-87fe1581-eb0b-4a80-a6d7-738dfb704319.png)
So the suspect is `solana_runtime::accounts_db::AppendVecScan>::init_accum` (runtime/src/accounts_db.rs:1817:13).
The call to `init_accum` grows a `Vec<Vec<_>>`. The size of this vector is ~32640. From the plot above we observe that `num_chunks` grows, and this growth starts at 13:41. The log also shows this growth.
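To make the suspected pattern concrete, here is a minimal sketch (hypothetical types, not the actual `AppendVecScan` implementation) of an accumulator that allocates one `Vec` per chunk: with ~32640 chunks, every scan allocates and then drops a large `Vec<Vec<_>>`, and if `num_chunks` keeps growing, so does each spike.

```rust
// Hypothetical accumulator, for illustration only.
struct ScanAccum<T> {
    per_chunk: Vec<Vec<T>>, // one bucket per chunk
}

impl<T> ScanAccum<T> {
    fn init_accum(num_chunks: usize) -> Self {
        // One heap allocation for the outer Vec on every scan, plus one per
        // bucket that later receives data.
        Self {
            per_chunk: (0..num_chunks).map(|_| Vec::new()).collect(),
        }
    }
}

fn main() {
    // ~32640 chunks, as observed in the dhat profile.
    let accum: ScanAccum<u64> = ScanAccum::init_accum(32_640);
    println!("buckets allocated: {}", accum.per_chunk.len());
}
```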
The dhat report can be found in this folder and can be opened with https://nnethercote.github.io/dh_view/dh_view.html.
## Checking out v1.11.10
To check whether the problem is present in v1.11.10, we did another run of the private cluster.
This time the memory profile looks different. In particular, it lacks (3): memory spikes do not grow, and the slot count is also stable:
## Hypothesis
After looking into this data with @alessandrod, we came up with the hypothesis that the memory spikes are due to `scan_account_storage_no_bank`.
I think that the amplification of the spikes is not present for v1.11.10 due to this commit, which was not added to that version.
The spikes themselves are due to memory allocations during chunk scanning.
## Proposed Solution
Similar behavior is now observed for branch v1.11 (e6a9c7e47c26f753f791509c77f6e1e39b8c5cbc):
it contains the changes which contribute to the amplification of memory in the scanner, but it doesn't show the memory growth seen for master (6684c62280d824a03e789f0e100a78c634bb1f01) above, which might suggest that these problems are different.
Current master (afa2ae2986cd704c0d7cb22bfa7238cd9e4372cc) shows a different trend:
It looks like the problem with the "fur" (the frequent allocation/deallocation spikes) has been resolved by this PR: https://github.com/solana-labs/solana/pull/27461. The problem with memory growth is still there.
I profiled the moment of growth with memleak.py, an eBPF-based tool which collects information about outstanding long-lived allocations:
```
11448320 bytes in 48 allocations from stack
0x00005649d792c13f memmap2::os::MmapInner::map_mut::h103aef6f6c352286+0x9f [solana-validator]
0x00005649d723be23 memmap2::MmapMut::map_mut::hc878c70e96a171e1+0x53 [solana-validator]
0x00005649d704eae2 solana_runtime::append_vec::AppendVec::new::h037be5d3956f93e1+0x182 [solana-validator]
0x00005649d71aca25 solana_runtime::accounts_db::AccountStorageEntry::new::hd6792b2f457b2a6e+0x115 [solana-validator]
0x00005649d71c5912 solana_runtime::accounts_db::AccountsDb::create_store::h90136ccd567e2d1b+0xa2 [solana-validator]
0x00005649d71bc863 solana_runtime::accounts_db::AccountsDb::get_store_for_shrink::hc059c51f5d663fba+0x1a3 [solana-validator]
0x00005649d719d2f2 core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnMut$LT$A$GT$$u20$for$u20$$RF$F$GT$::call_mut::h236c9325c20e4fd9+0xaf2 [solana-validator]
0x00005649d71106a6 _$LT$core..iter..adapters..map..Map$LT$I$C$F$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$::fold::hd7f27a87336308e5+0x76 [solana-validator]
0x00005649d726c303 rayon::iter::plumbing::bridge_producer_consumer::helper::h846dac16a50cb7e8+0x63 [solana-validator]
0x00005649d725f1d3 rayon_core::registry::in_worker::he184540d33da092c+0x2b3 [solana-validator]
0x00005649d726c3bf rayon::iter::plumbing::bridge_producer_consumer::helper::h846dac16a50cb7e8+0x11f [solana-validator]
0x00005649d725f1d3 rayon_core::registry::in_worker::he184540d33da092c+0x2b3 [solana-validator]
0x00005649d726c3bf rayon::iter::plumbing::bridge_producer_consumer::helper::h846dac16a50cb7e8+0x11f [solana-validator]
0x00005649d725f1d3 rayon_core::registry::in_worker::he184540d33da092c+0x2b3 [solana-validator]
0x00005649d726c3bf rayon::iter::plumbing::bridge_producer_consumer::helper::h846dac16a50cb7e8+0x11f [solana-validator]
0x00005649d703f3a2 _$LT$rayon..vec..IntoIter$LT$T$GT$$u20$as$u20$rayon..iter..IndexedParallelIterator$GT$::with_producer::h7be1d112c1b44ff8+0xc2 [solana-validator]
0x00005649d70ebf2d _$LT$core..panic..unwind_safe..AssertUnwindSafe$LT$F$GT$$u20$as$u20$core..ops..function..FnOnce$LT$$LP$$RP$$GT$$GT$::call_once::hd5c2c25b22f16984+0xed [solana-validator]
0x00005649d709bdf6 _ZN83_$LT$rayon_core..job..StackJob$LT$L$C$F$C$R$GT$$u20$as$u20$rayon_core..job..Job$GT$7execute17h3645fdebe8435b28E.llvm.7304792509247438073+0x56 [solana-validator]
0x00005649d5cef648 rayon_core::registry::WorkerThread::wait_until_cold::hf719c0333e1c469c+0x308 [solana-validator]
0x00005649d7891a30 rayon_core::registry::ThreadBuilder::run::h218b6e6a0796ae65+0x310 [solana-validator]
0x00005649d7894a31 _ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h98bf7e27e7f462daE.llvm.4672532708118862921+0x41 [solana-validator]
0x00005649d788e2d0 core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h1b17b96de63b704e+0xb0 [solana-validator]
0x00005649d7b109b3 std::sys::unix::thread::Thread::new::thread_start::h5d4e11bbda4161c8+0x23 [solana-validator]
107479040 bytes in 8 allocations from stack
0x00005649d7ac6293 _rjem_je_pages_map+0x63 [solana-validator]
657244160 bytes in 400 allocations from stack
0x00005649d792c13f memmap2::os::MmapInner::map_mut::h103aef6f6c352286+0x9f [solana-validator]
0x00005649d723be23 memmap2::MmapMut::map_mut::hc878c70e96a171e1+0x53 [solana-validator]
0x00005649d704eae2 solana_runtime::append_vec::AppendVec::new::h037be5d3956f93e1+0x182 [solana-validator]
0x00005649d71aca25 solana_runtime::accounts_db::AccountStorageEntry::new::hd6792b2f457b2a6e+0x115 [solana-validator]
0x00005649d71c5912 solana_runtime::accounts_db::AccountsDb::create_store::h90136ccd567e2d1b+0xa2 [solana-validator]
0x00005649d71cb343 solana_runtime::accounts_db::AccountsDb::flush_rooted_accounts_cache::h4e80df79ac468be7+0x813 [solana-validator]
0x00005649d71c9403 solana_runtime::accounts_db::AccountsDb::flush_accounts_cache::h1297d04cdc823b70+0xc3 [solana-validator]
0x00005649d704c0a5 solana_runtime::accounts_background_service::SnapshotRequestHandler::handle_snapshot_requests::h3095f05990a42be2+0x2b5 [solana-validator]
0x00005649d7077033 std::sys_common::backtrace::__rust_begin_short_backtrace::h1903f68ea30a084b+0x393 [solana-validator]
0x00005649d71e3378 core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h78829cde464f478b+0x98 [solana-validator]
0x00005649d7b109b3 std::sys::unix::thread::Thread::new::thread_start::h5d4e11bbda4161c8+0x23 [solana-validator]
```
The most interesting allocation is the last one. It looks like several functions in the call chain were inlined; the probable call stack is the following (a sketch follows the call chain):
```
create_store
create_and_insert_store_with_paths
create_and_insert_store
do_flush_slot_cache
flush_slot_cache_with_clean
flush_rooted_accounts_cache
```
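As a reminder of why these allocations show up as outstanding in memleak.py, here is a minimal sketch of a file-backed store allocation in the style of `AppendVec::new` (path, size, and function name are illustrative, not the actual Solana code): the `MmapMut` stays alive for as long as the storage entry exists.

```rust
use memmap2::MmapMut;
use std::fs::OpenOptions;

// Illustrative helper: create a file-backed store and map it read-write.
fn new_store_backing(path: &str, size: u64) -> std::io::Result<MmapMut> {
    let file = OpenOptions::new()
        .read(true)
        .write(true)
        .create(true)
        .open(path)?;
    // Reserve the full store size up front, as an append-only store would.
    file.set_len(size)?;
    // Safety: the file was just created/resized and is not mapped elsewhere.
    unsafe { MmapMut::map_mut(&file) }
}

fn main() -> std::io::Result<()> {
    let map = new_store_backing("/tmp/example_store.bin", 1 << 20)?;
    println!("mapped {} bytes", map.len());
    Ok(())
}
```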
Another question is why memory grows on one machine but not on the other. This is unclear.
To rule out that the problem is due to an impossible supermajority, I tried the same with 5 nodes:
```
cd net/
./gce.sh create -n 4 -c 1
./init-metrics.sh <login>
./net.sh start --profile
```
It looks the same as for 2 nodes:
As before, there are 2 problems:
- frequent allocations/deallocations of ~10 GB
- memory grows for one of the nodes

The cause of problem (1) has been identified with high certainty (see above). The cause of problem (2) is not clear.
Observations:

- The memory grows only on one of the nodes (Cujx...)
- All the traffic is received by this node
- This node has a very low number of QUIC streams compared to the others: 277 vs 380K
- Checking the outstanding allocations on this node identifies the `create_store` call within `flush_rooted_accounts_cache` (see the comments above), but when I fixed and checked the corresponding metric, I did not observe an increase in the number of calls. It might be that these store objects are kept somewhere and never released; this has not been checked (see the sketch below).
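To illustrate that unchecked possibility, here is a hedged sketch (hypothetical names, not the real AccountsDb structures) of how stores created during cache flushes could be retained indefinitely even though the `create_store` call count does not increase:

```rust
use std::collections::HashMap;
use std::sync::Arc;

type Slot = u64;

// Stand-in for an AccountStorageEntry that owns a large file-backed mmap.
struct Store {
    backing: Vec<u8>,
}

#[derive(Default)]
struct StorageMap {
    // Per-slot stores; memory retained here grows monotonically unless old
    // slots are shrunk or removed.
    stores: HashMap<Slot, Vec<Arc<Store>>>,
}

impl StorageMap {
    fn create_store(&mut self, slot: Slot, size: usize) -> Arc<Store> {
        let store = Arc::new(Store { backing: vec![0u8; size] });
        self.stores.entry(slot).or_default().push(Arc::clone(&store));
        store
    }
}

fn main() {
    let mut storage = StorageMap::default();
    for slot in 0..400 {
        // ~400 outstanding stores, as in the memleak.py report above.
        storage.create_store(slot, 1 << 20);
    }
    let retained: usize = storage
        .stores
        .values()
        .flatten()
        .map(|s| s.backing.len())
        .sum();
    println!("bytes retained across {} slots: {}", storage.stores.len(), retained);
}
```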
A note about `create_store` and memory peaks: if we look at the times of the memory peaks and at the `create_store` count, they coincide:
Using TpuClient instead of ThinClient leads to a uniform traffic distribution among the nodes, and this lowers the memory growth trend (but it is still present). Surprisingly, it also shows an increase in amplitude of the allocations/deallocations:
![Screenshot 2022-09-20 at 17 57 18](https://user-images.githubusercontent.com/687962/191308316-33d796ec-589c-48c9-baee-4b3f51ce260d.png)
This run was for commit cccade42b3ea73dc8be39c659d1b5e8f950b341b.
Holes in the traffic are due to client restarts, which happen when the client detects that the blockhash hasn't been updated for more than 120s.