aptos-indexer-processors

[Custom Processors] High RAM consumption

Open keyliaran opened this issue 2 years ago • 1 comment

Description

There appears to be a memory leak: memory consumption grows steadily during indexing. Valgrind's massif tool was used to analyze allocations, and it shows a significant portion of memory being allocated within the transaction vector. The memory consumed by each batch persists after the batch is processed and keeps growing. After two minutes of indexing, the indexer's RAM usage exceeds 5 GB and continues to increase.
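For anyone reproducing the measurement: the report below was produced with Valgrind's massif tool; a run along these lines captures the same kind of allocation tree (the processor binary name and config flag are placeholders for this setup):

```sh
# Record heap allocation snapshots while the processor runs
valgrind --tool=massif ./processor --config-path config.yaml

# Pretty-print the recorded allocation tree (PID suffix varies per run)
ms_print massif.out.<pid>
```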

massif report:

->12.31% (76,360,304B) 0x2285542: alloc (alloc.rs:98)
| ->12.31% (76,360,304B) 0x2285542: alloc::alloc::Global::alloc_impl (alloc.rs:181)
| ->12.31% (76,360,304B) 0x2286318: <alloc::alloc::Global as core::alloc::Allocator>::allocate (alloc.rs:241)
| ->12.31% (76,360,304B) 0x228607E: alloc::raw_vec::finish_grow (raw_vec.rs:521)
| ->05.87% (36,373,376B) 0x6E834A: alloc::raw_vec::RawVec<T,A>::grow_amortized (raw_vec.rs:433)
| | ->05.87% (36,373,376B) 0x70A8B8: alloc::raw_vec::RawVec<T,A>::reserve_for_push (raw_vec.rs:318)
| | ->05.87% (36,373,376B) 0xAF5A77: alloc::vec::Vec<T,A>::push (mod.rs:1922)
| | ->05.87% (36,373,376B) 0xB91356: prost::encoding::message::merge_repeated (encoding.rs:1114)
| | ->04.03% (24,965,408B) 0xC78005: <aptos_protos::pb::aptos::transaction::v1::MoveStructTag as prost::message::Message>::merge_field (aptos.transaction.v1.rs:739)
| | | ->04.03% (24,965,408B) 0xB9D7F6: prost::encoding::message::merge::{{closure}} (encoding.rs:1086)
| | | ->04.03% (24,965,408B) 0x139AC98: prost::encoding::merge_loop (encoding.rs:374)
| | | ->04.03% (24,965,408B) 0xB96C34: prost::encoding::message::merge (encoding.rs:1080)
| | | ->04.02% (24,932,544B) 0xC7857A: <aptos_protos::pb::aptos::transaction::v1::WriteResource as prost::message::Message>::merge_field (aptos.transaction.v1.rs:398)
| | | | ->04.02% (24,932,544B) 0xBA2456: prost::encoding::message::merge::{{closure}} (encoding.rs:1086)
| | | | ->04.02% (24,932,544B) 0x139E8F8: prost::encoding::merge_loop (encoding.rs:374)
| | | | ->04.02% (24,932,544B) 0xB95734: prost::encoding::message::merge (encoding.rs:1080)
| | | | ->04.02% (24,932,544B) 0x8861EA: aptos_protos::pb::aptos::transaction::v1::write_set_change::Change::merge (aptos.transaction.v1.rs:329)
| | | | ->04.02% (24,932,544B) 0xC688AA: <aptos_protos::pb::aptos::transaction::v1::WriteSetChange as prost::message::Message>::merge_field (aptos.transaction.v1.rs:278)
| | | | ->04.02% (24,932,544B) 0xBA06F6: prost::encoding::message::merge::{{closure}} (encoding.rs:1086)
| | | | ->04.02% (24,932,544B) 0x13A2838: prost::encoding::merge_loop (encoding.rs:374)
| | | | ->04.02% (24,932,544B) 0xB94234: prost::encoding::message::merge (encoding.rs:1080)
| | | | ->04.02% (24,932,544B) 0xB909BA: prost::encoding::message::merge_repeated (encoding.rs:1113)
| | | | ->04.02% (24,932,544B) 0xC69C4E: <aptos_protos::pb::aptos::transaction::v1::TransactionInfo as prost::message::Message>::merge_field (aptos.transaction.v1.rs:166)
| | | | ->04.02% (24,932,544B) 0xB9B7A6: prost::encoding::message::merge::{{closure}} (encoding.rs:1086)
| | | | ->04.02% (24,932,544B) 0x139B818: prost::encoding::merge_loop (encoding.rs:374)
| | | | ->04.02% (24,932,544B) 0xB96B34: prost::encoding::message::merge (encoding.rs:1080)
| | | | ->04.02% (24,932,544B) 0xC75E54: <aptos_protos::pb::aptos::transaction::v1::Transaction as prost::message::Message>::merge_field (aptos.transaction.v1.rs:37)
| | | | ->04.02% (24,932,544B) 0xB9F846: prost::encoding::message::merge::{{closure}} (encoding.rs:1086)
| | | | ->04.02% (24,932,544B) 0x139EBD8: prost::encoding::merge_loop (encoding.rs:374)
| | | | ->04.02% (24,932,544B) 0xB95A34: prost::encoding::message::merge (encoding.rs:1080)
| | | | ->04.02% (24,932,544B) 0xB91B0A: prost::encoding::message::merge_repeated (encoding.rs:1113)
| | | |
| | | ->00.01% (32,864B) in 1+ places, all below ms_print's threshold (01.00%)
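Reading the trace: the growth is dominated by prost's merge_repeated pushing decoded Transaction messages (and their nested write-set changes) onto a Vec during protobuf deserialization, which is consistent with the maintainer reply below: the bytes are owned by decoded gRPC responses parked in memory awaiting processing. A minimal sketch of that decode path, assuming the TransactionsResponse type from the aptos-protos crate (module path inferred from the paths visible in the trace):

```rust
use prost::Message;

// Assumed module path, following the trace above.
use aptos_protos::pb::aptos::indexer::v1::TransactionsResponse;

fn decode_response(buf: &[u8]) -> Result<TransactionsResponse, prost::DecodeError> {
    // prost decodes each repeated `transactions` element by pushing it onto
    // a Vec (the merge_repeated frames in the trace). The Vec, and every
    // WriteSetChange/MoveStructTag it owns, stays resident for as long as
    // the response is parked waiting to be processed.
    TransactionsResponse::decode(buf)
}
```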

Repro

Run indexer (commit 48d779449caa5b10a51a6319c61992af0edb52ce) with config:

health_check_port: 8084
server_config:
  processor_config:
    type: coin_processor
  indexer_grpc_data_service_address: https://grpc.testnet.aptoslabs.com:443
  postgres_connection_string: ***********
  auth_token: **************
  number_concurrent_processing_tasks: 1
  starting_version: 951262066

keyliaran · Mar 21 '24 15:03

Thanks for the report. I think this is related to the fact that we buffer many indexer gRPC responses in memory while they wait to be processed.

On our roadmap, we're planning to base backpressure on response size instead of transaction count, which should reduce the processor's memory usage.

For now we chose a count (the total number of transactions parked in memory) that maximizes processing throughput without triggering backpressure.
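To illustrate the difference, here is a minimal sketch (hypothetical types and field names, not the repo's actual code) of count-based versus size-based admission into the buffer of parked responses:

```rust
// Hypothetical sketch of the two backpressure policies discussed above.

struct Transaction; // stand-in for the decoded protobuf message

struct Batch {
    transactions: Vec<Transaction>,
    size_in_bytes: usize, // e.g. the encoded length of the gRPC response
}

struct PendingBuffer {
    parked_txns: usize,
    parked_bytes: usize,
}

impl PendingBuffer {
    // Current policy (roughly): admit while the transaction COUNT is under
    // a cap. RAM can still balloon when individual transactions are large.
    fn admit_by_count(&self, batch: &Batch, max_txns: usize) -> bool {
        self.parked_txns + batch.transactions.len() <= max_txns
    }

    // Planned policy: admit while the parked BYTES are under a cap,
    // bounding memory directly regardless of per-transaction size.
    fn admit_by_size(&self, batch: &Batch, max_bytes: usize) -> bool {
        self.parked_bytes + batch.size_in_bytes <= max_bytes
    }
}
```

With a byte cap, a batch of a few very large transactions triggers backpressure just as readily as thousands of small ones, which is what bounds RAM in cases like this report.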

larry-aptos · Mar 22 '24 00:03