
[BUG] Out of memory exception

Open · timofeevmd opened this issue 9 months ago · 1 comment

OS and Environment

Linux, AWS, k8s

GIT commit hash

f348b9a8

Minimum working example / Steps to reproduce

The perf report includes the full list of preconditions.

#### 0. Test objective

  • Apply the load at the required volume.
  • Apply the load over an extended period of time.

#### 1. Infrastructure

  • iroha version: version="2.0.0-rc.1" git_commit_sha="f348b9a8"
  • java sdk version: commit efeb5a233e
  • iroha2-perf version: commit 041736f
  • 5 peers

### SPECIAL CONDITIONS FOR TEST STAND PREPARATION

  • We increased the disks for the Longlive (longevity) environment to 20 GB.

  • Ingress was initially given more resources (scaled 2x horizontally) at the start of the load test to handle the peak load during the sudden ramp-up.

  • Kubernetes relocates pods within the cluster based on priorities. By default, Iroha has priority over other applications, but services like NGINX have a higher priority, which makes sense. For this test, I increased the priority of our iroha2-test pods.

#### 2. Images/config

### PREPARATION OF THE LONGEVITY ENV

Access to standard monitoring tools

On the perf generator:

```bash
git clone https://github.com/soramitsu/iroha2-perf.git &&
# enter the cloned repo before switching branches (this step appears to be missing in the original)
cd iroha2-perf &&
git checkout iroha/2_0_0-rc_1/keypair &&
cd performance-generator/ &&
mvn -N io.takari:maven:wrapper &&
./mvnw gatling:test -Dgatling.simulationClass=simulation.transactions.rampConstant.TransferAssetSimulation -DtargetURL=  -DremoteLogin=  -DremotePassword= -DstartLevelUsers=0 -DendLevelUsers=234 -DrampDuring=4500 -DstageDuration=86400 -DmaxDuration=86401
```

Actual result

Out of memory exception.

iroha2 logs: OpenSearch (link)

Kubernetes logs (link)

OOM (image)

Resource utilization (image)

Performance metrics (image)

Expected result

The load is applied evenly throughout the entire test, with no abnormal CPU or memory utilization.

Logs

```
Mar 20 01:23:31 ip-10-1-124-86 containerd: time="2025-03-20T01:23:31.467644696Z" level=info msg="TaskOOM event container_id:\"c59c9b3d5c12272d6c37de5d0d068ddb936b74a48e396ffab002bbeffd0a98a0\""
```

Who can help to reproduce?

@timofeevmd @RamilMus

Notes

No response

timofeevmd · Mar 21 '25

The issue is that Iroha consumes ~6GB of memory after 20 million transactions.

This matches current implementation (https://github.com/hyperledger-iroha/iroha/issues/5083#issuecomment-2379804636).

80%+ of the memory is consumed by State::transactions, which contains transaction hashes mapped to the block height where they are stored (basically a Map<Hash, usize>). State::transactions is a multi-version map with transactional behaviour; currently we use the mv crate. Memory usage could potentially be improved by using a specialized implementation for the transactions map. Here is a comparison of memory usage for various Map<Hash, usize> implementations:

| Map | Potential memory usage, bytes per transaction |
|-----|-----------------------------------------------|
| mv::Storage | 270 |
| mv::Storage with HashMap | 286 |
| rpds::RedBlackTreeMapSync | 112 |
| rpds::HashTrieMapSync | 168 |
| dashmap::DashMap | 69 |
| chashmap::CHashMap | 88 |
| concurrent_map::ConcurrentMap | 64 |
| std::collections::BTreeMap | 64 |
| std::collections::HashMap | 69 |
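For reference, per-entry numbers like these can be approximated with a counting global allocator. Here is a minimal sketch for std::collections::HashMap (the 32-byte hash size and the one-million-entry count are my assumptions, not the setup used to produce the table above):

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Global allocator that tracks net allocated bytes.
struct CountingAlloc;

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        System.alloc(layout)
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        ALLOCATED.fetch_sub(layout.size(), Ordering::Relaxed);
        System.dealloc(ptr, layout)
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn main() {
    const N: usize = 1_000_000; // assumed entry count
    let before = ALLOCATED.load(Ordering::Relaxed);

    // Model a transaction hash as 32 raw bytes mapped to a block height.
    let mut map: HashMap<[u8; 32], usize> = HashMap::new();
    for height in 0..N {
        let mut hash = [0u8; 32];
        hash[..8].copy_from_slice(&(height as u64).to_le_bytes());
        map.insert(hash, height);
    }

    let after = ALLOCATED.load(Ordering::Relaxed);
    println!("~{} bytes per entry", (after - before) / N);
}
```

The result includes HashMap's amortized capacity overhead, which is why it lands above the 40 payload bytes per entry.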

Only the mv maps can be used directly for our needs. Maps from the rpds crate would give roughly a 2x memory improvement and can be adapted to our use case relatively easily, since they provide persistent behaviour. The other maps could potentially give a ~3x memory improvement, but they require a custom implementation of the multi-version and transactional logic.
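To illustrate why persistent maps adapt easily: each insert returns a new map while old snapshots remain readable, which is essentially the multi-version behaviour we need. A minimal sketch (the hash is again modelled as raw 32 bytes):

```rust
// Assumes rpds as a dependency.
use rpds::HashTrieMapSync;

fn main() {
    // Version 1: committed state before the new block.
    let v1: HashTrieMapSync<[u8; 32], usize> = HashTrieMapSync::new_sync();

    // Version 2: a transaction hash recorded at block height 7.
    // `insert` returns a new map; `v1` stays valid as an old snapshot.
    let hash = [1u8; 32];
    let v2 = v1.insert(hash, 7);

    assert_eq!(v1.get(&hash), None);
    assert_eq!(v2.get(&hash), Some(&7));
}
```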

So the plan is to implement a custom solution based on some concurrent map with low memory usage (I think dashmap::DashMap is a good choice), and if that turns out not to be possible, implement a simpler solution using rpds.
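A rough sketch of the shape such a custom solution could take (the names and the staging approach are illustrative assumptions on my part, not a final design): committed entries live in a DashMap, writes for the block being applied are staged in a buffer that becomes visible only on commit, and dropping the buffer without committing acts as a rollback.

```rust
// Assumes dashmap as a dependency. All names here are illustrative.
use dashmap::DashMap;

type Hash = [u8; 32];
type Height = usize;

/// Committed transaction hashes, shared between threads.
struct TransactionsMap {
    committed: DashMap<Hash, Height>,
}

/// Staged writes for the block currently being applied.
struct BlockTransaction<'a> {
    base: &'a TransactionsMap,
    staged: Vec<(Hash, Height)>,
}

impl TransactionsMap {
    fn new() -> Self {
        Self { committed: DashMap::new() }
    }

    fn get(&self, hash: &Hash) -> Option<Height> {
        self.committed.get(hash).map(|entry| *entry)
    }

    fn block(&self) -> BlockTransaction<'_> {
        BlockTransaction { base: self, staged: Vec::new() }
    }
}

impl BlockTransaction<'_> {
    fn insert(&mut self, hash: Hash, height: Height) {
        self.staged.push((hash, height));
    }

    /// Reads see staged writes first, then the committed state.
    fn get(&self, hash: &Hash) -> Option<Height> {
        self.staged
            .iter()
            .rev()
            .find(|(h, _)| h == hash)
            .map(|(_, height)| *height)
            .or_else(|| self.base.get(hash))
    }

    /// Publish staged writes; dropping without commit discards them.
    fn commit(self) {
        for (hash, height) in self.staged {
            self.base.committed.insert(hash, height);
        }
    }
}

fn main() {
    let state = TransactionsMap::new();
    let mut block = state.block();
    block.insert([1u8; 32], 42);

    // Staged write is visible inside the block, but not committed yet.
    assert_eq!(block.get(&[1u8; 32]), Some(42));
    assert_eq!(state.get(&[1u8; 32]), None);

    block.commit();
    assert_eq!(state.get(&[1u8; 32]), Some(42));
}
```

This keeps only one committed version plus the staged delta, which is where the memory savings over a fully multi-version structure would come from.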

dima74 · Apr 14 '25