validator goes OOM when adding multiple keys via WEB interface

Open okorolov opened this issue 2 years ago • 7 comments

🐞 Bug Report

Description

Prysm Validator goes OOM when adding multiple keys via Web interface.

🔬 Minimal Reproduction

Start a validator on any type of instance with < 32GB RAM. Use the WEB UI to load 100 validator keys.

The application / POD will be killed due to an OOM event.

🔥 Error

POD fails with OOM event.

🌍 Your Environment

  • EKS (kubernetes)
  • AL2_x86_64 instance type for nodes
  • any instance type < 32GB RAM.

What version of Prysm are you running? (Which release)

3.1.1

Anything else relevant (validator index / public key)?

There seems to be a similar issue that was raised a couple of years ago: https://github.com/prysmaticlabs/prysm/issues/5830

It seems that during the key addition process the WEB UI spawns multiple validation processes on the backend in parallel.

image

This results in significant spikes on the validator side.

Example: adding 20 keys (5+ GB spike)

image

Example: adding 100 keys (25+ GB spike)

image

It is worth mentioning that after the keys are added, POD RAM consumption returns to normal values (~100-500MB) within 2-3 minutes.
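A rough back-of-the-envelope check of these numbers, under an assumption that is not confirmed anywhere in this thread: if the keystores use the scrypt KDF with the parameters shown in the EIP-2335 examples (n = 262144, r = 8, p = 1), each decryption needs about 128 · n · r bytes of working memory. The function name and the `max_in_flight` parameter below are made up for illustration.

```python
# Assumed scrypt parameters (from the EIP-2335 keystore examples).
# The actual KDF settings of the keystores in this report are not stated.
SCRYPT_N = 262144
SCRYPT_R = 8

# Approximate scrypt working memory per decryption: 128 * N * r bytes.
BYTES_PER_DECRYPTION = 128 * SCRYPT_N * SCRYPT_R


def estimated_peak_bytes(num_keys: int, max_in_flight: int) -> int:
    """Estimate peak KDF memory when at most max_in_flight keys decrypt at once."""
    return min(num_keys, max_in_flight) * BYTES_PER_DECRYPTION


GIB = 1024 ** 3
print(estimated_peak_bytes(20, 20) / GIB)    # all 20 keys in parallel
print(estimated_peak_bytes(100, 100) / GIB)  # all 100 keys in parallel
print(estimated_peak_bytes(100, 1) / GIB)    # one key at a time
```

Under that assumption, 20 and 100 fully parallel decryptions come to about 5 GiB and 25 GiB, which is in line with the observed spikes, while validating one key at a time would stay near 256 MiB.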

Suggested Fix

Is it possible to validate 1 key at a time (instead of in parallel), freeing up memory after each validation? It will take more time but will not result in such huge memory spikes.
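A minimal sketch of what this suggestion could look like, written in Python for illustration only (it is not Prysm code). `validate_all`, `max_in_flight`, and the `validate` callback are hypothetical names; `validate` stands in for whatever decrypts and checks a single keystore.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

K = TypeVar("K")
R = TypeVar("R")


def validate_all(
    keystores: Iterable[K],
    validate: Callable[[K], R],
    max_in_flight: int = 1,
) -> List[R]:
    """Validate every keystore with at most max_in_flight running at once.

    With max_in_flight=1 the keys are processed strictly one after another,
    so only one validation's working memory is held at any time.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(validate, keystores))
```

Raising `max_in_flight` slightly above 1 would trade some extra memory for a shorter total validation time.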

okorolov avatar Oct 05 '22 10:10 okorolov

@james-prysm Any ideas on this ?

nisdas avatar Oct 05 '22 23:10 nisdas

I'll take a look. There are some inefficient processes due to API limitations, but I haven't investigated this.

james-prysm avatar Oct 05 '22 23:10 james-prysm

@okorolov as I begin this investigation, could you let me know when you experienced this on the UI? Was this during wallet creation or when adding additional keys on the dashboard?

james-prysm avatar Oct 06 '22 22:10 james-prysm

> @okorolov as I begin this investigation, could you let me know when you experienced this on the UI? Was this during wallet creation or when adding additional keys on the dashboard?

This happens during the wallet creation process. I will additionally verify whether the situation is any different with an already created wallet. Will update. Thanks.

okorolov avatar Oct 06 '22 22:10 okorolov

@james-prysm the situation is the same with an already created wallet. I tried importing 10 keys on a small validator instance (2GB RAM) and the validator pod failed within 5 seconds.

LAST SEEN   TYPE      REASON      OBJECT                                          MESSAGE
4m13s       Warning   SystemOOM   node/ip-10-10-2-60.us-east-2.compute.internal   System OOM encountered, victim process: validator, pid: 29987

image

I would also like to add that the validation process in the UI is a bit confusing:

  1. When you type the keystore password, the Prysm client tries to decrypt the keys right away as you type (you don't really see this except through the WEB console). If you don't type fast enough, you will get failed validations before you have actually finished typing.
  2. When the validation process kicks in and you press "continue", nothing happens until the validation process finishes. This is especially confusing because validation for 100 keys can take 30+ seconds, and during this time the WEB UI does not react to the "continue" button.

Some indication that validation is in progress, together with a greyed-out "continue" button, might be a good solution here.
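For point 1, one common way to avoid validating on every keystroke is debouncing: wait until the input has been idle for a short time and only then validate the latest value. The sketch below shows the idea in Python as a stand-in for UI code. The `Debouncer` class, its delay, and the callback are hypothetical and are not taken from the Prysm web UI.

```python
import threading
from typing import Any, Callable, Optional


class Debouncer:
    """Call callback(value) only after submit() has been quiet for delay_seconds."""

    def __init__(self, delay_seconds: float, callback: Callable[[Any], None]) -> None:
        self._delay = delay_seconds
        self._callback = callback
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def submit(self, value: Any) -> None:
        with self._lock:
            # A newer value replaces the pending one: cancel the old timer.
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self._delay, self._callback, args=(value,))
            self._timer.daemon = True
            self._timer.start()
```

With something like this in front of the validation call, typing a password quickly would trigger a single validation for the final value rather than one per partial password.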

Thanks.

okorolov avatar Oct 07 '22 05:10 okorolov

Thanks for checking, I'll try to take a look at this soon as I am now back from devcon

james-prysm avatar Oct 17 '22 16:10 james-prysm

Adding #237 in the next release. This makes it, at the very least, 1 validation request instead of 1 for each key when the password is the same. Hopefully this will solve most use cases. The button staying disabled was another thing I fixed.
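The batching idea described here can be sketched roughly as follows. This is an illustration only and not the code from #237; the function name, the `(keystore, password)` input shape, and the request dictionary layout are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_validation_requests(
    entries: Iterable[Tuple[str, str]],
) -> List[Dict[str, object]]:
    """Group (keystore, password) pairs so each distinct password yields one request."""
    by_password: Dict[str, List[str]] = defaultdict(list)
    for keystore, password in entries:
        by_password[password].append(keystore)
    # One request per distinct password, in first-seen order.
    return [
        {"password": password, "keystores": keystores}
        for password, keystores in by_password.items()
    ]
```

When every key shares one password, 100 keystores collapse into a single request; keys with different passwords still produce one request per distinct password.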

james-prysm avatar Oct 20 '22 02:10 james-prysm

Item has been released. Closing now; please request a reopen if the issue persists in the same way.

james-prysm avatar Nov 04 '22 19:11 james-prysm