Erigon start fails with "mdbx_env_open: cannot allocate memory" error

Open diglos opened this issue 3 years ago • 4 comments

System information

Erigon version: 2022.07.2-alpha-c7a94eee

OS & Version: Linux ARM64 (Odroid and Rock 5B boards)

Commit hash : c7a94eee

Expected behaviour

Client start

Actual behaviour

It fails trying to open the database

Steps to reproduce the behaviour

Start the client on an Odroid M1 or Rock 5B board.

Backtrace

INFO[07-31|08:23:08.290] Build info                               git_branch=HEAD git_tag=v2022.07.02-dirty git_commit=c7a94eeea05d7c0d569c811399642a7d108d8c82
INFO[07-31|08:23:08.291] Starting Erigon on Ethereum mainnet... 
INFO[07-31|08:23:08.296] Maximum peer count                       ETH=100 total=100
INFO[07-31|08:23:08.296] torrent verbosity                        level=2
INFO[07-31|08:23:10.402] Set global gas cap                       cap=50000000
INFO[07-31|08:23:10.647] Opening Database                         label=chaindata path=/home/ethereum/.local/share/erigon/chaindata
EROR[07-31|08:23:10.655] Erigon startup                           err="mdbx_env_open: cannot allocate memory, label: chaindata, trace: [kv_mdbx.go:245 node.go:323 node.go:326 backend.go:156 node.go:121 node.go:112 main.go:41 app.go:526 app.go:286 main.go:28 proc.go:250 asm_arm64.s:1259]"

Other data.

This happens when starting the client on both devices, the Odroid M1 and the Rock 5B. Both run Ubuntu 20.04.4 (I also tried Debian 11 on the Rock 5B). It works on a Raspberry Pi 4 with the same OS, though, which is the odd part.

The same happens with Akula on both devices (and it works fine on the Raspberry Pi 4).

2022-07-31T08:07:24.783752Z  INFO Starting Akula (akula/v0.1.0-master-8d00b15-2022-07-29/aarch64-unknown-linux-gnu/rustc1.63.0-nightly)
Error: failed to open database at /home/ethereum/.local/share/akula/chaindata

Caused by:
    Cannot allocate memory

This seems to be an MDBX issue. Let me know how I can provide more information if needed.

Thanks.

diglos avatar Jul 31 '22 08:07 diglos

Please show the output of "uname -a" and "free -h" from both of these systems.

AskAlexSharov avatar Jul 31 '22 09:07 AskAlexSharov

Hi,

Rock 5B, Debian 11

uname -a

Linux rock-5b 5.10.66-3-rockchip-g62b1ff5028c4 #rockchip SMP Mon May 16 22:17:40 CST 2022 aarch64 GNU/Linux

free -h

               total        used        free      shared  buff/cache   available
Mem:            15Gi       211Mi        14Gi       6.0Mi       483Mi        14Gi
Swap:             0B          0B          0B

Odroid M1, Ubuntu Server 20.04.4

uname -a

Linux ethereumonarm-odroidm1-1a40354b6 4.19.219-odroid-arm64 #1 SMP Fri, 27 May 2022 03:16:46 +0000 aarch64 aarch64 aarch64 GNU/Linux

free -h (it is currently running a Besu sync)

              total        used        free      shared  buff/cache   available
Mem:          7.3Gi       3.5Gi        43Mi       1.0Mi       3.7Gi       3.7Gi
Swap:         9.5Gi        88Mi       9.4Gi

diglos avatar Jul 31 '22 17:07 diglos

What is your overcommit? https://stackoverflow.com/questions/13127855/what-is-the-size-limit-for-mmap#18360946

AskAlexSharov avatar Aug 01 '22 01:08 AskAlexSharov

Hi, it shows 0, and this is the value on all three devices (including the Raspberry Pi, which works). I changed it to 1 but I get the same error.
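
For anyone repeating this check, here is a minimal sketch that reads the overcommit mode and the per-process address-space limit in one go. It assumes a Linux system that exposes /proc/sys/vm/overcommit_memory; the helper names are made up for illustration and are not part of Erigon or MDBX.

```python
import resource
from pathlib import Path
from typing import Optional

# Meaning of vm.overcommit_memory values as documented for the Linux kernel.
OVERCOMMIT_MODES = {
    0: "heuristic (kernel refuses only obvious overcommits)",
    1: "always (never refuse because of overcommit)",
    2: "never (strict accounting against the commit limit)",
}


def describe_overcommit(value: int) -> str:
    """Return a human-readable description of an overcommit mode."""
    return OVERCOMMIT_MODES.get(value, "unknown mode %d" % value)


def read_overcommit(path: str = "/proc/sys/vm/overcommit_memory") -> Optional[int]:
    """Read the current overcommit mode, or None if the file is unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def address_space_limit() -> str:
    """Return the soft RLIMIT_AS value (what 'ulimit -v' reports)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    return "unlimited" if soft == resource.RLIM_INFINITY else "%d bytes" % soft


if __name__ == "__main__":
    mode = read_overcommit()
    if mode is None:
        print("overcommit mode: not available")
    else:
        print("overcommit mode:", mode, "=", describe_overcommit(mode))
    print("virtual memory limit:", address_space_limit())
```

If both values look permissive (as they do here) and the error persists, the limit is probably somewhere else, for example in the size of the virtual address space itself.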

Other interesting OS limits (Odroid):

ulimit -a

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 29759
max locked memory       (kbytes, -l) 65536
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 29759
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

They are pretty much the same as the Raspberry Pi ones:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 30858
max locked memory       (kbytes, -l) 65536
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 30858
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

diglos avatar Aug 01 '22 08:08 diglos

Hi, guys.

As this works on the Raspberry Pi it may be an issue with the Rockchip kernel (both Odroid and Rock 5B use this chip). Is there any way I can trace the error and get back to you or the Rockchip team?
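
One way to narrow this down without Erigon is to probe how much virtual address space a process can reserve. The sketch below is only an approximation: MDBX maps a database file, whereas this reserves an anonymous read-only mapping, and the probe sizes are illustrative assumptions. The function name is hypothetical.

```python
import mmap

GIB = 1 << 30
TIB = 1 << 40


def can_reserve(size_bytes: int) -> bool:
    """Try to reserve size_bytes of virtual address space and release it.

    A private, read-only anonymous mapping is used so that no physical
    memory is committed; only address space is requested.
    """
    try:
        region = mmap.mmap(
            -1,
            size_bytes,
            flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
            prot=mmap.PROT_READ,
        )
    except (OSError, ValueError, OverflowError):
        return False
    region.close()
    return True


if __name__ == "__main__":
    # Illustrative probe sizes; adjust as needed.
    for label, size in (("256 GiB", 256 * GIB), ("1 TiB", 1 * TIB), ("8 TiB", 8 * TIB)):
        print(label, "->", "ok" if can_reserve(size) else "failed")
```

If small sizes succeed and large ones fail on one board but not on another with the same limits, that points at the kernel's address-space configuration rather than at the client.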

Thanks in advance.

diglos avatar Aug 27 '22 11:08 diglos

Hi, I ran into the same issue as you. After some hours of studying the code, it seems to be a kernel configuration problem. In particular, the memory addressing options differ between the Odroid/Rockchip and Raspberry Pi kernels:

CONFIG_ARM64_VA_BITS=39 on Rockchip
CONFIG_ARM64_VA_BITS=48 on Raspberry Pi

With 39 bits it is only possible to address 512 GiB of virtual memory, which seems to be the problem because Erigon wants to address 8 TB for the chain database.

https://stackoverflow.com/a/4924789
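
The arithmetic behind this can be sketched as follows. It treats the 8 TB figure from this comment as 8 TiB and assumes user space gets the full 2^VA_BITS range; the function names are made up for illustration.

```python
GIB = 1 << 30
TIB = 1 << 40


def user_va_limit(va_bits: int) -> int:
    """Size in bytes of a virtual address range with va_bits address bits."""
    return 1 << va_bits


def mapping_fits(va_bits: int, map_size_bytes: int) -> bool:
    """True if a single mapping of map_size_bytes can fit in the range.

    Strictly less than, because the process also needs room for its
    code, heap, stacks and other mappings.
    """
    return map_size_bytes < user_va_limit(va_bits)


if __name__ == "__main__":
    wanted = 8 * TIB  # assumption: the 8 TB quoted in this thread
    for bits in (39, 48):
        limit = user_va_limit(bits)
        print(
            "VA_BITS=%d: %d GiB addressable, 8 TiB mapping fits: %s"
            % (bits, limit // GIB, mapping_fits(bits, wanted))
        )
```

With 39 bits the range is 512 GiB, far below the requested mapping; with 48 bits it is 256 TiB, which leaves plenty of room.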

I will try to recompile the kernel and keep you posted.

UPDATE:

After compiling a new kernel 4.19.219 from https://github.com/hardkernel/linux (odroidm1-4.19.y) with a 64 KB page size and 48 virtual address bits, Erigon starts as expected.
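
To check which values a running kernel was built with, something like the sketch below may help. It assumes the kernel config is available either at /proc/config.gz (only present when the kernel was built with CONFIG_IKCONFIG_PROC) or at /boot/config-<release>; neither is guaranteed on a vendor kernel. The helper names are hypothetical.

```python
import gzip
import platform
import re
from pathlib import Path
from typing import Dict, Optional

CONFIG_LINE = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")


def parse_kernel_config(text: str) -> Dict[str, str]:
    """Parse 'CONFIG_X=value' lines; '# CONFIG_X is not set' lines are skipped."""
    options = {}
    for line in text.splitlines():
        match = CONFIG_LINE.match(line.strip())
        if match:
            options[match.group(1)] = match.group(2).strip('"')
    return options


def load_kernel_config_text() -> Optional[str]:
    """Return the running kernel's config text, or None if it cannot be found."""
    proc_config = Path("/proc/config.gz")
    if proc_config.exists():
        with gzip.open(proc_config, "rt") as handle:
            return handle.read()
    boot_config = Path("/boot/config-" + platform.release())
    if boot_config.exists():
        return boot_config.read_text()
    return None


if __name__ == "__main__":
    text = load_kernel_config_text()
    if text is None:
        print("kernel config not found")
    else:
        options = parse_kernel_config(text)
        for name in ("CONFIG_ARM64_VA_BITS", "CONFIG_ARM64_4K_PAGES", "CONFIG_ARM64_64K_PAGES"):
            print(name, "=", options.get(name, "not set"))
```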

Marjgus avatar Aug 27 '22 21:08 Marjgus