
upgraded to v3 on raspi and now no longer able to boot because of "out of memory" error

Open Jaxo opened this issue 1 year ago • 9 comments

Any ideas about this problem? Running next worked, but then trying to run the ship ended with this "Out of memory" error from lmdb, as well as other errors, e.g., "boot: core limit: Invalid argument."

$ uname -a
Linux umbrel 6.6.20+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.20-1+rpt1 (2024-03-07) aarch64 GNU/Linux
$ sudo ./urbit tindus-milmus
~
urbit 3.0
boot: home is /mnt/data/urbit/tindus-milmus
disk: loaded epoch 0i65645879
boot: core limit: Invalid argument
loom: mapped 2048MB
boot: protected loom
live: mapped: MB/854.687.744
live: loaded: KB/16.384
loom: memoization migration running...
loom: memoization migration done
boot: installed 967 jets
boot: core limit: Invalid argument
loom: mapped 2048MB
lite: arvo formula 2a2274c9
lite: core 4bb376f0
lite: final state 4bb376f0
lmdb: failed to open event log: Out of memory
disk: failed to initialize lmdb
pier: stay: init fail

Jaxo avatar Mar 24 '24 20:03 Jaxo

You're hitting limits on the number of memory mappings per process, see https://github.com/urbit/vere/issues/624.
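
For anyone who wants to check how close a process is to that limit, here is a minimal sketch (Linux only). It assumes the standard /proc layout; the helper name mapping_usage is made up for illustration and is not part of urbit or vere.

```python
from pathlib import Path


def mapping_usage(pid="self"):
    """Return (mappings_in_use, max_map_count), or None where /proc is unavailable."""
    try:
        # System-wide cap on the number of mappings a single process may hold.
        limit = int(Path("/proc/sys/vm/max_map_count").read_text())
        # /proc/<pid>/maps has one line per mapping of that process.
        with open(f"/proc/{pid}/maps") as maps:
            used = sum(1 for _ in maps)
    except (OSError, ValueError):
        # Not Linux, /proc not mounted, or no permission to read that pid.
        return None
    return used, limit


usage = mapping_usage()
if usage is None:
    print("no /proc mapping information on this system")
else:
    print(f"{usage[0]} of {usage[1]} mappings in use")
```

Passing the pid of a running urbit process instead of "self" reports that process's count, provided you have permission to read its maps file.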

joemfb avatar Mar 24 '24 22:03 joemfb

I had found that and tried it, @joemfb, but it didn't work. I'm wondering if the earlier error (boot: core limit: Invalid argument) is relevant? --joe

$ cat /proc/sys/vm/max_map_count
262144
$ sudo tindus-milmus/.run
~
urbit 3.0
boot: home is /mnt/data/urbit/tindus-milmus
disk: loaded epoch 0i65645879
boot: core limit: Invalid argument              <<<<<<<<<<
loom: mapped 2048MB
boot: protected loom
live: mapped: MB/854.687.744
live: loaded: KB/16.384
loom: memoization migration running...
loom: memoization migration done
boot: installed 967 jets
boot: core limit: Invalid argument
loom: mapped 2048MB
lite: arvo formula 2a2274c9
lite: core 4bb376f0
lite: final state 4bb376f0
lmdb: failed to open event log: Out of memory
disk: failed to initialize lmdb
pier: stay: init fail

Jaxo avatar Mar 25 '24 17:03 Jaxo

The core limit failure is from a setrlimit() call, trying to set RLIMIT_CORE to RLIM_INFINITY (enabling arbitrary-sized core dumps) -- just for debugging and unrelated to this.
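
As a rough illustration of why that message is harmless, here is a sketch in Python rather than the runtime's C. It makes the same kind of request through the standard resource module and treats a refusal as a warning. The function name is hypothetical, and this is not the vere code.

```python
import resource


def try_unlimited_core_dumps():
    """Ask for unlimited core-dump size; report a refusal instead of aborting."""
    unlimited = (resource.RLIM_INFINITY, resource.RLIM_INFINITY)
    try:
        resource.setrlimit(resource.RLIMIT_CORE, unlimited)
    except (ValueError, OSError) as err:
        # Python reports EINVAL/EPERM from setrlimit() as ValueError and
        # other failures as OSError. Either way, only core dumps are affected.
        print(f"core limit: {err}")
        return False
    return True


before = resource.getrlimit(resource.RLIMIT_CORE)
ok = try_unlimited_core_dumps()
after = resource.getrlimit(resource.RLIMIT_CORE)
print("raised to unlimited" if ok else "left unchanged", before, after)
```

If the request is refused, the soft and hard limits stay as they were and the process carries on, which matches the boot log continuing past the "core limit" line.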

But I just realized that you said "raspi", and this specific error is coming out of the lmdb integration. We're trying to create a huge mapping for the lmdb event-log environment: 500 GB on linux-aarch64, 1 TB elsewhere. That limit is currently hardcoded. We should make it configurable from the command line; in the meantime, you could try lowering it (at the top of pkg/vere/disk.c) and rebuilding. I should've read more closely the first time!

joemfb avatar Mar 25 '24 18:03 joemfb

ah, ok, thanks! perhaps it's time to move my urbit elsewhere...

Jaxo avatar Mar 25 '24 19:03 Jaxo

I also tried your rlimit suggestion, but that didn't seem to resolve the other (unrelated) error message:

$ prlimit
RESOURCE   DESCRIPTION                              SOFT       HARD UNITS
AS         address space limit                 unlimited  unlimited bytes
CORE       max core file size                  unlimited  unlimited bytes
CPU        CPU time                            unlimited  unlimited seconds
DATA       max data size                       unlimited  unlimited bytes
FSIZE      max file size                       unlimited  unlimited bytes
LOCKS      max number of file locks held       unlimited  unlimited locks
MEMLOCK    max locked-in-memory address space 1023676416 1023676416 bytes
MSGQUEUE   max bytes in POSIX mqueues             819200     819200 bytes
NICE       max nice prio allowed to raise              0          0 
NOFILE     max number of open files                 1024    1048576 files
NPROC      max number of processes                 29173      29173 processes
RSS        max resident set size               unlimited  unlimited bytes
RTPRIO     max real-time priority                      0          0 
RTTIME     timeout for real-time tasks         unlimited  unlimited microsecs
SIGPENDING max number of pending signals           29173      29173 signals
STACK      max stack size                        8388608  unlimited bytes

$ sudo tindus-milmus/.run
~
urbit 3.0
boot: home is /mnt/data/urbit/tindus-milmus
disk: loaded epoch 0i65645879
boot: core limit: Invalid argument
loom: mapped 2048MB
boot: protected loom
live: mapped: MB/854.687.744
live: loaded: KB/16.384
loom: memoization migration running...
loom: memoization migration done
boot: installed 967 jets
boot: core limit: Invalid argument
loom: mapped 2048MB
lite: arvo formula 2a2274c9
lite: core 4bb376f0
lite: final state 4bb376f0
lmdb: failed to open event log: Out of memory
disk: failed to initialize lmdb
pier: stay: init fail

Jaxo avatar Mar 25 '24 20:03 Jaxo

yep, this is exactly the same issue I am running into on a Raspi 5 (8GB). It has a 250GB disk on it, and the lmdb memory mapping of the 1/2 TB log fails with ENOMEM. strace shows that mmap(NULL, 536870912000, PROT_READ, MAP_SHARED, 19, 0) is the call that fails. This is Debian 12 (bookworm); uname -a reports Linux five 6.6.28+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.28-1+rpt1 (2024-04-22) aarch64 GNU/Linux. It acts the same whether running as a bog-standard OS process in the shell or inside a docker container.
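
A small sketch for probing which mapping sizes a given machine will accept (Unix only). It does not touch lmdb or the pier. It uses an anonymous, private, read-only mapping as a stand-in for the file-backed MAP_SHARED mapping in the strace line, so it only approximates the real call. Treating address-range reservation as the limiting factor is an assumption here, not something the thread confirms. The helper name can_reserve is made up for illustration.

```python
import mmap

GIB = 1 << 30
STRACE_LENGTH = 536870912000  # length argument of the mmap() call in the strace output


def can_reserve(size):
    """Return True if a read-only mapping of `size` bytes can be created."""
    try:
        # Anonymous, private, read-only: nothing is written, so this mostly
        # tests whether the kernel will hand out an address range that large.
        region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE, prot=mmap.PROT_READ)
    except (OSError, ValueError, OverflowError) as err:
        print(f"{size // GIB} GiB: refused ({err})")
        return False
    region.close()
    return True


# The length seen in strace is exactly 500 GiB.
print(STRACE_LENGTH == 500 * GIB)

for gib in (1, 64, 250, 500, 1024):
    print(gib, "GiB:", "ok" if can_reserve(gib * GIB) else "refused")
```

The largest size that is accepted gives a starting point for choosing a smaller map size. The real mapping is file-backed and may behave differently.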

the lmdb version is liblmdb0/now 0.9.24-1 arm64 [installed,local]

./urbit ravdem-mintel/
~
urbit 3.0
boot: home is /home/kristofer/ships/ravdem-mintel
disk: loaded epoch 0i10616356
loom: mapped 2048MB
boot: protected loom
live: mapped: MB/516.702.208
live: loaded: KB/16.384
boot: installed 967 jets
loom: mapped 2048MB
lite: arvo formula 2a2274c9
lite: core 4bb376f0
lite: final state 4bb376f0
lmdb: failed to open event log: Out of memory
disk: failed to initialize lmdb
pier: stay: init fail

and also

$ prlimit
RESOURCE   DESCRIPTION                              SOFT       HARD UNITS
AS         address space limit                 unlimited  unlimited bytes
CORE       max core file size                          0  unlimited bytes
CPU        CPU time                            unlimited  unlimited seconds
DATA       max data size                       unlimited  unlimited bytes
FSIZE      max file size                       unlimited  unlimited bytes
LOCKS      max number of file locks held       unlimited  unlimited locks
MEMLOCK    max locked-in-memory address space 1041387520 1041387520 bytes
MSGQUEUE   max bytes in POSIX mqueues             819200     819200 bytes
NICE       max nice prio allowed to raise              0          0 
NOFILE     max number of open files                 1024    1048576 files
NPROC      max number of processes                 30414      30414 processes
RSS        max resident set size               unlimited  unlimited bytes
RTPRIO     max real-time priority                      0          0 
RTTIME     timeout for real-time tasks         unlimited  unlimited microsecs
SIGPENDING max number of pending signals           30414      30414 signals
STACK      max stack size                        8388608  unlimited bytes

Would this imply, if I put a 1TB disk on it, it might boot?

and thanks, ~mopfel-winrux, for your assistance.

kristofer avatar May 22 '24 06:05 kristofer

Also having this issue. Was able to edit and launch v1.9 as described here: https://github.com/urbit/urbit/issues/6026 but am unable to find the equivalent lines of code for the current version. Was a flag ever added as discussed above? If not, does anyone know what changes I would need to make to the current source to build a functional version?

dovpub-fodryp avatar Aug 18 '24 20:08 dovpub-fodryp

For posterity, the flag --lmdb-map-size was added to vere in https://github.com/urbit/vere/pull/655. This allows anybody encountering this issue to reduce the lmdb map size to resolve the issue.

pkova avatar Aug 19 '24 09:08 pkova

thanks! i just tried the latest and it's now working again on my raspi...and without having to use the flag, so perhaps something else related was changed. in any case, 🙇 🙏 --joe

Jaxo avatar Aug 19 '24 19:08 Jaxo