`ames: bogus lane` + crash on assert in `_ce_image_blit`

Observed on Linux AArch64 (Arch Linux ARM 5.19.8-1-aarch64-ARCH).
vere version: release 1.20 (also occurs on 1.18)
My ship is bailing shortly after boot, typically 1-3 minutes after starting. I've inspected its output over multiple runs, and it has printed `ames: bogus lane` prior to aborting every time. Because of this I cranked up ames logging a bit with `%odd`, `%for`, `%rot`, `%rcv`, and `%msg`.

It doesn't appear to crash instantly upon any specific ames activity, but always a few seconds after the last debug print (usually a `hear`, sometimes a `trying route`, with similar output before that). I've snipped most of it in the log below for brevity:
```
$ ./urbit -t -p 54321 -b 127.0.0.1 --http-port 8080 --loom 32 nimpun-doller
~
urbit 1.20-936f24b394
boot: home is /home/nimpun-doller/nimpun-doller
loom: mapped 2048MB
lite: arvo formula 2a2274c9
lite: core 4bb376f0
lite: final state 4bb376f0
loom: mapped 4096MB
boot: protected loom
live: loaded: GB/2.157.395.968
boot: installed 652 jets
vere: checking version compatibility
%unlagging
ames: live on 54321
conn: listening on /home/nimpun-doller/nimpun-doller/.urb/conn.sock
http: web interface live on http://localhost:8080
http: loopback live on http://localhost:12322
pier (81851936): live
ames: ~wanzod: send-blob: to ~wanzod
ames: ~wanzod: trying route: ~wanzod
[...]
ames: ~disden-talhes: received packet
ames: ~disden-talhes: hear last in-progress [~disden-talhes seq=1 bone=5 fragment-num=0 num-fragments=1 la=0 lh=1]
[approx. 3 seconds later]
ames: bogus lane
lan: 0
home: bailing out
Aborted (core dumped)
```
GDB backtrace with a debug build via `bazel build --config=dbg :urbit` from tag `vere-v1.20` (`1.20-936f24b394`):
```
Core was generated by `./urbit serf /home/nimpun-doller/nimpun-doller 0:0:0:0 128 50000 31 0'.
Program terminated with signal SIGABRT, Aborted.
#0  0x000000000070ef98 in __restore_sigs ()
(gdb) bt
#0  0x000000000070ef98 in __restore_sigs ()
#1  0x000000000070eff0 in raise ()
#2  0x0000000000706358 in abort ()
#3  0x00000000007063f0 in __assert_fail ()
#4  0x0000000000440ae4 in _ce_image_blit (img_u=0xaf2e60 <u3e_Pool+32784>, ptr_w=0x280000000, stp_ws=4096) at pkg/noun/events.c:864
#5  0x000000000044153c in u3e_live (nuu_o=1 '\001', dir_c=0xffffddcafe6e "/home/nimpun-doller/nimpun-doller") at pkg/noun/events.c:1174
#6  0x000000000044bd44 in u3m_boot (dir_c=0xffffddcafe6e "/home/nimpun-doller/nimpun-doller", len_i=2147483648) at pkg/noun/manage.c:1857
#7  0x0000000000403a00 in _cw_serf_commence (argc=8, argv=0xffffddcaed58) at pkg/vere/main.c:1044
#8  0x00000000004067f8 in _cw_utils (argc=8, argv=0xffffddcaed58) at pkg/vere/main.c:2049
#9  0x0000000000406874 in main (argc=8, argv=0xffffddcaed58) at pkg/vere/main.c:2070
```
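A side note on the backtrace, purely my own arithmetic (not anything from the vere source): the `len_i=2147483648` passed to `u3m_boot` is exactly 2^31 bytes, i.e. the default 2048MB loom from the first `loom: mapped 2048MB` line, not the 4096MB loom that `--loom 32` later maps. A quick sanity check:

```python
# My own reading of the backtrace figures (assumption, not vere docs):
# len_i in frame #6 is the loom size in bytes.
len_i = 2147483648
assert len_i == 1 << 31                 # exactly 2^31 bytes
print(len_i // (1024 * 1024), "MB")     # 2048 MB, matching "loom: mapped 2048MB"
```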
`+vat %base` for completeness:
```
%base
/sys/kelvin:      [%zuse 415]
base hash:        0v1j.vgj7i.m56f2.gmo7j.rimse.dtn6n.q63dm.ra989.mnplb.olc74.tin8p
%cz hash:         0v1j.vgj7i.m56f2.gmo7j.rimse.dtn6n.q63dm.ra989.mnplb.olc74.tin8p
app status:       running
force on:         ~
force off:        ~
publishing ship:  ~
updates:          remote
source ship:      ~wanzod
source desk:      %kids
source aeon:      132
kids desk:        %kids
pending updates:  ~
```
Update: After a `meld`, things seem to be working normally. I had last melded when upgrading to 1.18, so seeing it free this much is a bit surprising to me. Not sure if this behavior without giving `--loom` to `meld` is expected:
```
$ ./urbit meld nimpun-doller   # note: no --loom
loom: mapped 2048MB
loom: image (north) blit read: Bad address
Assertion '0' failed in pkg/noun/events.c:864
Assertion failed: 0 (pkg/noun/events.c: _ce_image_blit: 864)
Aborted (core dumped)
```
```
$ ./urbit meld --loom 32 nimpun-doller
loom: mapped 4096MB
boot: protected loom
live: loaded: GB/2.157.395.968
boot: installed 652 jets
hash-cons arena:
  root: B/160
  atoms (2309247):
    refs: MB/70.491.560
    data: MB/315.390.063
    dict: MB/174.264.720
    total: MB/560.146.343
  cells (32856917):
    refs: MB/781.763.380
    dict: GB/3.127.053.520
    total: GB/3.908.816.900
  total: GB/4.468.963.403
urbit: meld: gained: GB/1.002.031.680
```
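For anyone reading the `meld` output above, the per-category figures do sum consistently to the printed totals. This is just my own arithmetic on the logged numbers, not anything from vere:

```python
# Verify the hash-cons arena totals printed by meld above
# (my own check on the logged figures, not vere code).
root  = 160                                          # root: B/160
atoms = 70_491_560 + 315_390_063 + 174_264_720       # atom refs + data + dict
cells = 781_763_380 + 3_127_053_520                  # cell refs + dict
assert atoms == 560_146_343                          # atoms total
assert cells == 3_908_816_900                        # cells total
assert root + atoms + cells == 4_468_963_403         # grand total
print("totals consistent")
```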
```
$ ./urbit info nimpun-doller
loom: mapped 4096MB
boot: protected loom
live: loaded: GB/1.155.366.912
boot: installed 652 jets
urbit: nimpun-doller at event 81853190
disk: live=&, event=81853754
lmdb info:
  map size: 536870912000
  page size: 4096
  max pages: 131072000
  number of pages used: 4200301
  last transaction ID: 81139792
  max readers: 127
  number of readers used: 0
  file size (page): 17204432896
  file size (stat): 17204432896
```
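Two consistency checks on the numbers above, again my own arithmetic and my own interpretation (based on the mapped-size lines, `--loom N` appears to map a loom of 2^N bytes):

```python
# Assumption from the logs above (not the vere docs): --loom N maps 2^N bytes.
def loom_mb(bits: int) -> int:
    return (1 << bits) // (1024 * 1024)

assert loom_mb(31) == 2048   # default:   "loom: mapped 2048MB"
assert loom_mb(32) == 4096   # --loom 32: "loom: mapped 4096MB"

# The lmdb file size is exactly pages-used * page-size.
assert 4_200_301 * 4096 == 17_204_432_896
print("ok")
```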
I'm no longer crashing after boot and things feel stable, so my issue seems to be resolved. I'll keep the core dump around in case there's anything to inspect that'd be useful for tracking this down, otherwise feel free to close this.
This is interesting, thanks for reporting. @belisarius222 @barter-simsum what could cause so much meldable memory usage in so little time? What could we do to figure out what's causing that? We'd probably need `|mass` output.