Failed to find one of the right cookies. Core dumped

Open hashmap opened this issue 5 years ago • 16 comments

Grin was running when my server was rebooted. After that any attempt to start grin failed with the message above. Removal of chain_data helped.

hashmap avatar Dec 09 '18 21:12 hashmap

I found where the message comes from: it's croaring, https://github.com/RoaringBitmap/CRoaring/blob/master/src/roaring_array.c#L767

hashmap avatar Dec 11 '18 14:12 hashmap

We corrupted one of the pmmr_leaf.bin or pmmr_prun.bin files somehow? We were writing to one of those when the server rebooted?

Oof.

antiochp avatar Dec 11 '18 14:12 antiochp

was this before or after antioch's fix for safely writing files before stopping grin?

sesam avatar Dec 11 '18 15:12 sesam

My fix won't help with a server reboot, only on "clean" shutdown via the grin node itself.

antiochp avatar Dec 11 '18 16:12 antiochp

OK. Does a core dump risk spilling secrets to disk? Should we catch this, show a warning, and then either shut down gracefully or somehow retry?
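
One way to catch this before the crash is to check the file header up front. The sketch below is only an illustration, not Grin code: it assumes the bitmap file holds a portable Roaring serialization, and the function name `has_valid_roaring_cookie` is made up for the example. The two cookie values come from the Roaring format specification; the check mirrors the one behind the "failed to find one of the right cookies" message.

```python
import struct

# Cookie values defined by the Roaring portable serialization format.
SERIAL_COOKIE_NO_RUNCONTAINER = 12346
SERIAL_COOKIE = 12347


def has_valid_roaring_cookie(data: bytes) -> bool:
    """Return True if the first 4 bytes look like a portable Roaring header.

    Hypothetical helper for illustration; it does not validate the rest
    of the payload.
    """
    if len(data) < 4:
        return False
    # The cookie is a little-endian unsigned 32-bit integer.
    (cookie,) = struct.unpack_from("<I", data, 0)
    if cookie == SERIAL_COOKIE_NO_RUNCONTAINER:
        return True
    # With run containers, only the low 16 bits carry the cookie.
    return (cookie & 0xFFFF) == SERIAL_COOKIE
```

A caller could log a warning and rebuild or re-sync when this returns False instead of handing the bytes to the deserializer. A matching cookie does not prove the rest of the file is intact, so this only covers the failure reported here.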

sesam avatar Dec 15 '18 21:12 sesam

Saw another report in gitter. I predict it may be a problem when we get enough nodes.

hashmap avatar Dec 16 '18 19:12 hashmap

Good prediction. One issue I think is that on other storages (LMDB, MMRs), we have a way to get back to a previous snapshot (the chain head) so if a write didn't really work out we can easily find a workable checkpoint. It doesn't seem as easy with croaring but might not be too hard to add?
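
A rough sketch of what "get back to a previous snapshot" could look like for a plain bitmap file, under assumptions that are not from Grin: the previous copy is kept next to the current one with a `.prev` suffix, and a CRC32 trailer is appended so a torn write can be detected. All names here (`save_snapshot`, `load_snapshot`, the `.prev` suffix) are hypothetical.

```python
import os
import zlib


def save_snapshot(path: str, data: bytes) -> None:
    """Keep the old file as path + '.prev', then write data plus a CRC32 trailer."""
    if os.path.exists(path):
        os.replace(path, path + ".prev")
    with open(path, "wb") as f:
        f.write(data + zlib.crc32(data).to_bytes(4, "little"))
        f.flush()
        os.fsync(f.fileno())


def _read_checked(path: str):
    """Return the payload if the file exists and its CRC32 trailer matches, else None."""
    try:
        with open(path, "rb") as f:
            raw = f.read()
    except FileNotFoundError:
        return None
    if len(raw) < 4:
        return None
    payload, trailer = raw[:-4], raw[-4:]
    if zlib.crc32(payload).to_bytes(4, "little") != trailer:
        return None
    return payload


def load_snapshot(path: str):
    """Try the current file first, then fall back to the previous snapshot."""
    for candidate in (path, path + ".prev"):
        payload = _read_checked(candidate)
        if payload is not None:
            return payload
    return None
```

Falling back to an older bitmap would only be safe if the node also rewinds or rebuilds the related state to match it; that part is not shown.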

ignopeverell avatar Jan 24 '19 23:01 ignopeverell

Just so there is no confusion here - there is zero persistence in the croaring library, all of this is on us, we literally just write the bytes to a file. I believe we use a temp file to make things reasonably atomic but we have not put a lot of thought into doing this really robustly.
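
For reference, a minimal sketch of the usual temp-file pattern made a bit more robust: write to a temporary file in the same directory, fsync it, rename it over the target, then fsync the directory. This is a generic illustration in Python, not Grin's actual code, and `atomic_write` is a hypothetical name.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the file at path with data so readers see the old or new bytes, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Force the bytes to disk before the rename makes them visible.
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # On POSIX, also fsync the directory so the rename itself survives a crash.
    if hasattr(os, "O_DIRECTORY"):
        dir_fd = os.open(directory, os.O_RDONLY | os.O_DIRECTORY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
```

Without the two fsync calls, the rename can reach disk before the file contents do, which is one way a power loss could leave a truncated or garbage file behind.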

antiochp avatar Jan 25 '19 09:01 antiochp

A memory-mapped file could help. Unfortunately, it's not yet supported by croaring (though the Java and Go versions support it): https://github.com/RoaringBitmap/CRoaring/issues/74

hashmap avatar Jan 31 '19 20:01 hashmap

Was there ever a resolution to this? I just ran an Nvidia graphics driver update on my PC. Somewhere during the update, it crashed my VirtualBox machine, which was running my node. Now I am unable to restart my node.

Is there a workaround to get this back up and running, or should I destroy this machine, make a new one, and import the old wallet?

TheJimmyH avatar Feb 04 '19 18:02 TheJimmyH

@Jimmy24651 sure, the workaround is in the issue text, rm -rf ~/.grin/main/chain_data (replace main with floo for floonet)

hashmap avatar Feb 04 '19 22:02 hashmap

@hashmap is this an issue anymore?

0xmichalis avatar May 17 '19 19:05 0xmichalis

@kargakis I think it's still an issue: a server could be stopped abruptly by a power outage, or the process could be killed with kill -9, etc.

hashmap avatar May 17 '19 20:05 hashmap

Issue still exists on grin 2.0.0

niahmiah avatar Jul 16 '19 19:07 niahmiah

I have a node installed on my mining rig because of solo mining and after every third power outage I have to do rm -rf ~/.grin/main/chain_data and download the whole blockchain again. Very annoying bug.

madmarks avatar Sep 08 '19 15:09 madmarks

Issue still exists on grin 3.0. The power went out on my computer and I received the following error when I tried to restart grin: 'I failed to find one of the right cookies. Found 3497651248 Segmentation fault (core dumped)'. Is there a fix for this? I tried 'rm -rf ~/.grin/main/chain_data' and I still get the same error.

BLOCKCHAINSMOKER avatar Jan 20 '20 07:01 BLOCKCHAINSMOKER