
`bee db repair-reserve` dies with: "Error: repair: index counts do not match"

Open attila-lendvai opened this issue 1 year ago • 5 comments

Context

bee 2.1.0 running on guix.

Summary

# sudo -u bee-mainnet bash  
bash-5.1$ /gnu/store/l9h3qk066vii8qsnc5d0iyp8hw831cfc-bee-binary-2.1.0/bin/bee db repair-reserve --data-dir=/var/lib/swarm/mainnet/bee-0
"time"="2024-05-30 09:34:55.146235" "level"="warning" "logger"="node" "msg"="Repair will recreate the reserve entries based on the chunk availability in the chunkstore. The epoch time and bin IDs will be reset."
"time"="2024-05-30 09:34:55.146300" "level"="warning" "logger"="node" "msg"="The pullsync peer sync intervals are reset so on the next run, the node will perform historical syncing."
"time"="2024-05-30 09:34:55.146307" "level"="warning" "logger"="node" "msg"="This is a destructive process. If the process is stopped for any reason, the reserve may become corrupted."
"time"="2024-05-30 09:34:55.146312" "level"="warning" "logger"="node" "msg"="To prevent permanent loss of data, data should be backed up before running the cmd."
"time"="2024-05-30 09:34:55.146317" "level"="warning" "logger"="node" "msg"="You have another 10 seconds to change your mind and kill this process with CTRL-C..."
"time"="2024-05-30 09:35:05.146731" "level"="warning" "logger"="node" "msg"="proceeding with repair..."
"time"="2024-05-30 09:35:10.532454" "level"="info" "logger"="node" "msg"="starting reserve repair tool, do not interrupt or kill the process..."
"time"="2024-05-30 09:35:13.147589" "level"="error" "logger"="node" "msg"="check failed" "error"="iterate callback function errored: binID 3494844 in bin 10 already used"
"time"="2024-05-30 09:35:13.147749" "level"="info" "logger"="node" "msg"="removed all bin index entries"
"time"="2024-05-30 09:35:25.285079" "level"="info" "logger"="node" "msg"="removed all chunk bin items" "total_entries"=4167126
"time"="2024-05-30 09:35:29.311385" "level"="info" "logger"="node" "msg"="counted all batch radius entries" "total_entries"=4167129
Error: repair: index counts do not match
bash-5.1$

attila-lendvai avatar May 30 '24 08:05 attila-lendvai

The same happened to me. I ran it a second time and it worked.

tmm360 avatar May 31 '24 15:05 tmm360

can you send us a copy of the localstore directory right after the cmd fails?

istae avatar Jun 01 '24 17:06 istae

It wouldn't be easy. I have backups of my nodes, hundreds of GB of storage each, but not every node had the problem. I would have to restore them one by one until I found one where the repair fails.

tmm360 avatar Jun 02 '24 00:06 tmm360

i also don't have data anymore.

when the conversion printed an error, i reran it, and the second time it always finished without printing an error.

attila-lendvai avatar Jun 04 '24 17:06 attila-lendvai

The steps suggested for this issue are the following:

  • Run the command a second time; most of the time the second run succeeds.
  • If that does not work, the node should be nuked; due to the redundancy, the node will be able to retrieve its content again.
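
The first step above can be sketched as a small retry wrapper. This is only an illustration, not an official procedure: the `retry` function, its attempt count, and its delay argument are hypothetical names introduced here, and the commented-out invocation simply reuses the command and data directory from the original report. Since the tool warns that the repair is destructive, the data directory should be backed up before trying anything like this.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is reached.
retry() {
  local max_attempts=$1
  local delay=$2
  shift 2
  local attempt=1
  while true; do
    # Run the command; stop as soon as it exits with status 0.
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempt(s)" >&2
      return 1
    fi
    echo "attempt $attempt failed, retrying in ${delay}s..." >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example invocation (assumed; stop the node and back up the data dir first):
# retry 2 5 bee db repair-reserve --data-dir=/var/lib/swarm/mainnet/bee-0
```

If the wrapper returns non-zero after the last attempt, the second step (resetting the node) is the remaining option.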

nikipapadatou avatar Jun 10 '24 10:06 nikipapadatou

this isn't relevant anymore.

attila-lendvai avatar Sep 15 '24 14:09 attila-lendvai