`bee db repair-reserve` dies with: "Error: repair: index counts do not match"
### Context

bee 2.1.0 running on Guix.

### Summary
```
# sudo -u bee-mainnet bash
bash-5.1$ /gnu/store/l9h3qk066vii8qsnc5d0iyp8hw831cfc-bee-binary-2.1.0/bin/bee db repair-reserve --data-dir=/var/lib/swarm/mainnet/bee-0
"time"="2024-05-30 09:34:55.146235" "level"="warning" "logger"="node" "msg"="Repair will recreate the reserve entries based on the chunk availability in the chunkstore. The epoch time and bin IDs will be reset."
"time"="2024-05-30 09:34:55.146300" "level"="warning" "logger"="node" "msg"="The pullsync peer sync intervals are reset so on the next run, the node will perform historical syncing."
"time"="2024-05-30 09:34:55.146307" "level"="warning" "logger"="node" "msg"="This is a destructive process. If the process is stopped for any reason, the reserve may become corrupted."
"time"="2024-05-30 09:34:55.146312" "level"="warning" "logger"="node" "msg"="To prevent permanent loss of data, data should be backed up before running the cmd."
"time"="2024-05-30 09:34:55.146317" "level"="warning" "logger"="node" "msg"="You have another 10 seconds to change your mind and kill this process with CTRL-C..."
"time"="2024-05-30 09:35:05.146731" "level"="warning" "logger"="node" "msg"="proceeding with repair..."
"time"="2024-05-30 09:35:10.532454" "level"="info" "logger"="node" "msg"="starting reserve repair tool, do not interrupt or kill the process..."
"time"="2024-05-30 09:35:13.147589" "level"="error" "logger"="node" "msg"="check failed" "error"="iterate callback function errored: binID 3494844 in bin 10 already used"
"time"="2024-05-30 09:35:13.147749" "level"="info" "logger"="node" "msg"="removed all bin index entries"
"time"="2024-05-30 09:35:25.285079" "level"="info" "logger"="node" "msg"="removed all chunk bin items" "total_entries"=4167126
"time"="2024-05-30 09:35:29.311385" "level"="info" "logger"="node" "msg"="counted all batch radius entries" "total_entries"=4167129
Error: repair: index counts do not match
bash-5.1$
```
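The warnings in the log above advise backing up the data before running the repair, since an interrupted run can corrupt the reserve. A minimal sketch of taking a copy first, assuming the node is stopped; the data dir comes from the transcript, but the `localstore` subdirectory name and the backup destination are assumptions, so verify the layout of your own data dir:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed paths: data dir taken from the transcript above; the
# "localstore" subdirectory and backup destination are guesses.
DATA_DIR=/var/lib/swarm/mainnet/bee-0
BACKUP_DIR=/var/backups/bee-0-$(date +%Y%m%d%H%M%S)

# Copy a directory tree, preserving permissions, timestamps and
# ownership (cp -a). The node must be stopped before copying.
backup_store() {
  local src=$1 dst=$2
  mkdir -p "$(dirname "$dst")"
  cp -a "$src" "$dst"
}

# After stopping the node:
# backup_store "$DATA_DIR/localstore" "$BACKUP_DIR/localstore"
```

With a snapshot in place, a failed repair can be retried (or the store restored) without risking permanent loss.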
The same happened to me; I executed it a second time and it worked.
Can you send us a copy of the localstore directory from right after the command fails?
That wouldn't be easy. I have backups of the nodes, each taking hundreds of GB of storage, but not every node had the problem. I would have to restore them one by one until I found one where the repair fails.
I also don't have the data anymore. When the conversion printed an error, I reran it, and the second time it always finished without printing an error.
The steps suggested for this issue are the following:
- Run the command a second time, which usually succeeds.
- If that does not work, nuke the node; thanks to the network's redundancy, the node will be able to retrieve its content again.
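The first of the steps above can be sketched as a small retry wrapper. This is a hedged sketch, not part of bee itself: `retry_once` is a hypothetical helper, and the `--data-dir` path is the one from the transcript in this thread.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command, retrying once on failure,
# since in this thread a second attempt usually succeeded.
retry_once() {
  "$@" && return 0
  echo "first attempt failed, retrying once..." >&2
  "$@"
}

# Example usage (path from the transcript above):
# if ! retry_once bee db repair-reserve --data-dir=/var/lib/swarm/mainnet/bee-0; then
#   # Last resort per the suggested steps: wipe the node's storage and
#   # let the network's redundancy restore the content on resync.
#   echo "repair failed twice; consider nuking the node's storage" >&2
# fi
```

Only if both attempts fail does the destructive second step (nuking the node) come into play, which is why it is left as a commented-out branch here.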
This isn't relevant anymore.