`bee db repair-reserve` dies with: "Error: repair: index counts do not match"
### Context

bee 2.1.0 running on Guix.

### Summary
```
# sudo -u bee-mainnet bash
bash-5.1$ /gnu/store/l9h3qk066vii8qsnc5d0iyp8hw831cfc-bee-binary-2.1.0/bin/bee db repair-reserve --data-dir=/var/lib/swarm/mainnet/bee-0
"time"="2024-05-30 09:34:55.146235" "level"="warning" "logger"="node" "msg"="Repair will recreate the reserve entries based on the chunk availability in the chunkstore. The epoch time and bin IDs will be reset."
"time"="2024-05-30 09:34:55.146300" "level"="warning" "logger"="node" "msg"="The pullsync peer sync intervals are reset so on the next run, the node will perform historical syncing."
"time"="2024-05-30 09:34:55.146307" "level"="warning" "logger"="node" "msg"="This is a destructive process. If the process is stopped for any reason, the reserve may become corrupted."
"time"="2024-05-30 09:34:55.146312" "level"="warning" "logger"="node" "msg"="To prevent permanent loss of data, data should be backed up before running the cmd."
"time"="2024-05-30 09:34:55.146317" "level"="warning" "logger"="node" "msg"="You have another 10 seconds to change your mind and kill this process with CTRL-C..."
"time"="2024-05-30 09:35:05.146731" "level"="warning" "logger"="node" "msg"="proceeding with repair..."
"time"="2024-05-30 09:35:10.532454" "level"="info" "logger"="node" "msg"="starting reserve repair tool, do not interrupt or kill the process..."
"time"="2024-05-30 09:35:13.147589" "level"="error" "logger"="node" "msg"="check failed" "error"="iterate callback function errored: binID 3494844 in bin 10 already used"
"time"="2024-05-30 09:35:13.147749" "level"="info" "logger"="node" "msg"="removed all bin index entries"
"time"="2024-05-30 09:35:25.285079" "level"="info" "logger"="node" "msg"="removed all chunk bin items" "total_entries"=4167126
"time"="2024-05-30 09:35:29.311385" "level"="info" "logger"="node" "msg"="counted all batch radius entries" "total_entries"=4167129
Error: repair: index counts do not match
bash-5.1$
```
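The warnings in the log above advise backing up the data before running the repair, since an interrupted run can corrupt the reserve. A minimal sketch of taking a copy first, assuming the node is stopped; the data dir comes from the transcript, but the `localstore` subdirectory name and the backup destination are assumptions, so verify the layout of your own data dir:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed paths: data dir taken from the transcript above; the
# "localstore" subdirectory and backup destination are guesses.
DATA_DIR=/var/lib/swarm/mainnet/bee-0
BACKUP_DIR=/var/backups/bee-0-$(date +%Y%m%d%H%M%S)

# Copy a directory tree, preserving permissions, timestamps and
# ownership (cp -a). The node must be stopped before copying.
backup_store() {
  local src=$1 dst=$2
  mkdir -p "$(dirname "$dst")"
  cp -a "$src" "$dst"
}

# After stopping the node:
# backup_store "$DATA_DIR/localstore" "$BACKUP_DIR/localstore"
```

With a snapshot in place, a failed repair can be retried (or the store restored) without risking permanent loss.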
The same happened to me; I executed it a second time and it worked.
Can you send us a copy of the localstore directory from right after the command fails?
That wouldn't be easy. I have backups of the nodes, each taking hundreds of GB of storage, but not every node had the problem. I would have to restore them one by one until I found one where the repair fails.
I also don't have the data anymore. When the conversion printed an error, I reran it, and the second time it always finished without printing an error.
The steps suggested for this issue are the following:
- Run the command a second time, which usually succeeds.
- If that does not work, nuke the node; thanks to the network's redundancy, the node will be able to retrieve its content again.
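The first of the steps above can be sketched as a small retry wrapper. This is a hedged sketch, not part of bee itself: `retry_once` is a hypothetical helper, and the `--data-dir` path is the one from the transcript in this thread.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command, retrying once on failure,
# since in this thread a second attempt usually succeeded.
retry_once() {
  "$@" && return 0
  echo "first attempt failed, retrying once..." >&2
  "$@"
}

# Example usage (path from the transcript above):
# if ! retry_once bee db repair-reserve --data-dir=/var/lib/swarm/mainnet/bee-0; then
#   # Last resort per the suggested steps: wipe the node's storage and
#   # let the network's redundancy restore the content on resync.
#   echo "repair failed twice; consider nuking the node's storage" >&2
# fi
```

Only if both attempts fail does the destructive second step (nuking the node) come into play, which is why it is left as a commented-out branch here.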
This isn't relevant anymore.