Bug(manual_compact): replicas lose manual compaction finished status after replica migration
Bug Report
- What did you do?
  - Set a suitable trigger time and let a table finish manual compaction.
  - View the manual compaction progress: it is 100.
  - Stop a node.
  - View the manual compaction progress again: it is now 79.
  - Check the logs on the surviving nodes: some replicas begin manual compaction.
  - View the manual compaction progress again: it is now 91.
  - However, some replicas do NOT run manual compaction that day, so the progress stays stuck at 91.

  As a result, in our online environments, some Pegasus users cannot finish manual compaction after replica migration.
- What did you expect to see? Replicas that finished manual compaction should keep that status after replica migration.
- What version of Pegasus are you using? Pegasus 2.4
Is this issue still open? I'm interested in solving it. Could you please share some more details? Thanks in advance.
@ninsmiracle Each replica of a partition performs manual compaction independently. When one node shuts down, the replicas on the other nodes may not have started manual compaction yet, or may still be in progress. What tools were you using to check the progress? Could you leave some details? Thanks!
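To make the scenario concrete, here is a toy Python model of it. It is only a sketch under assumptions: progress is taken to be the percentage of replicas in a `finished` state (the real admin-cli formula may differ), the node names and partition layout are invented, and the `finished` label is a placeholder. It models the reported symptom, where a replica recreated on another node starts over as idle.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Replica:
    partition: int
    node: str
    state: str  # "idle" or "finished" (labels assumed for illustration)


NODES = ["n0", "n1", "n2", "n3"]  # hypothetical 4-node cluster


def build_table(partition_count):
    """Place 3 replicas per partition: every node except one hosts it."""
    replicas = []
    for p in range(partition_count):
        skipped = NODES[p % len(NODES)]
        for node in NODES:
            if node != skipped:
                replicas.append(Replica(p, node, "finished"))
    return replicas


def progress(replicas):
    """Assumed metric: percentage of replicas whose compaction finished."""
    finished = sum(1 for r in replicas if r.state == "finished")
    return 100 * finished // len(replicas)


def stop_node(replicas, stopped):
    """Move each replica from the stopped node to the one node that did not
    host its partition, resetting its state to idle (the reported symptom)."""
    moved = []
    for r in replicas:
        if r.node == stopped:
            target = NODES[r.partition % len(NODES)]
            moved.append(replace(r, node=target, state="idle"))
        else:
            moved.append(r)
    return moved


table = build_table(8)
print(progress(table))                   # 100
print(progress(stop_node(table, "n1")))  # 75
```

In this model the drop depends only on how many replicas lived on the stopped node, which is why the figure would stay below 100 until the migrated replicas compact again.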
I used admin-cli to check the manual compaction progress. The results come from running the `manual-compaction query -a TABLE_NAME` command after manual compaction began and after the node stopped.
We can also use the Pegasus HTTP interface to check the per-replica compaction details of the target table:
curl YOUR_REPLICA_SERVER_IP:PORT/replica/manual_compaction?app_id=YOUR_APP_ID
We can observe that some replicas stay in the idle status for a long time, until the periodic manual compaction is triggered at the scheduled time the next day.
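Checking every replica server by hand gets tedious, so here is a minimal Python sketch that queries the same endpoint on several servers. It assumes the interface is served over plain `http://`, and it returns the raw response text because the response format is not shown in this thread. The function names are made up for this example.

```python
from urllib.request import urlopen


def manual_compaction_url(server, app_id):
    """Build the query URL for one replica server given as 'IP:PORT'."""
    return f"http://{server}/replica/manual_compaction?app_id={app_id}"


def fetch_status(server, app_id, timeout=5.0):
    """Return the raw response body; its format is not assumed here."""
    with urlopen(manual_compaction_url(server, app_id), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_all(servers, app_id):
    """Query every server, recording errors instead of stopping early."""
    results = {}
    for server in servers:
        try:
            results[server] = fetch_status(server, app_id)
        except OSError as exc:  # URLError and socket timeouts are OSErrors
            results[server] = f"request failed: {exc}"
    return results
```

Calling `fetch_all` with a list of `IP:PORT` strings and the app id returns one entry per server, so replicas that stay idle can be spotted by comparing the outputs side by side.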
This is how I reproduce the bug:
- Create a Pegasus table and write some data to it.
- Set the parameters of this Pegasus app by using `set_app_envs`. To achieve our target, set `manual_compact.disabled` to `false` and set `manual_compact.periodic.trigger_time` to a specific time, such as `10:00`. (When I did this it was 9:58, so I could observe the result soon.)
- Build the admin-cli tool and use it to connect to our table.
- Use the `manual-compaction query -a` command to check the progress. We can see that it has reached 100.
- Stop one node of our cluster.
- The progress will not return to 100 until `10:00` the next day.
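The trick of picking a trigger time a couple of minutes ahead can be scripted. The Python sketch below computes such a time and prints the Pegasus shell commands for the env settings in the steps above. It assumes the shell accepts `use <table>` and `set_app_envs <key> <value>` in this form and that an `H:MM` time string is valid; `TABLE_NAME` and the helper names are placeholders.

```python
from datetime import datetime, timedelta


def trigger_time_after(now, minutes=2):
    """Format a time `minutes` after `now` as H:MM, like the `10:00` above."""
    t = now + timedelta(minutes=minutes)
    return f"{t.hour}:{t.minute:02d}"


def setup_commands(table, trigger_time):
    """Pegasus shell commands for the env settings (assumed syntax)."""
    return [
        f"use {table}",
        "set_app_envs manual_compact.disabled false",
        f"set_app_envs manual_compact.periodic.trigger_time {trigger_time}",
    ]


# Fixed example time matching the report; use datetime.now() in practice.
now = datetime(2024, 1, 1, 9, 58)
for line in setup_commands("TABLE_NAME", trigger_time_after(now)):
    print(line)
```

With the example time of 9:58, the last printed command sets the trigger time to `10:00`, matching the reproduction described above.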