Bug(manual_compact): replicas lose manual compact finished status after replica migration

Open ninsmiracle opened this issue 2 years ago • 4 comments

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
     - Set a suitable trigger time and let a table finish manual compaction.
     - View the manual compaction progress: it is 100.
     - Stop a node.
     - View the manual compaction progress again: it is now 79.
     - Check the logs on the nodes that are still alive: some replicas begin manual compaction.
     - View the manual compaction progress again: it is now 91.
     - However, some replicas do not run manual compaction again that day, so the progress stays stuck at 91.

As a result, in our online environments, some Pegasus users cannot finish manual compaction after replica migration.

  2. What did you expect to see? Replicas that have finished manual compaction should keep that status after replica migration.

  3. What version of Pegasus are you using? Pegasus 2.4

ninsmiracle avatar Oct 31 '23 03:10 ninsmiracle

Is this issue still open? I'm interested in solving it. Could you please share some more details? Thanks in advance.

abdraheem98 avatar Nov 15 '23 10:11 abdraheem98

@ninsmiracle Each replica of a partition does manual compaction independently. When one node shuts down, the replicas on the other nodes may not have started manual compaction yet, or may still be in progress. Which tool were you using to check the progress? Could you leave some details? Thanks!
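To make the point about independent replicas concrete, here is a minimal sketch of how an aggregated progress figure could drop below 100 once replicas are re-created on other nodes. It assumes the progress is simply the share of replicas reporting a finished status; the function name, the status strings, and the data layout are hypothetical and are not taken from the Pegasus code.

```python
from typing import Dict, List


def compaction_progress(replica_status: Dict[int, List[str]]) -> int:
    """Return the percentage of replicas whose status is "finished".

    replica_status maps a partition index to the status of each of its
    replicas. The status strings are illustrative labels only.
    """
    total = sum(len(statuses) for statuses in replica_status.values())
    if total == 0:
        return 0
    finished = sum(statuses.count("finished") for statuses in replica_status.values())
    return finished * 100 // total


# Hypothetical table: 4 partitions with 3 replicas each, all compacted.
before = {p: ["finished"] * 3 for p in range(4)}

# After a node stops, two replicas are rebuilt elsewhere and start out idle.
after = dict(before)
after[0] = ["finished", "finished", "idle"]
after[1] = ["finished", "finished", "idle"]
```

Under this assumption, the reported progress can only return to 100 once the rebuilt replicas run manual compaction themselves.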

acelyc111 avatar Nov 17 '23 03:11 acelyc111

I used admin-cli to check the manual compaction progress. This is the result of running the manual-compaction query -a TABLE_NAME command after manual compaction begins or after a node stops:

image

We can also use the Pegasus HTTP interface to check the per-replica compaction details of the target table:

curl YOUR_REPLICA_SERVER_IP:PORT/replica/manual_compaction?app_id=YOUR_APP_ID

image

We can observe that some replicas stay in idle status for a long time, until the periodic manual compaction is triggered again the next day.
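For anyone who wants to check several replica servers in one go, the HTTP query could be scripted roughly as follows. This is only a sketch: the URL path and the app_id parameter come from the curl command in this comment, while the function names, the http scheme, and the example host and port are assumptions. The response is returned as raw text because its format is not shown here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def manual_compaction_url(host: str, port: int, app_id: int) -> str:
    """Build the per-replica manual compaction query URL for one replica server."""
    query = urlencode({"app_id": app_id})
    return f"http://{host}:{port}/replica/manual_compaction?{query}"


def fetch_manual_compaction(host: str, port: int, app_id: int, timeout: float = 5.0) -> str:
    """Fetch the raw response body; no assumption is made about its format."""
    with urlopen(manual_compaction_url(host, port, app_id), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Example (placeholder address, not a real server):
# print(fetch_manual_compaction("127.0.0.1", 34801, 2))
```

Calling fetch_manual_compaction once per alive replica server before and after stopping a node should show which replicas went back to idle.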

ninsmiracle avatar Nov 22 '23 07:11 ninsmiracle

> Is the issue still open? I'm interested to solve this issue. Could please tell some more details. Thanks in Advance

This is how I reproduce the bug:

  1. Create a Pegasus table and input some data.
  2. Set the parameters of this Pegasus app by using set_app_envs. To achieve our goal, set manual_compact.disabled to false and set manual_compact.periodic.trigger_time to a specific time, such as 10:00. (When I did this, it was 9:58, so I could observe the result soon.)
  3. Build the admin-cli tool and use it to connect to our table.
  4. Use the manual-compaction query -a command to check the progress. We can see that it has reached 100.
  5. Stop one node of our cluster.
  6. The progress will not return to 100 until 10:00 the next day.
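The timing trick in step 2 (picking a trigger time a couple of minutes ahead) can be sketched as below. The two env keys are the ones named in step 2; the helper name, the two-minute default, and the zero-padded HH:MM format (inferred from the "10:00" example) are assumptions. The resulting key/value pairs would still have to be applied with set_app_envs.

```python
from datetime import datetime, timedelta


def periodic_compaction_envs(now: datetime, lead_minutes: int = 2) -> dict:
    """Return app envs that schedule periodic manual compaction shortly after `now`."""
    trigger = (now + timedelta(minutes=lead_minutes)).strftime("%H:%M")
    return {
        # Keep manual compaction enabled for the table.
        "manual_compact.disabled": "false",
        # Daily trigger time, a few minutes ahead so the result shows up soon.
        "manual_compact.periodic.trigger_time": trigger,
    }


# Example matching the report: at 9:58 the trigger time becomes 10:00.
envs = periodic_compaction_envs(datetime(2023, 11, 22, 9, 58))
```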

ninsmiracle avatar Nov 22 '23 07:11 ninsmiracle