incubator-pegasus icon indicating copy to clipboard operation
incubator-pegasus copied to clipboard

fix(manual_compact): fix replica lose manual compact finished status after replica migrate bug

Open ninsmiracle opened this issue 1 year ago • 6 comments

What problem does this PR solve?

#1665

And cause the misstake operate of PR #1666, I closed it and forced pull it.
So I have to re pull in another pull request.

What is changed and how does it work?

I add another string in meta_store (meta_store is a column families which persist pegasus_last_manual_compact_finish_time and so on). So that we can read the value from meta_store when we recover a replica after replica migrate.

Checklist

Tests
  • Manual test 1.Create table,and put some data in it. 2.Let table begin to do manual compaction 3.Finish manual compaction and view the replica relationship of this table. 4.Stop a node,and check the progress of compaction. 5.Wait a few minutes,and check if still have some replica can not show a Finish status.

Test Result

Pegasus-AdminCli-1.2.0 » manual-compaction query -a gns
Table gns manual compaction progress is 100
Pegasus-AdminCli-1.2.0 » manual-compaction query -a gns
Table gns manual compaction progress is 83
Pegasus-AdminCli-1.2.0 » manual-compaction query -a gns
Table gns manual compaction progress is 83
Pegasus-AdminCli-1.2.0 » manual-compaction query -a gns
Table gns manual compaction progress is 100
Code changes
  • try to read LAST_MANUAL_COMPACT_USED_TIME from meta_store when DB exist.
  • Set LAST_MANUAL_COMPACT_USED_TIME to zero when DB not exist.
  • Set LAST_MANUAL_COMPACT_USED_TIME to the time replica last used.
  • Generate a new checkpoint after we change LAST_MANUAL_COMPACT_USED_TIME in func do_manual_compact

ninsmiracle avatar Mar 22 '24 06:03 ninsmiracle

It's OK to me to remove the periodic manual compaction function, it can be replaced by thirdparty tools, feel free to remove it.

acelyc111 avatar Mar 27 '24 15:03 acelyc111

It's OK to me to remove the periodic manual compaction function, it can be replaced by thirdparty tools, feel free to remove it.

I think it's reasonable to keep this periodic manual compaction function. Cause many user using spark to set manual compaction time now. If we remove this function in new version, we have to change the change user habits. Actully, I think we lack a unified management platform to control every things.

ninsmiracle avatar Apr 02 '24 07:04 ninsmiracle

It's OK to me to remove the periodic manual compaction function, it can be replaced by thirdparty tools, feel free to remove it.

I think it's reasonable to keep this periodic manual compaction function. Cause many user using spark to set manual compaction time now. If we remove this function in new version, we have to change the change user habits. Actully, I think we lack a unified management platform to control every things.

What is "using spark to set manual compaction time" ? Is it a Pegasus binding method? I don't think it's a block to remove the periodic manual compaction. This is not the core tasks of Pegasus, it make the code a bit of smelly.

At least, you need to make this patch to resolve these issues mentioned above.

acelyc111 avatar Apr 08 '24 02:04 acelyc111

It's OK to me to remove the periodic manual compaction function, it can be replaced by thirdparty tools, feel free to remove it.

I think it's reasonable to keep this periodic manual compaction function. Cause many user using spark to set manual compaction time now. If we remove this function in new version, we have to change the change user habits. Actully, I think we lack a unified management platform to control every things.

What is "using spark to set manual compaction time" ? Is it a Pegasus binding method? I don't think it's a block to remove the periodic manual compaction. This is not the core tasks of Pegasus, it make the code a bit of smelly.

At least, you need to make this patch to resolve these issues mentioned above.

In high version of PEGASUS-SPARK , a compaction request will be send to cluster after user call startBulkLoad . And I will update PEGASUS-SPARK this week. I do not think we should remove perodic manual compaction code now . Cause I think all its bugs about manual compaction have been fixed, and they are all minor issues.

ninsmiracle avatar Apr 08 '24 06:04 ninsmiracle

It's OK to me to remove the periodic manual compaction function, it can be replaced by thirdparty tools, feel free to remove it.

I think it's reasonable to keep this periodic manual compaction function. Cause many user using spark to set manual compaction time now. If we remove this function in new version, we have to change the change user habits. Actully, I think we lack a unified management platform to control every things.

What is "using spark to set manual compaction time" ? Is it a Pegasus binding method? I don't think it's a block to remove the periodic manual compaction. This is not the core tasks of Pegasus, it make the code a bit of smelly. At least, you need to make this patch to resolve these issues mentioned above.

In high version of PEGASUS-SPARK , a compaction request will be send to cluster after user call startBulkLoad . And I will update PEGASUS-SPARK this week. I do not think we should remove perodic manual compaction code now . Cause I think all its bugs about manual compaction have been fixed, and they are all minor issues.

Thanks to update the pegasus-spark project! It's not well maintenance for about 4 years.

Do you think it's confusing that the manual compact status depends on last_manual_compact_used_time? Is it possible manual_compact_last_time_used_ms == 0 but manual_compact_last_finish_time_ms > 0 ?

If the paramaters are set in a previous round, and havn't complete compaction and set the vaiables in current round before restarting the server, will it be considered as finished?

Could you add some tests to verify the bug has been fixed?

acelyc111 avatar Apr 16 '24 02:04 acelyc111

It's OK to me to remove the periodic manual compaction function, it can be replaced by thirdparty tools, feel free to remove it.

I think it's reasonable to keep this periodic manual compaction function. Cause many user using spark to set manual compaction time now. If we remove this function in new version, we have to change the change user habits. Actully, I think we lack a unified management platform to control every things.

What is "using spark to set manual compaction time" ? Is it a Pegasus binding method? I don't think it's a block to remove the periodic manual compaction. This is not the core tasks of Pegasus, it make the code a bit of smelly. At least, you need to make this patch to resolve these issues mentioned above.

In high version of PEGASUS-SPARK , a compaction request will be send to cluster after user call startBulkLoad . And I will update PEGASUS-SPARK this week. I do not think we should remove perodic manual compaction code now . Cause I think all its bugs about manual compaction have been fixed, and they are all minor issues.

Thanks to update the pegasus-spark project! It's not well maintenance for about 4 years.

Do you think it's confusing that the manual compact status depends on last_manual_compact_used_time? Is it possible manual_compact_last_time_used_ms == 0 but manual_compact_last_finish_time_ms > 0 ?

If the paramaters are set in a previous round, and havn't complete compaction and set the vaiables in current round before restarting the server, will it be considered as finished?

Could you add some tests to verify the bug has been fixed?

Firstly , I update the newest version of PEGASUS-SPARK and commit a merge request. And I think manual_compact_last_time_used_ms == 0 but manual_compact_last_finish_time_ms > 0 will happened before this pr . However , now , in pegasus_server_impl::start replica will reload the parameter from disk . If last_finish_time_ms be set , last_time_used_ms will be set also.The results of the staging environment test verified this point. In this week we will try to put this manual compaction fix version to online environment , and we will verify the effectiveness of bug fixes in large data environments .

ninsmiracle avatar Apr 17 '24 12:04 ninsmiracle