accumulo icon indicating copy to clipboard operation
accumulo copied to clipboard

New FateOperations API

Open milleruntime opened this issue 3 years ago • 3 comments

Changes taken from https://github.com/apache/accumulo/pull/2215 with some improvements.

milleruntime avatar Jun 15 '22 17:06 milleruntime

I am a little behind on following this PR. What else is remaining to get this ready for review?

Manno15 avatar Jun 25 '22 15:06 Manno15

I am a little behind on following this PR. What else is remaining to get this ready for review?

It is still a WIP. Working on the new API design.

milleruntime avatar Jun 27 '22 14:06 milleruntime

@ctubbsii @EdColeman In my latest changes I added another method to instanceOps and a new enum type to assist with the multiple TXs use case:

void executeFateAction(FateAction action, List<FateTxId> transactions)
      throws AccumuloException, AccumuloSecurityException;

I started writing the javadoc for the API and realized we need to provide some background about how to use some of the methods, such as cancel and fail, that can only be used on FateTransactions with a certain status. I thought the name FateAction is more appropriate for the API, instead of "operation", which refers to the FATE REPO. Then changed the Thrift types to reflect this naming. I also created an enum in FateTransaction for Status that directly mirrors the internal enum TStatus in ReadOnlyTStore. Let me know what you think.

milleruntime avatar Jun 27 '22 18:06 milleruntime

There are a lot of conflicts in this PR with recent changes. I don't want to loose the design discussion here so I am going to close this in favor of a branch with all the changes merged together.

milleruntime avatar Aug 30 '22 12:08 milleruntime

There are a lot of conflicts in this PR with recent changes. I don't want to loose the design discussion here so I am going to close this in favor of a branch with all the changes merged together.

Can you link to the new PR here when it exists, so we can follow the conversation to that PR?

ctubbsii avatar Aug 30 '22 18:08 ctubbsii

@ctubbsii what do you think of the special methods, fail() and delete() being called on a FateTransaction? These currently require the Manager to be down to free the lock, which I feel is a special circumstance. A user can easily get the Set<FateTransaction> while the Manager is running and call these methods. We could easily throw an error if the Manager is running but it seems like that could trip up users.

milleruntime avatar Sep 02 '22 12:09 milleruntime

The Manager is supposed to be off in order to prevent FaTE concurrency issues during a delicate manual failure-handling situation. If we're directly manipulating FaTE by executing these special methods, we're already in a "special circumstance". Manual manipulation is not a routine operation. I think it makes sense to continue to leave the requirement to have the Manager be down to prevent issues. That's the current requirement, and changing it could introduce a lot of unforeseen concurrency issues.

Overall, I'm inclined to scrap the idea of adding a FaTE RPC and API so we can access it from the shell. That's the whole basis for these changes... and it's become bloated with all sorts of issues. The FateAdmin command functionality never should have been rolled into the shell to begin with. Instead, I think it should be merged with the more general purpose Admin command, like some of the other recent utilities, and removed from the shell. That will satisfy the initial issue we were trying to fix, which was removing the shell's dependency on SiteConfiguration. And, it has the advantage of not needing any of these proposed RPC/API changes, or changing anything about the requirements for online/offline servers. But, it does make it more accessible, alongside the other admin utilities for server-side maintenance.

ctubbsii avatar Sep 02 '22 16:09 ctubbsii

Overall, I'm inclined to scrap the idea of adding a FaTE RPC and API so we can access it from the shell. That's the whole basis for these changes... and it's become bloated with all sorts of issues.

I have been tinkering with new FaTE RPCs for an API and I am starting to feel the same way. It doesn't really make sense to have the tablet server execute RPCs when we are just messing with ZK. And we can't have RPCs run in the manager that require it to be down. Also, FaTE is internal to the Manager and was never really meant to be exposed at such a high level.

I think it should be merged with the more general purpose Admin command.

I like that idea. It looks like FateAdmin has been deprecated for a few versions. Do you think we could drop it for 2.1?

milleruntime avatar Sep 02 '22 16:09 milleruntime

I like that idea. It looks like FateAdmin has been deprecated for a few versions. Do you think we could drop it for 2.1?

No. We can drop it in 3, along with the fate stuff in the shell. But we should deprecate the fate stuff in the shell.

ctubbsii avatar Sep 02 '22 16:09 ctubbsii

But we should deprecate the fate stuff in the shell.

What about the new cancel operation?

milleruntime avatar Sep 02 '22 16:09 milleruntime

I started work to move the new FateCommand options (cancel and summary) added in 2.1 to the Admin command in https://github.com/apache/accumulo/pull/2914

milleruntime avatar Sep 02 '22 18:09 milleruntime

But we should deprecate the fate stuff in the shell.

What about the new cancel operation?

I seem to remember making the point when that was added, that FaTE wasn't really user-facing. It probably makes more sense to expose that as "cancel scheduled compaction" or cancelling some other scheduled operation. Perhaps a cancel flag can be added to the relevant tasks? Alternatively, just put the new cancel option in the Admin command with fail and delete, and roll back the changes made in the shell.

ctubbsii avatar Sep 02 '22 22:09 ctubbsii

Operators are familiar with the current shell fate operations and may be using them more that expected. I'm not opposed to moving it to the admin command, but the same functionality should be preserved. It would also be nice if the shell could print a message that points to the moved command and at least a hint on how to execute it.

EdColeman avatar Sep 07 '22 11:09 EdColeman