Modified FATE to allow transactions to run to completion without interruption
FATE processes transactions in an interleaved piece-wise manner. In the case of a multi-step transaction, FATE could potentially execute a step of another transaction before executing the next step of the current transaction. This change enables the developer to indicate via a new method on the Repo interface that the transaction type should be run start to finish without being interrupted.
This is draft because I want to get early feedback on approach. I'm going to add some tests for some failure conditions.
This approach will tie up a FATE thread. Another possible approach is to have a ZooStore.reserve() prefer transactions that are in an IN_PROGRESS state over those that are SUBMITTED (those have never run). However it seems like scanning for in progress work could do a lot more ZK reads.
This approach will tie up a FATE thread.
It will, but only for those transactions that are marked so
Another possible approach is to have a ZooStore.reserve() prefer transactions that are in an IN_PROGRESS state over those that are SUBMITTED (those have never run)
That would allow transactions to still interleave. When discussing this, I thought you had given me the example of a split task, that you wanted to finish as fast as possible.
That would allow transactions to still interleave. When discussing this, I thought you had given me the example of a split task, that you wanted to finish as fast as possible.
That is not exactly what I was thinking. Was thinking of prioritizing FATEs that have started running over those that have never run when choosing what to run next.
I like the approach in this PR, except that it ties up a thread and could lead all FATE threads sleeping in the worst case.
That is not exactly what I was thinking. Was thinking of prioritizing FATEs that have started running over those that have never run when choosing what to run next.
Ah, ok, I didn't get that from our conversation.
I like the approach in this PR, except that it ties up a thread and could lead all FATE threads sleeping in the worst case.
Yeah, I'm curious how many FATE ops are going to take advantage of this. I'm assuming SPLIT and MERGE as they require the tablet to be un-hosted, so you want them to run as fast as possible so that the tablets can be re-hosted. It would be trivial to put in an escape path, say if we have received a non-zero response from isReady N times in a row.
@dlmarion You requested my review on this, but I feel like it's a bit outside my area of expertise. I'd be interested in a discussion sometime, to get a high level understanding of it, though, along with what's motivating it. Right now, I don't have any opinions on the approach one way or another.
@dlmarion You requested my review on this, but I feel like it's a bit outside my area of expertise. I'd be interested in a discussion sometime, to get a high level understanding of it, though, along with what's motivating it. Right now, I don't have any opinions on the approach one way or another.
@ctubbsii - I wasn't sure if you were familiar with the FATE code or not. I figured you may also have some input on the behavior change.
@dlmarion I might have some thoughts, but I think I'd need it explained to me at an elementary level first. Mostly, though, I'm just curious to learn a bit more about where you're headed with this. I'm thinking maybe we can post another video meet in Slack some time, and discuss it with anybody interested in joining.
Superceded by #4589