accumulo
accumulo copied to clipboard
Improve handling of interrupt conditions.
Is your feature request related to a problem? Please describe. We are not consistent on handling interrupts.
Describe the solution you'd like
- Code should not swallow interrupts without at least resetting the interrupt flag - throwing an Exception if possible and appropriate.
- If an exception cannot be thrown, the caller code should be checking interrupt status and reacting if an interrupt occurs.
Describe alternatives you've considered The situation described below is a concrete examples, but similar issues occur in other places - these could be handled in bulk - or at least if the code is modified for other reason, corrected at that time on a case by case basis.
Additional context When reviewing #1890 - the code in BulkImportCacheCleaner caught the interrupt exception like this:
} catch (KeeperException | InterruptedException e) {
// we'll just clean it up again later
log.debug("Error reading bulk import live transactions {}", e);
}
The code was modified to reassert the interrupt with:
https://github.com/apache/accumulo/blob/307bef2fe6a8b2b7152eabaf9b16950cff72729a/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/BulkImportCacheCleaner.java#L57-L64
The caller code - in TabletServer schedules this task with the following:
https://github.com/apache/accumulo/blob/6af856b38bb57f49418547a8b9c262c66a1fe31e/server/tserver/src/main/java/org/apache/accumulo/tserver/TabletServer.java#L778-L780
The call to scheduleWithFixedDelay returns a ScheduledFuture. that future could be used to attempt to cancel running tasks - or at least prevent new tasks from being scheduled if the interrupt status was checked. The goal would be to provide a more graceful approach to termination when possible (like closing file descriptors) - or at least improve termination responsiveness, The running tasks would probably need to properly handle mayInterruptIfRunning
Other considerations:
Some of the exception handling may have improved with the standardization of thread pools in #1808 - and maybe that code could also check for interrupts as an alternate approach.
This task may overlap with #946
I recall discussing this in the past. There may be some old JIRA issues that this would supersede that could be closed with this.
Some parts of the code could use this util: https://github.com/apache/accumulo/blob/3fd5cad92f9b63ac19e4466f3f2d5237b905262c/core/src/main/java/org/apache/accumulo/fate/util/UtilWaitThread.java#L49 But it should probably be moved out of the FATE package. It is already used in a bunch of ITs.
Some parts of the code could use this util: https://github.com/apache/accumulo/blob/3fd5cad92f9b63ac19e4466f3f2d5237b905262c/core/src/main/java/org/apache/accumulo/fate/util/UtilWaitThread.java#L49
But it should probably be moved out of the FATE package. It is already used in a bunch of ITs.
This is a holdover from when accumulo-fate was a separate jar. A lot of the fate stuff was left in their current packages to avoid making any changes that might affect FaTE serialization. This one is probably safe to move, since it is a static void method only, and there's probably other generic utilities that can be moved as well, but since it's all internal, it doesn't matter that much to move it. But, if somebody does move it, they should be careful to avoid breaking FaTE serialization by moving serialized objects.
It seems that there were multiple packaging decisions that were originally made with the FATE design - I believe the stated goal was that FATEs could become a stand-alone product outside of Accumulo. Should we revisit that? If moving the method would impact FATE serialization should other changes be made concurrently so that FATE serialization would only change once? Also, I believe that during upgrades it is necessary to insure that no FATE transactions are pending or the upgrade aborts - so changes in serialization should not be a factor between upgrades.
It seems that there were multiple packaging decisions that were originally made with the FATE design - I believe the stated goal was that FATEs could become a stand-alone product outside of Accumulo.
While that may have been something that was considered, the implementation was, and has always been, very tightly coupled to Accumulo itself.
Should we revisit that? If moving the method would impact FATE serialization should other changes be made concurrently so that FATE serialization would only change once?
The problem is that we use Java serialization (in hindsight, a bad choice), and therefore, lots of Java objects can accidentally leak into our serialized FaTE objects. We have, for example, unintentionally serialized Thrift objects (I believe TInfo was one). I don't know of any currently active proposals to redesign FaTE to decouple it from Accumulo, or to change how we serialize objects there any time soon.
I don't see any compelling reason to revisit FaTE serialization at this time. But, if we did do something like that, it certainly would be better to do it across the board all at once. One future compelling reason might be the introduction of record objects in a future Java release, which may have safe and reliable serialization strategies. Another compelling reason would be to take advantage of some protobuf or thrift serialization for storage efficiency or performance, or something else. However, libraries like protobuf and thrift are notoriously problematic with dependency convergence.
Also, I believe that during upgrades it is necessary to insure that no FATE transactions are pending or the upgrade aborts - so changes in serialization should not be a factor between upgrades.
In general, that is true for major releases, but not for patch/bugfix releases. I don't know off the top of my head what we've done for minor releases, or what our current upgrade code would expect before upgrading from 2.0 to 2.1. However, the more incompatibilities we avoid, the easier our upgrade code and testing paths are.
In any case, I think it should be fine to relocate UtilWaitThread, though I don't see a huge value in it. I'm just suggesting caution whenever relocating any object that has potential to be serialized. Classes in packages marked as "fate" have a higher likelihood of being serialized, but they are not the only objects that could be affected by class relocations.
May want to look at flatbuffers if you have not done so already.
In any case, I think it should be fine to relocate UtilWaitThread, though I don't see a huge value in it. I'm just suggesting caution whenever relocating any object that has potential to be serialized. Classes in packages marked as "fate" have a higher likelihood of being serialized, but they are not the only objects that could be affected by class relocations.
I agree. I only recommended its use as an easy way to standardize our handling of the sleep and catch IterruptedException use case, one of the situations this ticket aims to fix.
Caller Code (TabletServer), consider using the returned ScheduledFuture from scheduleWithFixedDelay to cancel running tasks or prevent new ones if an interrupt occurs. This allows for a more graceful termination (closing resources) and improved responsiveness. Faced the same problem during my web development career but training at Full Stack Web development course helped me.