Exception in manager tablet group watcher when attempting tablet update
Seeing the following error in the manager logs on a small test cluster running the elasticity branch.
2024-01-10T20:05:53,285 [manager.Manager] ERROR: Error processing table state for store Normal Tablets
org.apache.accumulo.server.manager.state.DistributedStoreException: java.lang.IllegalStateException: Duplicate extents not handled
at org.apache.accumulo.server.manager.state.AbstractTabletStateStore.unassign(AbstractTabletStateStore.java:162) ~[accumulo-server-base-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.accumulo.server.manager.state.AbstractTabletStateStore.suspend(AbstractTabletStateStore.java:111) ~[accumulo-server-base-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.accumulo.server.manager.state.LoggingTabletStateStore.suspend(LoggingTabletStateStore.java:94) ~[accumulo-server-base-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.accumulo.manager.TabletGroupWatcher.handleDeadTablets(TabletGroupWatcher.java:880) ~[accumulo-manager-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.accumulo.manager.TabletGroupWatcher.flushChanges(TabletGroupWatcher.java:943) ~[accumulo-manager-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.accumulo.manager.TabletGroupWatcher.manageTablets(TabletGroupWatcher.java:618) ~[accumulo-manager-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.accumulo.manager.TabletGroupWatcher.run(TabletGroupWatcher.java:674) ~[accumulo-manager-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
Caused by: java.lang.IllegalStateException: Duplicate extents not handled
at com.google.common.base.Preconditions.checkState(Preconditions.java:512) ~[guava-33.0.0-jre.jar:?]
at org.apache.accumulo.server.metadata.ConditionalTabletsMutatorImpl.mutateTablet(ConditionalTabletsMutatorImpl.java:85) ~[accumulo-server-base-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
at org.apache.accumulo.server.manager.state.AbstractTabletStateStore.unassign(AbstractTabletStateStore.java:127) ~[accumulo-server-base-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
... 6 more
@keith-turner - I have reviewed the code around this error. It's not clear to me how this could be occurring. I'm wondering if an IsolatedScanner should be used in the TabletManagementScanner.
I did not see anything either. I looked through the code and could not find any place where a tablet might be added twice during the TGW (TabletGroupWatcher) scan over all the tablets. Opened #4214 to add the extent to the error message.
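For readers following along, the kind of check that fails in the trace above can be sketched roughly as below. This is a standalone illustration, not the Accumulo source: the class name `DuplicateExtentGuard`, the use of plain strings for extents, and the exact message format are all assumptions, and it throws `IllegalStateException` directly rather than going through Guava's `Preconditions.checkState`.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a mutator that accepts each tablet extent at most once.
class DuplicateExtentGuard {
  // Extents registered so far (plain strings here; an assumption for illustration).
  private final Set<String> seenExtents = new HashSet<>();

  // Registers an extent and fails if the same extent was already registered.
  void mutateTablet(String extent) {
    // Set.add returns false when the element is already present.
    if (!seenExtents.add(extent)) {
      // Putting the extent in the message shows which tablet was submitted twice.
      throw new IllegalStateException("Duplicate extents not handled " + extent);
    }
  }

  // Number of distinct extents registered so far.
  int size() {
    return seenExtents.size();
  }
}
```

With the extent in the message, a recurrence would at least show which tablet was seen twice, which narrows down where to look in the scan.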
Closing this as fixed for now by #4214
#4214 does not look to be a "fix". It adds information to the precondition check which may help track down the issue if it occurs. Would it be better to leave this open to prompt additional work on determining the root cause?
@EdColeman - that's true. My reasoning is that #4214 adds information to help track down the issue, and given the velocity of code changes in the elasticity branch, the issue may never be seen again. Opening a new issue in the future, presumably with more information, should be sufficient. It is also quite possible that if this happens again, the reporter won't see this issue and will create a new one anyway.