
Exception in manager tablet group watcher when attempting tablet update

Open keith-turner opened this issue 2 years ago • 1 comments

Seeing the following error in the manager logs on a small test cluster running the elasticity branch.

2024-01-10T20:05:53,285 [manager.Manager] ERROR: Error processing table state for store Normal Tablets
org.apache.accumulo.server.manager.state.DistributedStoreException: java.lang.IllegalStateException: Duplicate extents not handled
        at org.apache.accumulo.server.manager.state.AbstractTabletStateStore.unassign(AbstractTabletStateStore.java:162) ~[accumulo-server-base-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at org.apache.accumulo.server.manager.state.AbstractTabletStateStore.suspend(AbstractTabletStateStore.java:111) ~[accumulo-server-base-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at org.apache.accumulo.server.manager.state.LoggingTabletStateStore.suspend(LoggingTabletStateStore.java:94) ~[accumulo-server-base-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at org.apache.accumulo.manager.TabletGroupWatcher.handleDeadTablets(TabletGroupWatcher.java:880) ~[accumulo-manager-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at org.apache.accumulo.manager.TabletGroupWatcher.flushChanges(TabletGroupWatcher.java:943) ~[accumulo-manager-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at org.apache.accumulo.manager.TabletGroupWatcher.manageTablets(TabletGroupWatcher.java:618) ~[accumulo-manager-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at org.apache.accumulo.manager.TabletGroupWatcher.run(TabletGroupWatcher.java:674) ~[accumulo-manager-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
Caused by: java.lang.IllegalStateException: Duplicate extents not handled
        at com.google.common.base.Preconditions.checkState(Preconditions.java:512) ~[guava-33.0.0-jre.jar:?]
        at org.apache.accumulo.server.metadata.ConditionalTabletsMutatorImpl.mutateTablet(ConditionalTabletsMutatorImpl.java:85) ~[accumulo-server-base-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        at org.apache.accumulo.server.manager.state.AbstractTabletStateStore.unassign(AbstractTabletStateStore.java:127) ~[accumulo-server-base-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT]
        ... 6 more

keith-turner avatar Jan 10 '24 21:01 keith-turner

@keith-turner - I have reviewed the code around this error. It's not clear to me how this could be occurring. I'm wondering if an IsolatedScanner should be used in the TabletManagementScanner.

dlmarion avatar Jan 23 '24 18:01 dlmarion

> @keith-turner - I have reviewed the code around this error. It's not clear to me how this could be occurring. I'm wondering if an IsolatedScanner should be used in the TabletManagementScanner.

I did not see anything either. I looked through the code and found no place where a tablet might be added twice during the TGW (TabletGroupWatcher) scan over all the tablets. Opened #4214 to add the extent to the error message.

keith-turner avatar Feb 02 '24 03:02 keith-turner

Closing this as fixed for now by #4214

dlmarion avatar Feb 20 '24 16:02 dlmarion

#4214 does not look to be a "fix". It adds information to the precondition check which may help track down the issue if it occurs. Would it be better to leave this open to prompt additional work on determining the root cause?

EdColeman avatar Feb 20 '24 16:02 EdColeman

@EdColeman - that's true. My reasoning is that #4214 adds information to help track down the issue, and given the velocity of code changes in the elasticity branch, the error may never be seen again. Opening a new issue in the future, presumably with more information, should be sufficient. It is also likely that if this happens again, the reporter won't see this issue and will create a new one anyway.

dlmarion avatar Feb 20 '24 19:02 dlmarion
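
For readers following the discussion of #4214, here is a minimal sketch of the idea: a duplicate check whose failure message names the offending extent. This is not the Accumulo code. `ExtentBatch`, its `mutateTablet` method, the use of plain strings for extents, and the exact message format are all assumptions made for illustration; only the text "Duplicate extents not handled" comes from the stack trace above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for a batch of per-tablet mutations; NOT the Accumulo class.
public class ExtentBatch {
    // Extents already added to this batch, represented here as plain strings (assumption).
    private final Set<String> seen = new HashSet<>();

    // Registers an extent and fails fast if the same extent was already added.
    // Putting the extent in the message shows which tablet was duplicated in the logs.
    public void mutateTablet(String extent) {
        if (!seen.add(extent)) {
            throw new IllegalStateException("Duplicate extents not handled: " + extent);
        }
    }

    // Number of distinct extents registered so far.
    public int size() {
        return seen.size();
    }

    public static void main(String[] args) {
        ExtentBatch batch = new ExtentBatch();
        batch.mutateTablet("2;m;a");
        try {
            batch.mutateTablet("2;m;a"); // same extent added a second time
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a message like this, a recurrence would at least show which tablet was submitted twice, which is the diagnostic value described above. It does not explain how the duplicate arose.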