paimon
[Bug] HiveCatalogLock.runWithLock() throws an exception: "Acquire lock failed with time: PT8M7.098S"
Search before asking
- [x] I searched in the issues and found nothing similar.
Paimon version
Paimon-1.1.1
Compute Engine
Flink-1.18.1
Minimal reproduce step
None yet. The failure has not recurred.
What doesn't meet your expectations?
java.lang.RuntimeException: Exception occurs when committing snapshot #29953 by user dca7f3a6-3177-4ae2-b9de-0902024b5314 with identifier 1 and kind APPEND.
Cannot clean up because we can't determine the success.
at org.apache.paimon.operation.FileStoreCommitImpl.commitSnapshotImpl(FileStoreCommitImpl.java:1166)
at org.apache.paimon.operation.FileStoreCommitImpl.tryCommitOnce(FileStoreCommitImpl.java:1015)
at org.apache.paimon.operation.FileStoreCommitImpl.tryCommit(FileStoreCommitImpl.java:732)
at org.apache.paimon.operation.FileStoreCommitImpl.commit(FileStoreCommitImpl.java:323)
at org.apache.paimon.table.sink.TableCommitImpl.commitMultiple(TableCommitImpl.java:218)
at org.apache.paimon.table.sink.TableCommitImpl.filterAndCommitMultiple(TableCommitImpl.java:257)
at org.apache.paimon.flink.sink.StoreCommitter.filterAndCommit(StoreCommitter.java:119)
at org.apache.paimon.flink.sink.Committer.filterAndCommit(Committer.java:60)
at org.apache.paimon.flink.sink.RestoreAndFailCommittableStateManager.recover(RestoreAndFailCommittableStateManager.java:82)
at org.apache.paimon.flink.sink.RestoreAndFailCommittableStateManager.initializeState(RestoreAndFailCommittableStateManager.java:77)
at org.apache.paimon.flink.sink.CommitterOperator.initializeState(CommitterOperator.java:147)
at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.initializeOperatorState(StreamOperatorStateHandler.java:122)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:274)
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:753)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:728)
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:693)
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:953)
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:922)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:746)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: Acquire lock failed with time: PT8M1.606S
at org.apache.paimon.hive.HiveCatalogLock.lock(HiveCatalogLock.java:110)
at org.apache.paimon.hive.HiveCatalogLock.runWithLock(HiveCatalogLock.java:66)
at org.apache.paimon.operation.Lock$CatalogLockImpl.runWithLock(Lock.java:73)
at org.apache.paimon.catalog.RenamingSnapshotCommit.commit(RenamingSnapshotCommit.java:69)
at org.apache.paimon.operation.FileStoreCommitImpl.commitSnapshotImpl(FileStoreCommitImpl.java:1161)
... 22 more
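For reference, the time in the "Acquire lock failed" message is an ISO-8601 duration, so PT8M1.606S means the committer waited about eight minutes before giving up. A minimal sketch for converting such a value when reading logs; the class and method names here are hypothetical and not part of Paimon:

```java
import java.time.Duration;

// Hypothetical helper for reading the duration printed in the error message.
final class LockWaitDuration {

    // Parses an ISO-8601 duration such as "PT8M1.606S" and returns milliseconds.
    static long waitedMillis(String isoDuration) {
        return Duration.parse(isoDuration).toMillis();
    }

    public static void main(String[] args) {
        // Value from the stack trace above: 8 min 1.606 s
        System.out.println(waitedMillis("PT8M1.606S")); // prints 481606
        // Value from the issue title: 8 min 7.098 s
        System.out.println(waitedMillis("PT8M7.098S")); // prints 487098
    }
}
```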
Anything else?
No response
Are you willing to submit a PR?
- [ ] I'm willing to submit a PR!
public <T> T runWithLock(String database, String table, Callable<T> callable) throws Exception {
    long lockId = lock(database, table);
    try {
        return callable.call();
    } finally {
        unlock(lockId);
    }
}
It seems that some unexpected exception caused the unlock() method in the finally block not to execute, leading to a timeout in subsequent lock acquisitions.
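To illustrate the suspected failure mode, here is a small self-contained sketch. It is not Paimon or Hive code: `FakeLockService` is a hypothetical in-memory stand-in for the metastore lock table, and the `failUnlock` flag and the short timeout are assumptions made only for the demonstration. It assumes the release fails (or never completes), which is one possible way for the lock to stay held; it does not prove that this is what happened here.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class LockLeakSketch {

    /** Hypothetical in-memory lock table; stands in for the metastore. */
    static final class FakeLockService {
        private final ConcurrentHashMap<String, Long> held = new ConcurrentHashMap<>();
        private final AtomicLong nextId = new AtomicLong(1);
        volatile boolean failUnlock = false; // assumption: simulates a failed release

        long lock(String database, String table, Duration timeout) throws InterruptedException {
            String key = database + "." + table;
            long id = nextId.getAndIncrement();
            long start = System.nanoTime();
            // Poll until the key is free or the timeout elapses.
            while (held.putIfAbsent(key, id) != null) {
                Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
                if (elapsed.compareTo(timeout) >= 0) {
                    throw new RuntimeException("Acquire lock failed with time: " + elapsed);
                }
                Thread.sleep(10);
            }
            return id;
        }

        void unlock(long lockId) {
            if (failUnlock) {
                throw new IllegalStateException("simulated unlock failure");
            }
            held.values().remove(lockId);
        }

        boolean isHeld(String database, String table) {
            return held.containsKey(database + "." + table);
        }
    }

    // Same shape as the runWithLock method quoted above, with an explicit timeout.
    static <T> T runWithLock(FakeLockService service, String database, String table,
                             Duration timeout, Callable<T> callable) throws Exception {
        long lockId = service.lock(database, table, timeout);
        try {
            return callable.call();
        } finally {
            service.unlock(lockId);
        }
    }

    public static void main(String[] args) throws Exception {
        FakeLockService service = new FakeLockService();
        Duration timeout = Duration.ofMillis(200);

        // 1. Healthy path: the lock is acquired and released.
        System.out.println(runWithLock(service, "db", "tbl", timeout, () -> "committed"));

        // 2. The release fails, so the lock entry is never removed.
        service.failUnlock = true;
        try {
            runWithLock(service, "db", "tbl", timeout, () -> "committed");
        } catch (IllegalStateException e) {
            System.out.println("unlock failed, lock still held: " + service.isHeld("db", "tbl"));
        }

        // 3. Every later attempt waits for the full timeout and then fails.
        service.failUnlock = false;
        try {
            runWithLock(service, "db", "tbl", timeout, () -> "committed");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the third call fails with the same kind of message as in the stack trace, even though nothing else is holding the lock at that point. If something similar happened in production, the stale lock entry would have to be released on the metastore side before commits could proceed.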
I have faced the same issue: our Hive metastore database was auto-upgraded, and the lock in the Hive metastore was not released.
Thank you for your reply. We haven't upgraded the Hive metastore, and we still haven't located the cause.