
[Bug] HiveCatalogLock runWithLock() fails with exception: "Acquire lock failed with time: PT8M7.098S"

Open · GangYang-HX opened this issue 2 months ago · 3 comments

Search before asking

  • [x] I searched in the issues and found nothing similar.

Paimon version

Paimon-1.1.1

Compute Engine

Flink-1.18.1

Minimal reproduce step

No reliable reproduction steps; the problem has not recurred so far.

What doesn't meet your expectations?

java.lang.RuntimeException: Exception occurs when committing snapshot #29953 by user dca7f3a6-3177-4ae2-b9de-0902024b5314 with identifier 1 and kind APPEND.
Cannot clean up because we can't determine the success.
	at org.apache.paimon.operation.FileStoreCommitImpl.commitSnapshotImpl(FileStoreCommitImpl.java:1166)
	at org.apache.paimon.operation.FileStoreCommitImpl.tryCommitOnce(FileStoreCommitImpl.java:1015)
	at org.apache.paimon.operation.FileStoreCommitImpl.tryCommit(FileStoreCommitImpl.java:732)
	at org.apache.paimon.operation.FileStoreCommitImpl.commit(FileStoreCommitImpl.java:323)
	at org.apache.paimon.table.sink.TableCommitImpl.commitMultiple(TableCommitImpl.java:218)
	at org.apache.paimon.table.sink.TableCommitImpl.filterAndCommitMultiple(TableCommitImpl.java:257)
	at org.apache.paimon.flink.sink.StoreCommitter.filterAndCommit(StoreCommitter.java:119)
	at org.apache.paimon.flink.sink.Committer.filterAndCommit(Committer.java:60)
	at org.apache.paimon.flink.sink.RestoreAndFailCommittableStateManager.recover(RestoreAndFailCommittableStateManager.java:82)
	at org.apache.paimon.flink.sink.RestoreAndFailCommittableStateManager.initializeState(RestoreAndFailCommittableStateManager.java:77)
	at org.apache.paimon.flink.sink.CommitterOperator.initializeState(CommitterOperator.java:147)
	at org.apache.flink.streaming.api.operators.StreamOperatorStateHandler.initializeOperatorState(StreamOperatorStateHandler.java:122)
	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:274)
	at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:106)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:753)
	at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:728)
	at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:693)
	at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:953)
	at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:922)
	at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:746)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: Acquire lock failed with time: PT8M1.606S
	at org.apache.paimon.hive.HiveCatalogLock.lock(HiveCatalogLock.java:110)
	at org.apache.paimon.hive.HiveCatalogLock.runWithLock(HiveCatalogLock.java:66)
	at org.apache.paimon.operation.Lock$CatalogLockImpl.runWithLock(Lock.java:73)
	at org.apache.paimon.catalog.RenamingSnapshotCommit.commit(RenamingSnapshotCommit.java:69)
	at org.apache.paimon.operation.FileStoreCommitImpl.commitSnapshotImpl(FileStoreCommitImpl.java:1161)
	... 22 more
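The wait time in the message looks like the `toString()` form of a `java.time.Duration` (ISO-8601), so `PT8M1.606S` reads as 8 minutes 1.606 seconds. Both reported waits land just past eight minutes, which is consistent with the lock giving up at a fixed acquire timeout rather than hanging forever. In recent Paimon versions that limit appears to be the `lock-acquire-timeout` catalog option (default 8 min), but treat the option name and default as an assumption to check against the docs for your version. A minimal Java 8 sketch that decodes the two values from this report:

```java
import java.time.Duration;

public class LockWaitDuration {

    /** Total wait in milliseconds for an ISO-8601 duration string such as "PT8M1.606S". */
    static long waitMillis(String isoDuration) {
        return Duration.parse(isoDuration).toMillis();
    }

    public static void main(String[] args) {
        // Values copied from the issue title and the "Caused by" line of the stack trace.
        String[] reported = {"PT8M7.098S", "PT8M1.606S"};
        for (String text : reported) {
            long millis = waitMillis(text);
            long minutes = millis / 60_000;                // whole minutes
            double seconds = (millis % 60_000) / 1000.0;   // remaining seconds
            System.out.printf("%s = %d ms (%d min %.3f s)%n", text, millis, minutes, seconds);
        }
    }
}
```

For example, `waitMillis("PT8M1.606S")` returns 481606. Only Java 8 APIs are used, since the trace (`Thread.java:750`) suggests a Java 8 runtime.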

Anything else?

No response

Are you willing to submit a PR?

  • [ ] I'm willing to submit a PR!

Posted by GangYang-HX on Nov 06 '25 02:11

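```java
public <T> T runWithLock(String database, String table, Callable<T> callable) throws Exception {
    // Waits for the metastore lock; throws "Acquire lock failed ..." once the wait times out.
    long lockId = lock(database, table);
    try {
        return callable.call();
    } finally {
        // Runs whether callable.call() returns or throws; if unlock() itself
        // fails, the lock is left held in the metastore.
        unlock(lockId);
    }
}
```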

It seems that an unexpected failure prevented unlock() in the finally block from releasing the lock, so the lock stayed held and subsequent lock acquisitions timed out.

Posted by GangYang-HX on Nov 06 '25 02:11
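Note that a `finally` block still runs when `callable.call()` throws, so a failed commit by itself should not leak the lock. The likelier ways for the lock to stay held are that `unlock()` itself throws (for example, the metastore call fails) or that the JVM terminates before the `finally` block runs. The sketch below reproduces the first case with a hypothetical in-memory `FakeLockService`; it is not the Paimon or Hive Metastore API, only a stand-in with the same lock / try / finally / unlock shape as `runWithLock` quoted above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class LockLeakDemo {

    /** Hypothetical stand-in for a metastore lock service (not a real Paimon or Hive API). */
    static class FakeLockService {
        private final ConcurrentHashMap<String, Long> held = new ConcurrentHashMap<>();
        private final AtomicLong ids = new AtomicLong();
        volatile boolean failUnlock = false; // simulates the unlock call failing

        long lock(String key, long timeoutMillis) throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            long id = ids.incrementAndGet();
            // Poll until the key is free or the timeout expires.
            while (held.putIfAbsent(key, id) != null) {
                if (System.currentTimeMillis() >= deadline) {
                    throw new RuntimeException("Acquire lock failed for " + key);
                }
                Thread.sleep(10);
            }
            return id;
        }

        void unlock(String key, long id) {
            if (failUnlock) {
                throw new IllegalStateException("unlock call failed");
            }
            held.remove(key, id); // only removes the entry if this id still owns it
        }

        boolean isHeld(String key) {
            return held.containsKey(key);
        }
    }

    // Same shape as runWithLock in the issue: unlock happens in finally.
    static <T> T runWithLock(FakeLockService svc, String key, long timeoutMillis, Callable<T> body)
            throws Exception {
        long id = svc.lock(key, timeoutMillis);
        try {
            return body.call();
        } finally {
            svc.unlock(key, id);
        }
    }

    public static void main(String[] args) throws Exception {
        FakeLockService svc = new FakeLockService();
        String key = "db.tbl";

        // 1) The body throws: finally still runs, so the lock is released.
        try {
            runWithLock(svc, key, 100, () -> { throw new RuntimeException("commit failed"); });
        } catch (RuntimeException expected) {
            // ignored for the demo
        }
        System.out.println("held after failing body: " + svc.isHeld(key));

        // 2) unlock() itself throws: the lock entry is never removed.
        svc.failUnlock = true;
        try {
            runWithLock(svc, key, 100, () -> "ok");
        } catch (IllegalStateException expected) {
            // ignored for the demo
        }
        svc.failUnlock = false;
        System.out.println("held after failing unlock: " + svc.isHeld(key));

        // 3) The next caller waits out its whole timeout and fails.
        try {
            runWithLock(svc, key, 100, () -> "ok");
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` prints `false`, then `true`, then the acquire-failure message: once one unlock is lost, every later caller waits out its full timeout, which is the pattern seen in the stack trace.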

I have faced the same issue: our Hive Metastore database was automatically upgraded, and the lock in the Hive Metastore was not released.

Posted by prabhagaranks on Nov 10 '25 16:11


Thank you for your reply. We haven't upgraded the Hive Metastore, and we still haven't located the root cause.

Posted by GangYang-HX on Nov 12 '25 08:11