No such entry error caused by incorrect metadata
BUG REPORT
Describe the bug
The ensemble bookies of metadata and the actual written bookies are different, resulting in a read failure.
conf
- ensemble size: 6
- write quorum size: 2
- ack quorum size: 2
I encountered the following event.
The write operations failed and ensemble was changed as following:
[A, B, C, D, E, F]
[G, B, C, H, E, F]
[G, I, J, H, E, F]
10:00:11.806 [BookKeeperClientWorker-OrderedExecutor-24-0] WARN o.a.bookkeeper.client.PendingAddOp - Failed to write entry (xx, aa): Bookie operation timeout
…
10:00:12.001 [BookKeeperClientWorker-OrderedExecutor-24-0] INFO o.a.bookkeeper.client.LedgerHandle - New Ensemble: [G, B, C, H, E, F] for ledger: xx
…
10:00:12.303 [BookKeeperClientWorker-OrderedExecutor-24-0] WARN o.a.bookkeeper.client.PendingAddOp - Failed to write entry (xx, bb): Bookie operation timeout
…
10:00:12.527 [BookKeeperClientWorker-OrderedExecutor-24-0] INFO o.a.bookkeeper.client.LedgerHandle - New Ensemble: [G, I, J, H, E, F] for ledger: xx
The entries were written to bookie G and H, but not to bookie I and J.
Bookie I and J don't have any entries of ledger xx.
The write and ack quorum is 2, so I got a "No such entry" error.
$ bookkeeper shell readledger -msg -ledgerid xx -firstentryid yy -lastentryid yy
11:25:50.215 [BookKeeperClientWorker-OrderedExecutor-24-0] ERROR o.a.bookkeeper.client.PendingReadOp - Read of ledger entry failed: Lxx Eyy-Eyy, Sent to [I, J], Heard from [] : bitset = {}, Error = 'No such entry'. First unread entry is (-1, rc = null)
11:25:50.224 [main] ERROR o.a.b.client.BookKeeperAdmin - Error reading entry yy from ledger xx
org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
at org.apache.bookkeeper.client.SyncCallbackUtils.finish(SyncCallbackUtils.java:83)
at org.apache.bookkeeper.client.SyncCallbackUtils$SyncReadCallback.readComplete(SyncCallbackUtils.java:231)
at org.apache.bookkeeper.client.LedgerHandle$6.onFailure(LedgerHandle.java:820)
at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38)
at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:750)
11:25:50.425 [main] ERROR o.a.bookkeeper.bookie.BookieShell - Got an exception
com.google.common.util.concurrent.UncheckedExecutionException: org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
at org.apache.bookkeeper.tools.cli.commands.bookie.ReadLedgerCommand.apply(ReadLedgerCommand.java:137)
at org.apache.bookkeeper.bookie.BookieShell$ReadLedgerEntriesCmd.runCmd(BookieShell.java:626)
at org.apache.bookkeeper.bookie.BookieShell$MyCommand.runCmd(BookieShell.java:236)
at org.apache.bookkeeper.bookie.BookieShell.run(BookieShell.java:2235)
at org.apache.bookkeeper.bookie.BookieShell.main(BookieShell.java:2328)
Caused by: java.lang.RuntimeException: org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
at org.apache.bookkeeper.client.BookKeeperAdmin$LedgerEntriesIterator.hasNext(BookKeeperAdmin.java:435)
at org.apache.bookkeeper.tools.cli.commands.bookie.ReadLedgerCommand.readledger(ReadLedgerCommand.java:171)
at org.apache.bookkeeper.tools.cli.commands.bookie.ReadLedgerCommand.apply(ReadLedgerCommand.java:135)
... 4 common frames omitted
Caused by: org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
at org.apache.bookkeeper.client.SyncCallbackUtils.finish(SyncCallbackUtils.java:83)
at org.apache.bookkeeper.client.SyncCallbackUtils$SyncReadCallback.readComplete(SyncCallbackUtils.java:231)
at org.apache.bookkeeper.client.LedgerHandle$6.onFailure(LedgerHandle.java:820)
at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38)
at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26)
at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:750)
The actual data exists in bookie B and C.
# at bookie B
$ bookkeeper shell readlog -msg foo.log -ledgerid xx -entryid yy
Scan entry log bar (foo.log) for LedgerId xx for EntryId yy
--------- Lid=xx, Eid=yy, ByteOffset=, EntrySize= ---------
Type: DATA
LastConfirmed:
Data:
00000000 …
Expected behavior
The ensemble bookies of metadata and the actual written bookies are same.
Analysis of the entry log shows that bookie I and J also have entries of ledger xx.
However, the first few hundred entries that bookie I and J should have based on the metadata are missing.
And bookie B and C have tens of thousands of entries after changing the ensemble of the metadata.