bookkeeper icon indicating copy to clipboard operation
bookkeeper copied to clipboard

No such entry error caused by incorrect metadata

Open izumo27 opened this issue 2 years ago • 1 comments

BUG REPORT

Describe the bug

The ensemble bookies of metadata and the actual written bookies are different, resulting in a read failure.


conf
  • ensemble size: 6
  • write quorum size: 2
  • ack quorum size: 2

I encountered the following event.

The write operations failed and ensemble was changed as following:

[A, B, C, D, E, F]

[G, B, C, H, E, F]

[G, I, J, H, E, F]

10:00:11.806 [BookKeeperClientWorker-OrderedExecutor-24-0] WARN  o.a.bookkeeper.client.PendingAddOp   - Failed to write entry (xx, aa): Bookie operation timeout
…
10:00:12.001 [BookKeeperClientWorker-OrderedExecutor-24-0] INFO  o.a.bookkeeper.client.LedgerHandle   - New Ensemble: [G, B, C, H, E, F] for ledger: xx
…
10:00:12.303 [BookKeeperClientWorker-OrderedExecutor-24-0] WARN  o.a.bookkeeper.client.PendingAddOp   - Failed to write entry (xx, bb): Bookie operation timeout
…
10:00:12.527 [BookKeeperClientWorker-OrderedExecutor-24-0] INFO  o.a.bookkeeper.client.LedgerHandle   - New Ensemble: [G, I, J, H, E, F] for ledger: xx

The entries were written to bookie G and H, but not to bookie I and J.

Bookie I and J don't have any entries of ledger xx.

The write and ack quorum is 2, so I got a "No such entry" error.

$ bookkeeper shell readledger -msg -ledgerid xx -firstentryid yy -lastentryid yy


11:25:50.215 [BookKeeperClientWorker-OrderedExecutor-24-0] ERROR o.a.bookkeeper.client.PendingReadOp  - Read of ledger entry failed: Lxx Eyy-Eyy, Sent to [I, J], Heard from [] : bitset = {}, Error = 'No such entry'. First unread entry is (-1, rc = null)
11:25:50.224 [main] ERROR o.a.b.client.BookKeeperAdmin         - Error reading entry yy from ledger xx
org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
        at org.apache.bookkeeper.client.SyncCallbackUtils.finish(SyncCallbackUtils.java:83)
        at org.apache.bookkeeper.client.SyncCallbackUtils$SyncReadCallback.readComplete(SyncCallbackUtils.java:231)
        at org.apache.bookkeeper.client.LedgerHandle$6.onFailure(LedgerHandle.java:820)
        at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38)
        at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
        at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:750)
11:25:50.425 [main] ERROR o.a.bookkeeper.bookie.BookieShell    - Got an exception
com.google.common.util.concurrent.UncheckedExecutionException: org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
        at org.apache.bookkeeper.tools.cli.commands.bookie.ReadLedgerCommand.apply(ReadLedgerCommand.java:137)
        at org.apache.bookkeeper.bookie.BookieShell$ReadLedgerEntriesCmd.runCmd(BookieShell.java:626)
        at org.apache.bookkeeper.bookie.BookieShell$MyCommand.runCmd(BookieShell.java:236)
        at org.apache.bookkeeper.bookie.BookieShell.run(BookieShell.java:2235)
        at org.apache.bookkeeper.bookie.BookieShell.main(BookieShell.java:2328)
Caused by: java.lang.RuntimeException: org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
        at org.apache.bookkeeper.client.BookKeeperAdmin$LedgerEntriesIterator.hasNext(BookKeeperAdmin.java:435)
        at org.apache.bookkeeper.tools.cli.commands.bookie.ReadLedgerCommand.readledger(ReadLedgerCommand.java:171)
        at org.apache.bookkeeper.tools.cli.commands.bookie.ReadLedgerCommand.apply(ReadLedgerCommand.java:135)
        ... 4 common frames omitted
Caused by: org.apache.bookkeeper.client.BKException$BKNoSuchEntryException: No such entry
        at org.apache.bookkeeper.client.SyncCallbackUtils.finish(SyncCallbackUtils.java:83)
        at org.apache.bookkeeper.client.SyncCallbackUtils$SyncReadCallback.readComplete(SyncCallbackUtils.java:231)
        at org.apache.bookkeeper.client.LedgerHandle$6.onFailure(LedgerHandle.java:820)
        at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:38)
        at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26)
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774)
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750)
        at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:456)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:750)

The actual data exists in bookie B and C.

# at bookie B
$ bookkeeper shell readlog -msg foo.log -ledgerid xx -entryid yy
Scan entry log bar (foo.log) for LedgerId xx for EntryId yy
--------- Lid=xx, Eid=yy, ByteOffset=, EntrySize= ---------
Type:           DATA
LastConfirmed:
Data:

00000000 …

Expected behavior

The ensemble bookies of metadata and the actual written bookies are same.

izumo27 avatar Oct 06 '23 09:10 izumo27

Analysis of the entry log shows that bookie I and J also have entries of ledger xx. However, the first few hundred entries that bookie I and J should have based on the metadata are missing. And bookie B and C have tens of thousands of entries after changing the ensemble of the metadata. ensemble

izumo27 avatar Oct 14 '23 15:10 izumo27