Fix negative position read on BufferedChannel cause data can't read problem.
Descriptions of the changes in this PR:
Motivation
fix #4022
in brief of this change.
When do compaction. worker thread try to extract the ledgerLogger metadata header to get ledger map. the header to point map offset is -1.
so the read request to BufferedChannel will reject this read and throw java.lang.IllegalArgumentException: Negative position
but there is a read cache in BufferedChannel the -1 is cached in the Channel which will cause following read on this channel fail to get right data.
more other background:
our production env bookie can't gc log files.
after add some log we found that
2023-07-08T22:12:50,481+0800 [GarbageCollectorThread-19-1] INFO org.apache.bookkeeper.bookie.GarbageCollectorThread - Extracting entry log meta from entryLogId: 50236
2023-07-08T22:12:50,481+0800 [GarbageCollectorThread-19-1] WARN org.apache.bookkeeper.bookie.EntryLogger - Recovering ledgers maps for log 50236 at offset: -1
2023-07-08T22:12:50,482+0800 [GarbageCollectorThread-19-1] WARN org.apache.bookkeeper.bookie.EntryLogger - error when extractEntryLogMetadataFromIndex
java.lang.IllegalArgumentException: Negative position
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:813) ~[?:?]
at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:93) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:65) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.EntryLogger.extractEntryLogMetadataFromIndex(EntryLogger.java:1124) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.EntryLogger.getEntryLogMetadata(EntryLogger.java:1085) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:758) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:411) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:391) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.15.4.jar:4.15.4]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
at java.lang.Thread.run(Thread.java:833) ~[?:?]
2023-07-08T22:12:50,482+0800 [GarbageCollectorThread-19-1] INFO org.apache.bookkeeper.bookie.EntryLogger - Failed to get ledgers map index from: 50236.log : Negative position
2023-07-08T22:12:50,482+0800 [GarbageCollectorThread-19-1] WARN org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 1024
2023-07-08T22:12:50,598+0800 [GarbageCollectorThread-19-1] WARN org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 126754820 add entrySize 126753792 ledgerId 5593924864
2023-07-08T22:12:50,598+0800 [GarbageCollectorThread-19-1] WARN org.apache.bookkeeper.bookie.EntryLogger - MARKER pos -1787371633 add entrySize -1914126457 ledgerId -1675451311566362959
2023-07-08T22:12:50,607+0800 [GarbageCollectorThread-19-1] ERROR org.apache.bookkeeper.common.util.SafeRunnable - Unexpected throwable caught
java.lang.IllegalArgumentException: Negative position
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:813) ~[?:?]
at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:93) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:65) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.EntryLogger.readFromLogChannel(EntryLogger.java:418) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.EntryLogger.scanEntryLog(EntryLogger.java:1018) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.EntryLogger.extractEntryLogMetadataByScanning(EntryLogger.java:1191) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.EntryLogger.getEntryLogMetadata(EntryLogger.java:1090) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:758) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:411) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:391) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.15.4.jar:4.15.4]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
at java.lang.Thread.run(Thread.java:833) ~[?:?]
but using bookkeeper tool to read the log it success without throw exception
2023-07-08T21:54:52,401+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.ReadLogCommand - Scan entry log 50236 (c43c.log)
2023-07-08T21:54:52,475+0800 [main] WARN org.apache.bookkeeper.bookie.EntryLogger - MARKER 50236 readBufferStartPosition -9223372036854775808
2023-07-08T21:54:52,486+0800 [main] WARN org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 1024
2023-07-08T21:54:52,492+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - --------- Lid=21851269, Eid=20, ByteOffset=1024, EntrySize=495132 ---------
2023-07-08T21:54:52,492+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - Type: DATA
2023-07-08T21:54:52,492+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - LastConfirmed: 19
2023-07-08T21:54:52,492+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil -
2023-07-08T21:54:52,493+0800 [main] WARN org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 496160 add entrySize 495132 ledgerId 21851269
2023-07-08T21:54:52,494+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - --------- Lid=21851269, Eid=21, ByteOffset=496160, EntrySize=83751 ---------
2023-07-08T21:54:52,494+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - Type: DATA
2023-07-08T21:54:52,494+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - LastConfirmed: 19
2023-07-08T21:54:52,494+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil -
2023-07-08T21:54:52,494+0800 [main] WARN org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 579915 add entrySize 83751 ledgerId 21851269
2023-07-08T21:54:52,494+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - --------- Lid=21851454, Eid=4, ByteOffset=579915, EntrySize=63 ---------
2023-07-08T21:54:52,494+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - Type: DATA
2023-07-08T21:54:52,494+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - LastConfirmed: 3
2023-07-08T21:54:52,494+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil -
2023-07-08T21:54:52,495+0800 [main] WARN org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 579982 add entrySize 63 ledgerId 21851454
2023-07-08T21:54:52,495+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - --------- Lid=21851454, Eid=5, ByteOffset=579982, EntrySize=43 ---------
2023-07-08T21:54:52,495+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - Type: DATA
2023-07-08T21:54:52,495+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - LastConfirmed: 3
2023-07-08T21:54:52,495+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil -
2023-07-08T21:54:52,495+0800 [main] WARN org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 580029 add entrySize 43 ledgerId 21851454
2023-07-08T21:54:52,495+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - --------- Lid=21851474, Eid=3, ByteOffset=580029, EntrySize=43 ---------
you can see the pos in BufferedChannel is changed like this which is not very good.
2023-07-08T22:12:50,482+0800 [GarbageCollectorThread-19-1] WARN org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 1024
2023-07-08T22:12:50,598+0800 [GarbageCollectorThread-19-1] WARN org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 126754820 add entrySize 126753792 ledgerId 5593924864
2023-07-08T22:12:50,598+0800 [GarbageCollectorThread-19-1] WARN org.apache.bookkeeper.bookie.EntryLogger - MARKER pos -1787371633 add entrySize -1914126457 ledgerId -1675451311566362959
and we found the log header is which offset and count is both -1
2023-07-08T22:14:49,412+0800 [GarbageCollectorThread-19-1] WARN org.apache.bookkeeper.bookie.EntryLogger - Recovering ledgers maps for log 50236 at offset: -1, header version 1 offset -1 count -1
see https://github.com/apache/bookkeeper/blob/fb0d3d6faab3f9703c7f63b280f1f9c2e18f966d/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/DefaultEntryLogger.java#L1034-L1047
so the log entry header offset pos is -1.
when -1 pass to BufferedReadChannel
an java.lang.IllegalArgumentException: Negative position throws first time. what is worse is that this position will be cached in BufferedReadChannel see line 92 throw exception but readPosition is not reset, so the position is -1 even after the exception is thrown.
https://github.com/apache/bookkeeper/blob/fb0d3d6faab3f9703c7f63b280f1f9c2e18f966d/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/BufferedReadChannel.java#L88-L99
so when the second time try to execute extractEntryLogMetadataByScanning the BufferedReadChannel is cached in ThreadLocal so the same BufferedReadChannel is return again. but this time if will hit this condition
https://github.com/apache/bookkeeper/blob/fb0d3d6faab3f9703c7f63b280f1f9c2e18f966d/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/BufferedReadChannel.java#L77-L84
also we saw this with arthas. see readBufferStartPosition the first time is 0 which is when extractEntryLogMetadataFromIndex execute first.
the second capture you can see this position is -1.
so the third time when execute extractMetaFromEntryLogs the position is a very small negivate number.
method=org.apache.bookkeeper.bookie.EntryLogger.getFromChannels location=AtExit
ts=2023-07-08 21:05:30; [cost=0.002951ms] result=@ArrayList[
@Object[][
@Long[50236],
],
@BufferedReadChannel[
readCapacity=@Integer[4096],
readBuffer=@InstrumentedUnpooledUnsafeHeapByteBuf[
alloc=@UnpooledByteBufAllocator[UnpooledByteBufAllocator(directByDefault: true)],
array=@byte[][isEmpty=false;size=4096],
tmpNioBuf=@HeapByteBuffer[java.nio.HeapByteBuffer[pos=4096 lim=4096 cap=4096]],
REFCNT_FIELD_OFFSET=@Long[32],
AIF_UPDATER=@AtomicIntegerFieldUpdaterImpl[java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl@7e2bf17d],
updater=@[io.netty.buffer.AbstractReferenceCountedByteBuf$1@865b4ec],
refCnt=@Integer[2],
logger=@LocationAwareSlf4JLogger[LocationAwareSlf4JLogger(io.netty.buffer.AbstractByteBuf)],
LEGACY_PROP_CHECK_ACCESSIBLE=@String[io.netty.buffer.bytebuf.checkAccessible],
PROP_CHECK_ACCESSIBLE=@String[io.netty.buffer.checkAccessible],
checkAccessible=@Boolean[true],
PROP_CHECK_BOUNDS=@String[io.netty.buffer.checkBounds],
checkBounds=@Boolean[true],
leakDetector=@ResourceLeakDetector[io.netty.util.ResourceLeakDetector@30a63b98],
readerIndex=@Integer[0],
writerIndex=@Integer[4096],
markedReaderIndex=@Integer[0],
markedWriterIndex=@Integer[0],
maxCapacity=@Integer[2147483647],
],
readBufferStartPosition=@Long[0],
invocationCount=@Long[133],
cacheHitCount=@Long[680901],
fileChannel=@FileChannelImpl[
],
]
method=org.apache.bookkeeper.bookie.EntryLogger.getFromChannels location=AtExit
ts=2023-07-08 21:05:30; [cost=0.002436ms] result=@ArrayList[
@Object[][
@Long[50236],
],
@BufferedReadChannel[
readCapacity=@Integer[4096],
readBuffer=@InstrumentedUnpooledUnsafeHeapByteBuf[
alloc=@UnpooledByteBufAllocator[UnpooledByteBufAllocator(directByDefault: true)],
array=@byte[][isEmpty=false;size=4096],
tmpNioBuf=@HeapByteBuffer[java.nio.HeapByteBuffer[pos=0 lim=4096 cap=4096]],
REFCNT_FIELD_OFFSET=@Long[32],
AIF_UPDATER=@AtomicIntegerFieldUpdaterImpl[java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl@7e2bf17d],
updater=@[io.netty.buffer.AbstractReferenceCountedByteBuf$1@865b4ec],
refCnt=@Integer[2],
logger=@LocationAwareSlf4JLogger[LocationAwareSlf4JLogger(io.netty.buffer.AbstractByteBuf)],
LEGACY_PROP_CHECK_ACCESSIBLE=@String[io.netty.buffer.bytebuf.checkAccessible],
PROP_CHECK_ACCESSIBLE=@String[io.netty.buffer.checkAccessible],
checkAccessible=@Boolean[true],
PROP_CHECK_BOUNDS=@String[io.netty.buffer.checkBounds],
checkBounds=@Boolean[true],
leakDetector=@ResourceLeakDetector[io.netty.util.ResourceLeakDetector@30a63b98],
readerIndex=@Integer[0],
writerIndex=@Integer[4096],
markedReaderIndex=@Integer[0],
markedWriterIndex=@Integer[0],
maxCapacity=@Integer[2147483647],
],
readBufferStartPosition=@Long[-1],
invocationCount=@Long[134],
cacheHitCount=@Long[680901],
fileChannel=@FileChannelImpl[
],
]
method=org.apache.bookkeeper.bookie.EntryLogger.getFromChannels location=AtExit
ts=2023-07-08 21:07:50; [cost=0.002968ms] result=@ArrayList[
@Object[][
@Long[50236],
],
@BufferedReadChannel[
readCapacity=@Integer[4096],
readBuffer=@InstrumentedUnpooledUnsafeHeapByteBuf[
alloc=@UnpooledByteBufAllocator[UnpooledByteBufAllocator(directByDefault: true)],
array=@byte[][isEmpty=false;size=4096],
tmpNioBuf=@HeapByteBuffer[java.nio.HeapByteBuffer[pos=0 lim=4096 cap=4096]],
REFCNT_FIELD_OFFSET=@Long[32],
AIF_UPDATER=@AtomicIntegerFieldUpdaterImpl[java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl@7e2bf17d],
updater=@[io.netty.buffer.AbstractReferenceCountedByteBuf$1@865b4ec],
refCnt=@Integer[2],
logger=@LocationAwareSlf4JLogger[LocationAwareSlf4JLogger(io.netty.buffer.AbstractByteBuf)],
LEGACY_PROP_CHECK_ACCESSIBLE=@String[io.netty.buffer.bytebuf.checkAccessible],
PROP_CHECK_ACCESSIBLE=@String[io.netty.buffer.checkAccessible],
checkAccessible=@Boolean[true],
PROP_CHECK_BOUNDS=@String[io.netty.buffer.checkBounds],
checkBounds=@Boolean[true],
leakDetector=@ResourceLeakDetector[io.netty.util.ResourceLeakDetector@30a63b98],
readerIndex=@Integer[0],
writerIndex=@Integer[4096],
markedReaderIndex=@Integer[0],
markedWriterIndex=@Integer[0],
maxCapacity=@Integer[2147483647],
],
readBufferStartPosition=@Long[-1787371633],
invocationCount=@Long[138],
cacheHitCount=@Long[711850],
fileChannel=@FileChannelImpl[
],
]
so the wrong data is returned.
still have no idea why the entry log header save -1.
Changes
clean cached read buffer and readPosition when read from BufferedChannel throw exception
2023-07-08T22:12:50,481+0800 [GarbageCollectorThread-19-1] INFO org.apache.bookkeeper.bookie.GarbageCollectorThread - Extracting entry log meta from entryLogId: 50236
2023-07-08T22:12:50,481+0800 [GarbageCollectorThread-19-1] WARN org.apache.bookkeeper.bookie.EntryLogger - Recovering ledgers maps for log 50236 at offset: -1
2023-07-08T22:12:50,482+0800 [GarbageCollectorThread-19-1] WARN org.apache.bookkeeper.bookie.EntryLogger - error when extractEntryLogMetadataFromIndex
java.lang.IllegalArgumentException: Negative position
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:813) ~[?:?]
at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:93) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:65) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.EntryLogger.extractEntryLogMetadataFromIndex(EntryLogger.java:1124) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.EntryLogger.getEntryLogMetadata(EntryLogger.java:1085) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:758) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:411) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:391) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.15.4.jar:4.15.4]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
at java.lang.Thread.run(Thread.java:833) ~[?:?]
I checkout the code to release-4.15.4, but the code in my local does not match your stack. Did you modify some code in your environment?
@horizonzy hi, this should be the origin stack, the arthas tool may inject some trace. move the line number.
2023-07-08T18:53:36,200+0800 [GarbageCollectorThread-19-1] ERROR org.apache.bookkeeper.common.util.SafeRunnable - Unexpected throwable caught
java.lang.IllegalArgumentException: Negative position
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:813) ~[?:?]
at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:93) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:65) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.EntryLogger.readFromLogChannel(EntryLogger.java:418) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.EntryLogger.scanEntryLog(EntryLogger.java:1008) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.EntryLogger.extractEntryLogMetadataByScanning(EntryLogger.java:1168) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.EntryLogger.getEntryLogMetadata(EntryLogger.java:1071) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:758) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:411) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:391) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.15.4.jar:4.15.4]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
at java.lang.Thread.run(Thread.java:833) ~[?:?]
2023-07-08T18:53:36,300+0800 [SyncThread-20-1] INFO org.apache.bookkeeper.bookie.Journal - garbage collected journal 188800d0755.txn
50236
2023-07-08T19:35:03,633+0800 [SyncThread-20-1] INFO org.apache.bookkeeper.bookie.EntryLogManagerForSingleEntryLog - Synced entry logger 53177 to disk.
2023-07-08T19:35:20,494+0800 [GarbageCollectorThread-19-1] INFO org.apache.bookkeeper.bookie.GarbageCollectorThread - Extracting entry log meta from entryLogId: 50236
2023-07-08T19:35:20,494+0800 [GarbageCollectorThread-19-1] INFO org.apache.bookkeeper.bookie.EntryLogger - Failed to get ledgers map index from: 50236.log : Negative position
2023-07-08T19:35:20,616+0800 [GarbageCollectorThread-19-1] ERROR org.apache.bookkeeper.common.util.SafeRunnable - Unexpected throwable caught
java.lang.IllegalArgumentException: Negative position
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:813) ~[?:?]
at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:93) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:65) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.EntryLogger.readFromLogChannel(EntryLogger.java:418) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.EntryLogger.scanEntryLog(EntryLogger.java:1008) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.EntryLogger.extractEntryLogMetadataByScanning(EntryLogger.java:1168) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.EntryLogger.getEntryLogMetadata(EntryLogger.java:1071) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:758) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:411) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:391) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.15.4.jar:4.15.4]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
at java.lang.Thread.run(Thread.java:833) ~[?:?]
I checked the code, the ledgerMapOffset corresponds to channel position, and the ledgersCount corresponds to ledger count. Both values can't be -1. It's so strange.
Besides this pr, we can override the ledgerMapOffset and ledgersCount values for the log file after scanning the log file.
Hey @eolivelli @hangc0276 @zymap,
Just wanted to let you know that is still seems to be a thing.
We are using:
Pulsar 4.0.0
With
Apache Bookie 4.17.1
Deployed via
https://pulsar.apache.org/charts
Storage was constantly growing on the affected bookie, when triggering compaction manually via:
curl -X PUT http://localhost:8000/api/v1/bookie/gc
I've seen same pattern in the logs:
2025-04-16T14:28:10,211+0000 [GarbageCollectorThread-6-1] INFO org.apache.bookkeeper.bookie.GarbageCollectorThread - GarbageCollectorThread-6-1 Set forceGarbageCollection to false after force GC to make it forceGC-able again.
2025-04-16T14:28:10,152+0000 [GarbageCollectorThread-6-1] INFO org.apache.bookkeeper.bookie.DefaultEntryLogger - Failed to get ledgers map index from: 4503.log : Negative position
2025-04-16T14:28:10,152+0000 [GarbageCollectorThread-6-1] DEBUG org.apache.bookkeeper.bookie.DefaultEntryLogger - Recovering ledgers maps for log 4503 at offset: -1
2025-04-16T14:28:10,151+0000 [GarbageCollectorThread-6-1] INFO org.apache.bookkeeper.bookie.GarbageCollectorThread - Extracting entry log meta from entryLogId: 4503
when entry log was moved to backup location:
#dec 4503 -> hex 1197
pulsar-bookie-2:~/bookkeeper/ledgers/current$ ls bkp/
1197.log
compaction went fine and reclaimed expected storage.