bookkeeper icon indicating copy to clipboard operation
bookkeeper copied to clipboard

Fix negative position read on BufferedChannel cause data can't read problem.

Open lifepuzzlefun opened this issue 2 years ago • 4 comments

Descriptions of the changes in this PR:

Motivation

fix #4022

in brief of this change.

When do compaction. worker thread try to extract the ledgerLogger metadata header to get ledger map. the header to point map offset is -1.

so the read request to BufferedChannel will reject this read and throw java.lang.IllegalArgumentException: Negative position

but there is a read cache in BufferedChannel the -1 is cached in the Channel which will cause following read on this channel fail to get right data.

more other background:

our production env bookie can't gc log files.

after add some log we found that

2023-07-08T22:12:50,481+0800 [GarbageCollectorThread-19-1] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - Extracting entry log meta from entryLogId: 50236
2023-07-08T22:12:50,481+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - Recovering ledgers maps for log 50236 at offset: -1
2023-07-08T22:12:50,482+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - error when extractEntryLogMetadataFromIndex
java.lang.IllegalArgumentException: Negative position
	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:813) ~[?:?]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:93) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:65) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.extractEntryLogMetadataFromIndex(EntryLogger.java:1124) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.getEntryLogMetadata(EntryLogger.java:1085) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:758) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:411) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:391) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.15.4.jar:4.15.4]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]
2023-07-08T22:12:50,482+0800 [GarbageCollectorThread-19-1] INFO  org.apache.bookkeeper.bookie.EntryLogger - Failed to get ledgers map index from: 50236.log : Negative position
2023-07-08T22:12:50,482+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 1024
2023-07-08T22:12:50,598+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 126754820 add  entrySize 126753792 ledgerId 5593924864
2023-07-08T22:12:50,598+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos -1787371633 add  entrySize -1914126457 ledgerId -1675451311566362959
2023-07-08T22:12:50,607+0800 [GarbageCollectorThread-19-1] ERROR org.apache.bookkeeper.common.util.SafeRunnable - Unexpected throwable caught
java.lang.IllegalArgumentException: Negative position
	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:813) ~[?:?]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:93) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:65) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.readFromLogChannel(EntryLogger.java:418) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.scanEntryLog(EntryLogger.java:1018) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.extractEntryLogMetadataByScanning(EntryLogger.java:1191) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.getEntryLogMetadata(EntryLogger.java:1090) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:758) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:411) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:391) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.15.4.jar:4.15.4]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]

but using bookkeeper tool to read the log it success without throw exception

2023-07-08T21:54:52,401+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.ReadLogCommand - Scan entry log 50236 (c43c.log)
2023-07-08T21:54:52,475+0800 [main] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER 50236 readBufferStartPosition -9223372036854775808
2023-07-08T21:54:52,486+0800 [main] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 1024
2023-07-08T21:54:52,492+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - --------- Lid=21851269, Eid=20, ByteOffset=1024, EntrySize=495132 ---------
2023-07-08T21:54:52,492+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - Type:           DATA
2023-07-08T21:54:52,492+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - LastConfirmed:  19
2023-07-08T21:54:52,492+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil -
2023-07-08T21:54:52,493+0800 [main] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 496160 add  entrySize 495132 ledgerId 21851269
2023-07-08T21:54:52,494+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - --------- Lid=21851269, Eid=21, ByteOffset=496160, EntrySize=83751 ---------
2023-07-08T21:54:52,494+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - Type:           DATA
2023-07-08T21:54:52,494+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - LastConfirmed:  19
2023-07-08T21:54:52,494+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil -
2023-07-08T21:54:52,494+0800 [main] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 579915 add  entrySize 83751 ledgerId 21851269
2023-07-08T21:54:52,494+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - --------- Lid=21851454, Eid=4, ByteOffset=579915, EntrySize=63 ---------
2023-07-08T21:54:52,494+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - Type:           DATA
2023-07-08T21:54:52,494+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - LastConfirmed:  3
2023-07-08T21:54:52,494+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil -
2023-07-08T21:54:52,495+0800 [main] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 579982 add  entrySize 63 ledgerId 21851454
2023-07-08T21:54:52,495+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - --------- Lid=21851454, Eid=5, ByteOffset=579982, EntrySize=43 ---------
2023-07-08T21:54:52,495+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - Type:           DATA
2023-07-08T21:54:52,495+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - LastConfirmed:  3
2023-07-08T21:54:52,495+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil -
2023-07-08T21:54:52,495+0800 [main] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 580029 add  entrySize 43 ledgerId 21851454
2023-07-08T21:54:52,495+0800 [main] INFO  org.apache.bookkeeper.tools.cli.commands.bookie.FormatUtil - --------- Lid=21851474, Eid=3, ByteOffset=580029, EntrySize=43 ---------

you can see the pos in BufferedChannel is changed like this which is not very good.

2023-07-08T22:12:50,482+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 1024
2023-07-08T22:12:50,598+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos 126754820 add  entrySize 126753792 ledgerId 5593924864
2023-07-08T22:12:50,598+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - MARKER pos -1787371633 add  entrySize -1914126457 ledgerId -1675451311566362959

and we found the log header is which offset and count is both -1

2023-07-08T22:14:49,412+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - Recovering ledgers maps for log 50236 at offset: -1, header version 1 offset -1 count -1

see https://github.com/apache/bookkeeper/blob/fb0d3d6faab3f9703c7f63b280f1f9c2e18f966d/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/DefaultEntryLogger.java#L1034-L1047

so the log entry header offset pos is -1.

when -1 pass to BufferedReadChannel an java.lang.IllegalArgumentException: Negative position throws first time. what is worse is that this position will be cached in BufferedReadChannel see line 92 throw exception but readPosition is not reset, so the position is -1 even after the exception is thrown.

https://github.com/apache/bookkeeper/blob/fb0d3d6faab3f9703c7f63b280f1f9c2e18f966d/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/BufferedReadChannel.java#L88-L99

so when the second time try to execute extractEntryLogMetadataByScanning the BufferedReadChannel is cached in ThreadLocal so the same BufferedReadChannel is return again. but this time if will hit this condition

https://github.com/apache/bookkeeper/blob/fb0d3d6faab3f9703c7f63b280f1f9c2e18f966d/bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/BufferedReadChannel.java#L77-L84

also we saw this with arthas. see readBufferStartPosition the first time is 0 which is when extractEntryLogMetadataFromIndex execute first.

the second capture you can see this position is -1.

so the third time when execute extractMetaFromEntryLogs the position is a very small negivate number.

method=org.apache.bookkeeper.bookie.EntryLogger.getFromChannels location=AtExit
ts=2023-07-08 21:05:30; [cost=0.002951ms] result=@ArrayList[
    @Object[][
        @Long[50236],
    ],
    @BufferedReadChannel[
        readCapacity=@Integer[4096],
        readBuffer=@InstrumentedUnpooledUnsafeHeapByteBuf[
            alloc=@UnpooledByteBufAllocator[UnpooledByteBufAllocator(directByDefault: true)],
            array=@byte[][isEmpty=false;size=4096],
            tmpNioBuf=@HeapByteBuffer[java.nio.HeapByteBuffer[pos=4096 lim=4096 cap=4096]],
            REFCNT_FIELD_OFFSET=@Long[32],
            AIF_UPDATER=@AtomicIntegerFieldUpdaterImpl[java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl@7e2bf17d],
            updater=@[io.netty.buffer.AbstractReferenceCountedByteBuf$1@865b4ec],
            refCnt=@Integer[2],
            logger=@LocationAwareSlf4JLogger[LocationAwareSlf4JLogger(io.netty.buffer.AbstractByteBuf)],
            LEGACY_PROP_CHECK_ACCESSIBLE=@String[io.netty.buffer.bytebuf.checkAccessible],
            PROP_CHECK_ACCESSIBLE=@String[io.netty.buffer.checkAccessible],
            checkAccessible=@Boolean[true],
            PROP_CHECK_BOUNDS=@String[io.netty.buffer.checkBounds],
            checkBounds=@Boolean[true],
            leakDetector=@ResourceLeakDetector[io.netty.util.ResourceLeakDetector@30a63b98],
            readerIndex=@Integer[0],
            writerIndex=@Integer[4096],
            markedReaderIndex=@Integer[0],
            markedWriterIndex=@Integer[0],
            maxCapacity=@Integer[2147483647],
        ],
        readBufferStartPosition=@Long[0],
        invocationCount=@Long[133],
        cacheHitCount=@Long[680901],
        fileChannel=@FileChannelImpl[
    ],
]
method=org.apache.bookkeeper.bookie.EntryLogger.getFromChannels location=AtExit
ts=2023-07-08 21:05:30; [cost=0.002436ms] result=@ArrayList[
    @Object[][
        @Long[50236],
    ],
    @BufferedReadChannel[
        readCapacity=@Integer[4096],
        readBuffer=@InstrumentedUnpooledUnsafeHeapByteBuf[
            alloc=@UnpooledByteBufAllocator[UnpooledByteBufAllocator(directByDefault: true)],
            array=@byte[][isEmpty=false;size=4096],
            tmpNioBuf=@HeapByteBuffer[java.nio.HeapByteBuffer[pos=0 lim=4096 cap=4096]],
            REFCNT_FIELD_OFFSET=@Long[32],
            AIF_UPDATER=@AtomicIntegerFieldUpdaterImpl[java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl@7e2bf17d],
            updater=@[io.netty.buffer.AbstractReferenceCountedByteBuf$1@865b4ec],
            refCnt=@Integer[2],
            logger=@LocationAwareSlf4JLogger[LocationAwareSlf4JLogger(io.netty.buffer.AbstractByteBuf)],
            LEGACY_PROP_CHECK_ACCESSIBLE=@String[io.netty.buffer.bytebuf.checkAccessible],
            PROP_CHECK_ACCESSIBLE=@String[io.netty.buffer.checkAccessible],
            checkAccessible=@Boolean[true],
            PROP_CHECK_BOUNDS=@String[io.netty.buffer.checkBounds],
            checkBounds=@Boolean[true],
            leakDetector=@ResourceLeakDetector[io.netty.util.ResourceLeakDetector@30a63b98],
            readerIndex=@Integer[0],
            writerIndex=@Integer[4096],
            markedReaderIndex=@Integer[0],
            markedWriterIndex=@Integer[0],
            maxCapacity=@Integer[2147483647],
        ],
        readBufferStartPosition=@Long[-1],
        invocationCount=@Long[134],
        cacheHitCount=@Long[680901],
        fileChannel=@FileChannelImpl[
    ],
]
method=org.apache.bookkeeper.bookie.EntryLogger.getFromChannels location=AtExit
ts=2023-07-08 21:07:50; [cost=0.002968ms] result=@ArrayList[
    @Object[][
        @Long[50236],
    ],
    @BufferedReadChannel[
        readCapacity=@Integer[4096],
        readBuffer=@InstrumentedUnpooledUnsafeHeapByteBuf[
            alloc=@UnpooledByteBufAllocator[UnpooledByteBufAllocator(directByDefault: true)],
            array=@byte[][isEmpty=false;size=4096],
            tmpNioBuf=@HeapByteBuffer[java.nio.HeapByteBuffer[pos=0 lim=4096 cap=4096]],
            REFCNT_FIELD_OFFSET=@Long[32],
            AIF_UPDATER=@AtomicIntegerFieldUpdaterImpl[java.util.concurrent.atomic.AtomicIntegerFieldUpdater$AtomicIntegerFieldUpdaterImpl@7e2bf17d],
            updater=@[io.netty.buffer.AbstractReferenceCountedByteBuf$1@865b4ec],
            refCnt=@Integer[2],
            logger=@LocationAwareSlf4JLogger[LocationAwareSlf4JLogger(io.netty.buffer.AbstractByteBuf)],
            LEGACY_PROP_CHECK_ACCESSIBLE=@String[io.netty.buffer.bytebuf.checkAccessible],
            PROP_CHECK_ACCESSIBLE=@String[io.netty.buffer.checkAccessible],
            checkAccessible=@Boolean[true],
            PROP_CHECK_BOUNDS=@String[io.netty.buffer.checkBounds],
            checkBounds=@Boolean[true],
            leakDetector=@ResourceLeakDetector[io.netty.util.ResourceLeakDetector@30a63b98],
            readerIndex=@Integer[0],
            writerIndex=@Integer[4096],
            markedReaderIndex=@Integer[0],
            markedWriterIndex=@Integer[0],
            maxCapacity=@Integer[2147483647],
        ],
        readBufferStartPosition=@Long[-1787371633],
        invocationCount=@Long[138],
        cacheHitCount=@Long[711850],
        fileChannel=@FileChannelImpl[
    ],
]

so the wrong data is returned.

still have no idea why the entry log header save -1.

Changes

clean cached read buffer and readPosition when read from BufferedChannel throw exception

lifepuzzlefun avatar Jul 08 '23 15:07 lifepuzzlefun

2023-07-08T22:12:50,481+0800 [GarbageCollectorThread-19-1] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - Extracting entry log meta from entryLogId: 50236
2023-07-08T22:12:50,481+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - Recovering ledgers maps for log 50236 at offset: -1
2023-07-08T22:12:50,482+0800 [GarbageCollectorThread-19-1] WARN  org.apache.bookkeeper.bookie.EntryLogger - error when extractEntryLogMetadataFromIndex
java.lang.IllegalArgumentException: Negative position
	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:813) ~[?:?]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:93) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:65) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.extractEntryLogMetadataFromIndex(EntryLogger.java:1124) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.getEntryLogMetadata(EntryLogger.java:1085) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:758) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:411) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:391) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.15.4.jar:4.15.4]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]

I checkout the code to release-4.15.4, but the code in my local does not match your stack. Did you modify some code in your environment?

horizonzy avatar Jul 16 '23 04:07 horizonzy

@horizonzy hi, this should be the origin stack, the arthas tool may inject some trace. move the line number.

2023-07-08T18:53:36,200+0800 [GarbageCollectorThread-19-1] ERROR org.apache.bookkeeper.common.util.SafeRunnable - Unexpected throwable caught
java.lang.IllegalArgumentException: Negative position
	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:813) ~[?:?]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:93) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:65) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.readFromLogChannel(EntryLogger.java:418) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.scanEntryLog(EntryLogger.java:1008) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.extractEntryLogMetadataByScanning(EntryLogger.java:1168) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.getEntryLogMetadata(EntryLogger.java:1071) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:758) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:411) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:391) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.15.4.jar:4.15.4]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]
2023-07-08T18:53:36,300+0800 [SyncThread-20-1] INFO  org.apache.bookkeeper.bookie.Journal - garbage collected journal 188800d0755.txn

50236

2023-07-08T19:35:03,633+0800 [SyncThread-20-1] INFO  org.apache.bookkeeper.bookie.EntryLogManagerForSingleEntryLog - Synced entry logger 53177 to disk.
2023-07-08T19:35:20,494+0800 [GarbageCollectorThread-19-1] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - Extracting entry log meta from entryLogId: 50236
2023-07-08T19:35:20,494+0800 [GarbageCollectorThread-19-1] INFO  org.apache.bookkeeper.bookie.EntryLogger - Failed to get ledgers map index from: 50236.log : Negative position
2023-07-08T19:35:20,616+0800 [GarbageCollectorThread-19-1] ERROR org.apache.bookkeeper.common.util.SafeRunnable - Unexpected throwable caught
java.lang.IllegalArgumentException: Negative position
	at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:813) ~[?:?]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:93) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:65) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.readFromLogChannel(EntryLogger.java:418) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.scanEntryLog(EntryLogger.java:1008) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.extractEntryLogMetadataByScanning(EntryLogger.java:1168) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.EntryLogger.getEntryLogMetadata(EntryLogger.java:1071) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:758) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:411) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.safeRun(GarbageCollectorThread.java:391) ~[org.apache.bookkeeper-bookkeeper-server-4.15.4.jar:4.15.4]
	at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.15.4.jar:4.15.4]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.86.Final.jar:4.1.86.Final]
	at java.lang.Thread.run(Thread.java:833) ~[?:?]

lifepuzzlefun avatar Jul 16 '23 05:07 lifepuzzlefun

I checked the code, the ledgerMapOffset corresponds to channel position, and the ledgersCount corresponds to ledger count. Both values can't be -1. It's so strange.

Besides this pr, we can override the ledgerMapOffset and ledgersCount values for the log file after scanning the log file.

horizonzy avatar Jul 16 '23 07:07 horizonzy

Hey @eolivelli @hangc0276 @zymap,

Just wanted to let you know that is still seems to be a thing.

We are using: Pulsar 4.0.0 With Apache Bookie 4.17.1 Deployed via https://pulsar.apache.org/charts

Storage was constantly growing on the affected bookie, when triggering compaction manually via: curl -X PUT http://localhost:8000/api/v1/bookie/gc

I've seen same pattern in the logs:

2025-04-16T14:28:10,211+0000 [GarbageCollectorThread-6-1] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - GarbageCollectorThread-6-1 Set forceGarbageCollection to false after force GC to make it forceGC-able again.
2025-04-16T14:28:10,152+0000 [GarbageCollectorThread-6-1] INFO  org.apache.bookkeeper.bookie.DefaultEntryLogger - Failed to get ledgers map index from: 4503.log : Negative position
2025-04-16T14:28:10,152+0000 [GarbageCollectorThread-6-1] DEBUG org.apache.bookkeeper.bookie.DefaultEntryLogger - Recovering ledgers maps for log 4503 at offset: -1
2025-04-16T14:28:10,151+0000 [GarbageCollectorThread-6-1] INFO  org.apache.bookkeeper.bookie.GarbageCollectorThread - Extracting entry log meta from entryLogId: 4503

when entry log was moved to backup location:

#dec 4503 -> hex 1197
pulsar-bookie-2:~/bookkeeper/ledgers/current$ ls bkp/
1197.log

compaction went fine and reclaimed expected storage.

mrtworo avatar Apr 16 '25 14:04 mrtworo