pulsar icon indicating copy to clipboard operation
pulsar copied to clipboard

[improve][meta] Fix invalid use of drain API and race condition in closing metadata store

Open lhotari opened this issue 1 year ago • 2 comments

Motivation

There's currently some memory leaks in tests and while investigating the issue, I found out that there's a large number of uncompleted CompletableFutures in the heap dump.

Currently the metadata store doesn't complete all pending operations when it is closed. There are multiple problems:

  • MpscUnboundedArrayQueue.drain will limit the number of entries to 4096 at a time
  • MpscUnboundedArrayQueue.drain uses relaxedPoll which is eventually consistent
  • There might be batches in flight, which need to be terminated

Modifications

  • replace drain with a while loop in close
  • replace drain with a for loop in batch flushing
  • add isClosed checks

Additional Context

CompletableFutures in the heap dump:

image

The instances are related to org.apache.pulsar.broker.resources.NamespaceResources$PartitionedTopicResources$$Lambda$1819+0x00007f08a8b65ee8: image

org.apache.pulsar.broker.resources.NamespaceResources$PartitionedTopicResources$$Lambda$1819+0x00007f08a8b65ee8 seems to be this code block: https://github.com/apache/pulsar/blob/bbdc173e80157296ff0475cfdc4e36f5d063a7df/pulsar-broker-common/src/main/java/org/apache/pulsar/broker/resources/NamespaceResources.java#L345-L374

Documentation

  • [ ] doc
  • [ ] doc-required
  • [x] doc-not-needed
  • [ ] doc-complete

lhotari avatar Apr 25 '24 11:04 lhotari

It seems that the OOME is another issue. https://github.com/apache/pulsar/pull/22586

lhotari avatar Apr 25 '24 14:04 lhotari

Codecov Report

Attention: Patch coverage is 75.00000% with 6 lines in your changes are missing coverage. Please review.

Project coverage is 71.02%. Comparing base (bbc6224) to head (edb72fd). Report is 260 commits behind head on master.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #22585      +/-   ##
============================================
- Coverage     73.57%   71.02%   -2.55%     
+ Complexity    32624     5630   -26994     
============================================
  Files          1877     1891      +14     
  Lines        139502   152144   +12642     
  Branches      15299    18133    +2834     
============================================
+ Hits         102638   108061    +5423     
- Misses        28908    35445    +6537     
- Partials       7956     8638     +682     
Flag Coverage Δ
inttests 26.99% <23.52%> (+2.40%) :arrow_up:
systests 24.62% <62.50%> (+0.29%) :arrow_up:
unittests 72.20% <100.00%> (-0.64%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...he/pulsar/metadata/impl/AbstractMetadataStore.java 85.29% <ø> (+0.73%) :arrow_up:
...ta/impl/batching/AbstractBatchedMetadataStore.java 84.67% <75.00%> (-11.38%) :arrow_down:

... and 363 files with indirect coverage changes

codecov-commenter avatar May 13 '24 01:05 codecov-commenter