
[fix] [broker] local metadata sync topic contains configuration events causing all operations stuck

Open poorbarcode opened this issue 1 year ago • 3 comments

Motivation

Background:

  • PIP-136: Sync Pulsar policies across multiple clouds defines two topics below:

    • metadataSyncEventTopic: monitors local metadata store changes
    • configurationMetadataSyncEventTopic: monitors configuration metadata store changes
  • Local metadata store and Configuration metadata store share the same object in memory when their URLs are the same.

Issue 1

The event synchronizer is bound to the metadata store object in memory. When the Local metadata store and the Configuration metadata store are the same object in memory, each synchronizer receives all events from both stores, so the data in the two topics gets mixed up.
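The mix-up can be illustrated with a minimal sketch. SketchStore and SharedStoreSketch are hypothetical stand-ins rather than Pulsar's MetadataStore API, and the paths are only illustrative; the point is that two listeners registered on one shared object both see every write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a metadata store; not Pulsar's MetadataStore API.
class SketchStore {
    private final List<Consumer<String>> listeners = new ArrayList<>();

    void registerListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    void put(String path) {
        // Every registered listener is notified of every write to this object.
        for (Consumer<String> listener : listeners) {
            listener.accept(path);
        }
    }
}

class SharedStoreSketch {
    // Returns the events each sync topic would receive:
    // index 0 is the local topic, index 1 is the configuration topic.
    static List<List<String>> run(boolean sharedStore) {
        SketchStore local = new SketchStore();
        // Same URL means same object in memory (the situation described above).
        SketchStore config = sharedStore ? local : new SketchStore();

        List<String> localTopic = new ArrayList<>();
        List<String> configTopic = new ArrayList<>();
        local.registerListener(localTopic::add);
        config.registerListener(configTopic::add);

        local.put("/local/example-path");   // illustrative path only
        config.put("/config/example-path"); // illustrative path only
        return List.of(localTopic, configTopic);
    }
}
```

With a shared store, both lists end up containing both paths; with separate store objects, each list contains only its own path.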

Issue 2

The internal producer of the synchronizer relies on the topic metadataSyncEventTopic; loading this topic relies on the namespace local policies; and writing namespace local policies to ZK relies on the internal producer. This circular dependency causes a deadlock. See the following flow:

  • Try to start the internal producer of the synchronizer.
  • Try to load the topic named metadataSyncEventTopic up.
  • Try to write data to the Local Metadata Store.
  • Try to send events to metadataSyncEventTopic before writing data to the Local Metadata Store.
  • The internal producer is still starting, so the send in the previous step cannot complete.
  • Stuck.....
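The circular wait in the flow above can be modelled with plain futures. This is only a sketch of the dependency cycle, assuming each step can be represented as a CompletableFuture; StartupCycleSketch and its names are hypothetical and do not correspond to broker code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class StartupCycleSketch {
    // Returns true if the producer became ready within the timeout.
    static boolean producerStarts(long timeoutMillis) throws Exception {
        CompletableFuture<Void> producerReady = new CompletableFuture<>();

        // Writing to the local metadata store first sends an event,
        // and sending needs the producer that is still starting.
        CompletableFuture<Void> policiesWritten = producerReady.thenRun(() -> { });

        // Loading metadataSyncEventTopic waits for the local policies write.
        CompletableFuture<Void> topicLoaded = policiesWritten.thenRun(() -> { });

        // The producer becomes ready only after its topic has been loaded.
        topicLoaded.thenRun(() -> producerReady.complete(null));

        try {
            producerReady.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Nothing outside the cycle ever completes producerReady.
            return false;
        }
    }
}
```

Calling StartupCycleSketch.producerStarts(100) returns false, because every future in the chain waits on another one in the same chain.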

You can reproduce this issue with the test SyncConfigStore1ZKPerClusterTest.testDynamicEnableConfigurationMetadataSyncEventTopic. This PR fixes the issue where the synchronizer gets stuck because two metadata stores rely on it. I will write a separate PR that skips syncing data that the synchronizer itself relies on.

Modifications

  • Create a separate configuration metadata store if users want to enable Metadata Synchronizer, even if the URL of the configuration metadata store is the same as the local metadata store.
  • Correct the behavior: metadataSyncEventTopic only receives events about the local metadata store, and configurationMetadataSyncEventTopic only receives events about the configuration metadata store.
  • If the Broker has initialized itself with a single shared metadata store, reject dynamic config changes that would enable the Metadata Synchronizer.
  • Add an optional config mayEnableMetadataSynchronizer that lets the Broker initialize itself with a separate configuration metadata store.
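A rough sketch of the decision these modifications describe is shown below. Only the name mayEnableMetadataSynchronizer comes from this PR; the class StoreInitSketch, its method names, and the exception type are assumptions made for illustration and are not the actual broker implementation.

```java
class StoreInitSketch {
    // True when the broker should create its own configuration metadata store
    // object instead of reusing the local one.
    static boolean needsSeparateConfigStore(String localUrl, String configUrl,
                                            boolean mayEnableMetadataSynchronizer) {
        // Different URLs always mean different stores; with the same URL,
        // a separate object is created only when the synchronizer may be enabled.
        return mayEnableMetadataSynchronizer || !localUrl.equals(configUrl);
    }

    // Hypothetical validation of a dynamic config change that sets a sync topic.
    static void validateDynamicEnable(boolean storesAreSeparate, String newSyncTopic) {
        boolean enabling = newSyncTopic != null && !newSyncTopic.isEmpty();
        if (enabling && !storesAreSeparate) {
            // The broker started with one shared store, so events would be mixed.
            throw new IllegalArgumentException(
                    "Broker was initialized with a single metadata store; "
                    + "cannot enable the metadata synchronizer dynamically");
        }
    }
}
```

In this sketch a broker started with identical URLs and mayEnableMetadataSynchronizer left false shares one store, and any later attempt to set a sync topic dynamically is rejected.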

Next PRs

Skip syncing events that the synchronizer itself relies on. For example:

  • metadataSyncEventTopic is public/default/tp
  • (Highlight) Do not sync events related to the topic public/default/tp, because the synchronizer relies on this topic 😂. I will start a discussion for this change. For more details, see Issue 2 in the Motivation section.

Documentation

  • [ ] doc
  • [ ] doc-required
  • [x] doc-not-needed
  • [ ] doc-complete

Matching PR in forked repository

PR in forked repository: x

poorbarcode avatar May 10 '24 19:05 poorbarcode

@poorbarcode Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

github-actions[bot] avatar May 10 '24 19:05 github-actions[bot]

@poorbarcode please resolve the merge conflict

lhotari avatar Jul 29 '24 16:07 lhotari

@lhotari

@poorbarcode please resolve the merge conflict

Done

poorbarcode avatar Jul 29 '24 16:07 poorbarcode