
[improve][broker] Supplement schema ledger if schema ledger is lost

Open Denovo1998 opened this issue 2 years ago • 5 comments

Fixes #20414

Master Issue: https://github.com/apache/pulsar/issues/20414

Motivation

https://github.com/apache/pulsar/issues/17221 describes a scenario in which multiple bookie copies are corrupted or a ledger has been deleted. When a schema ledger is lost, new producers and consumers cannot even be created, let alone work properly.

So we need a solution that does not just skip the schema with the missing ledger, but actually supplements the broken schema ledger.

Modifications

Add a new method tryCompleteTheLostSchema() in SchemaStorage and SchemaRegistry

CompletableFuture<Long> tryCompleteTheLostSchemaLedger(String key, SchemaVersion version, SchemaData schema);
  1. Get the schema locator from the metadata store.
  2. Create a new ledger and write to it a SchemaStorageFormat.SchemaEntry built from schemaData and schemaVersion.
  3. Update the schema locator in the metadata store so that it points to the new ledger ID.
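
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: the class LostSchemaLedgerSketch, its in-memory maps, and the simplified parameter types (a long version and a byte[] schema instead of SchemaVersion and SchemaData) are all hypothetical stand-ins for the metadata store and BookKeeper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical in-memory model; none of these types are Pulsar or BookKeeper classes.
class LostSchemaLedgerSketch {
    // Stand-in for the metadata store: schema key -> locator (schema version -> ledger id).
    final Map<String, Map<Long, Long>> locators = new HashMap<>();
    // Stand-in for ledger storage: ledger id -> entries written to that ledger.
    final Map<Long, List<byte[]>> ledgers = new HashMap<>();
    private long nextLedgerId = 100;

    CompletableFuture<Long> tryCompleteTheLostSchemaLedger(String key, long version, byte[] schema) {
        // 1. Get the schema locator from the (simulated) metadata store.
        Map<Long, Long> locator = locators.get(key);
        if (locator == null) {
            CompletableFuture<Long> failed = new CompletableFuture<>();
            failed.completeExceptionally(new IllegalStateException("no locator for " + key));
            return failed;
        }
        // 2. Create a new ledger and write one entry holding the schema bytes.
        //    (A real entry would also carry the schema version.)
        long newLedgerId = nextLedgerId++;
        List<byte[]> entries = new ArrayList<>();
        entries.add(schema.clone());
        ledgers.put(newLedgerId, entries);
        // 3. Update the locator so this version points at the new ledger id.
        locator.put(version, newLedgerId);
        return CompletableFuture.completedFuture(newLedgerId);
    }
}
```

In this sketch the lost ledger id is simply overwritten in the locator; how the real change handles concurrent updates to the locator is not covered here.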

Verifying this change

  • [x] Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Documentation

  • [ ] doc
  • [ ] doc-required
  • [x] doc-not-needed
  • [ ] doc-complete

Matching PR in forked repository

PR in forked repository: https://github.com/Denovo1998/pulsar/pull/4

Denovo1998 May 27 '23 06:05

@poorbarcode @congbobo184 @codelipenghui SchemaData and SchemaVersion have been moved to org.apache.pulsar.broker.service.AbstractTopic, rather than being saved in each producer and consumer. Please check out the solution in issue #20414. Is this approach okay now?

Denovo1998 Jun 05 '23 05:06

The PR had no activity for 30 days; marked with the Stale label.

github-actions[bot] Jul 30 '23 01:07

Waiting to discuss whether this plan is feasible. I will send an email to discuss it later.

Denovo1998 Aug 29 '23 13:08

The PR had no activity for 30 days; marked with the Stale label.

github-actions[bot] Sep 29 '23 01:09

The implementation has been updated to follow the alternative approach. It still needs to be discussed.

Denovo1998 Jan 06 '24 09:01