pulsar icon indicating copy to clipboard operation
pulsar copied to clipboard

[feat][admin] PIP-415: Support getting message ID by index

Open liangyepianzhou opened this issue 7 months ago • 5 comments

PIP:https://github.com/apache/pulsar/pull/24220

Motivation

we can now obtain the offset of a message by its message id:

  1. Get the message by id using get-message-by-id cmd
  2. Get the index of the message using Message.getIndex()

But we cannot obtain the message id by offset. Then we need to add a new API to get the message id by offset.

Modifications

Add a new http API to retrieve the message ID by offset. We propose to add a new API to retrieve the message ID by offset, enabling us to cache the mapping between message ID and offset. This will allow us to use offsets for seek and acknowledgment operations when consuming messages through the standardized API.

Verifying this change

  • [ ] Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (10MB)
  • Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • [ ] Dependencies (add or upgrade a dependency)
  • [ ] The public API
  • [ ] The schema
  • [ ] The default values of configurations
  • [ ] The threading model
  • [ ] The binary protocol
  • [ ] The REST endpoints
  • [ ] The admin CLI options
  • [ ] The metrics
  • [ ] Anything that affects deployment

Documentation

  • [ ] doc
  • [ ] doc-required
  • [x] doc-not-needed
  • [ ] doc-complete

Matching PR in forked repository

PR in forked repository:

liangyepianzhou avatar Apr 27 '25 12:04 liangyepianzhou

Good jobs

AuroraTwinkle avatar May 29 '25 08:05 AuroraTwinkle

Out of the scope of this PR, I think it would be better to add a method to ManagedLedger for it. Because the customized ML implementation (e.g. StreamNative's Ursa engine) might not depend on the broker entry metadata to get the message's index. It could be another PIP.

BewareMyPower avatar May 30 '25 10:05 BewareMyPower

Out of the scope of this PR, I think it would be better to add a method to ManagedLedger for it. Because the customized ML implementation (e.g. StreamNative's Ursa engine) might not depend on the broker entry metadata to get the message's index. It could be another PIP.

Yes, we should add a new manager ledger interface to support customized ML implementations.

Additionally, the current message index retrieval solution has suboptimal performance. We could improve this by adding a field in LedgerInfo to record the first entry's index, then compute subsequent indexes directly from message IDs.

I've created an issue to track this enhancement.

liangyepianzhou avatar Jun 03 '25 03:06 liangyepianzhou

Overall LGTM, but the current impl will scan all entire ML, maybe you can optimize it via PIP-404

dao-jun avatar Jun 03 '25 03:06 dao-jun

Overall LGTM, but the current impl will scan all entire ML, maybe you can optimize it via PIP-404

Interesting—I originally proposed directly adding the first entry's index in LedgerInfo to optimize this issue. While the properties field you added could also serve the same purpose, it currently doesn’t record the first entry's index. We can address this optimization in a future PIP, as mentioned in issue.

liangyepianzhou avatar Jun 03 '25 03:06 liangyepianzhou

Codecov Report

Attention: Patch coverage is 72.38095% with 29 lines in your changes missing coverage. Please review.

Project coverage is 74.33%. Comparing base (bbc6224) to head (fc2420f). Report is 1160 commits behind head on master.

Files with missing lines Patch % Lines
...pulsar/broker/admin/impl/PersistentTopicsBase.java 60.81% 19 Missing and 10 partials :warning:
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #24222      +/-   ##
============================================
+ Coverage     73.57%   74.33%   +0.76%     
- Complexity    32624    32765     +141     
============================================
  Files          1877     1868       -9     
  Lines        139502   145497    +5995     
  Branches      15299    16643    +1344     
============================================
+ Hits         102638   108157    +5519     
+ Misses        28908    28789     -119     
- Partials       7956     8551     +595     
Flag Coverage Δ
inttests 26.81% <0.00%> (+2.22%) :arrow_up:
systests 23.33% <0.00%> (-1.00%) :arrow_down:
unittests 73.82% <72.38%> (+0.97%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...pache/pulsar/broker/admin/v2/PersistentTopics.java 88.62% <100.00%> (+7.68%) :arrow_up:
...in/java/org/apache/pulsar/client/admin/Topics.java 80.95% <ø> (+3.45%) :arrow_up:
...pache/pulsar/client/admin/internal/TopicsImpl.java 84.38% <100.00%> (+2.37%) :arrow_up:
...in/java/org/apache/pulsar/admin/cli/CmdTopics.java 80.63% <100.00%> (-0.55%) :arrow_down:
...pulsar/broker/admin/impl/PersistentTopicsBase.java 70.02% <60.81%> (+4.57%) :arrow_up:

... and 1085 files with indirect coverage changes

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov-commenter avatar Jun 23 '25 04:06 codecov-commenter

@liangyepianzhou https://lists.apache.org/thread/j9vxmmsw1r85l27v3rgp9tcnllk48wyb Please add two labels, and cherry-pick PIP-415 into branch-4.0 and branch-4.1.

StevenLuMT avatar Nov 09 '25 23:11 StevenLuMT