Add observability metrics for CommandPartitionedTopicMetadata requests
Search before asking
- [X] I searched in the issues and found nothing similar.
Motivation
Currently, there's no way to track CommandPartitionedTopicMetadata requests. There are no metrics or logs that indicate that a broker is handling CommandPartitionedTopicMetadata requests.
Misconfigured clients might flood brokers with CommandPartitionedTopicMetadata requests and cause high CPU consumption.
One example of this is a misconfiguration of splunk-otel-collector's Pulsar exporter. The example config sets pulsar-client-go's PartitionsAutoDiscoveryInterval setting to 1 nanosecond. I have sent a PR to fix the example config: https://github.com/signalfx/splunk-otel-collector/pull/2185. This example shows that it's easy to mix up the units and misconfigure a Pulsar client.
Solution
Add observability metrics for CommandPartitionedTopicMetadata requests, similar to the lookup request metrics added by #8272.
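As a rough illustration of the kind of counters this could mean, here is a minimal sketch using only the JDK. The class name `PartitionedMetadataRequestStats` and its methods are hypothetical and are not existing Pulsar code; how the values would be exported (metric names, labels, Prometheus wiring) is left open.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper, not part of Pulsar: tracks total, pending and failed
// CommandPartitionedTopicMetadata requests handled by a broker.
class PartitionedMetadataRequestStats {
    private final LongAdder total = new LongAdder();
    private final LongAdder pending = new LongAdder();
    private final LongAdder failed = new LongAdder();

    // Call when the broker starts handling a request.
    void onRequestStart() {
        total.increment();
        pending.increment();
    }

    // Call when the response (success or error) has been produced.
    void onRequestEnd(boolean success) {
        pending.decrement();
        if (!success) {
            failed.increment();
        }
    }

    long total() { return total.sum(); }
    long pending() { return pending.sum(); }
    long failed() { return failed.sum(); }
}
```

A steadily climbing total with a low failure count would be the signature of the misconfigured-client case described in the motivation.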
Alternatives
No response
Anything else?
No response
Are you willing to submit a PR?
- [ ] I'm willing to submit a PR!
Currently, we have metadata store metrics. If they meet your needs, I'd like to handle this issue. @lhotari
How are metadata store metrics used currently? I think it could be a breaking change if CommandPartitionedTopicMetadata requests are tracked as part of some other metric. I think it should be a new metric that is unique for CommandPartitionedTopicMetadata requests. @codelipenghui do you have a suggestion?
The metadata store metrics are at the metadata store level and can provide the metadata store operation latency. The REST API request metrics should be a separate part. The CommandPartitionedTopicMetadata request metrics will not match the metadata store operations 100%. Maybe the Jetty thread is blocked somewhere.
Maybe Jetty already provides the ability to expose metrics with a request path label?
@codelipenghui @lhotari There are two ways to get PartitionedTopicMetadata: one is ServerCnx#handlePartitionMetadataRequest(CommandPartitionedTopicMetadata partitionMetadata), the other is PersistentTopics#getPartitionedMetadata(Args ...).
If we need to add metrics for them, please assign the issue to me.
I've checked Jetty, and it seems there is no such ability.
If we want this ability, it's not easy, because we need to converge the request paths into templates, such as: /api/v2/persistent/myTenant/myNamespace/partitioned -> /api/v2/persistent/{tenant}/{namespace}/partitioned. It may take some time.
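To make the path convergence concrete, here is a minimal sketch, assuming a hand-maintained table of regex patterns. The class `RequestPathTemplates`, the single pattern in it, and the `other` fallback label are all hypothetical; a real implementation would need one entry per REST endpoint or would derive the templates from the resource annotations.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pulsar or Jetty: maps concrete request
// paths to a bounded set of templates so they can be used as a metric label.
class RequestPathTemplates {
    // Pattern -> replacement template. Only the example path from this
    // thread is covered; the table is an assumption for illustration.
    private static final Map<Pattern, String> TEMPLATES = new LinkedHashMap<>();

    static {
        TEMPLATES.put(
            Pattern.compile("^/api/v2/(persistent|non-persistent)/[^/]+/[^/]+/partitioned$"),
            "/api/v2/$1/{tenant}/{namespace}/partitioned");
    }

    // Returns the template for a known path, or "other" so that unknown
    // paths cannot create an unbounded number of label values.
    static String normalize(String path) {
        for (Map.Entry<Pattern, String> entry : TEMPLATES.entrySet()) {
            Matcher matcher = entry.getKey().matcher(path);
            if (matcher.matches()) {
                return matcher.replaceFirst(entry.getValue());
            }
        }
        return "other";
    }
}
```

For example, `RequestPathTemplates.normalize("/api/v2/persistent/myTenant/myNamespace/partitioned")` yields `/api/v2/persistent/{tenant}/{namespace}/partitioned`. Collapsing unmatched paths into one value keeps the label cardinality bounded, which is the main reason the raw request path cannot be used directly.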
@lhotari @codelipenghui PTAL https://github.com/apache/pulsar/pull/18281
The PIP discuss thread: https://lists.apache.org/thread/sybl4nno4503w42hzt7b5lsyk6m2rbo6
The issue had no activity for 30 days, mark with Stale label.