Certain JMX metrics fail to report accurate values

Open siobhansabino opened this issue 4 years ago • 8 comments

Note: We are running the latest Schema Registry version, using the Confluent Docker image.

Following the documentation provided to obtain JMX metrics, we have found the following metrics only report zero regardless of reality:

  • bean: kafka.schema.registry:type=jersey-metrics
    • compatibility.subjects.versions.verify.request-rate
    • compatibility.subjects.versions.verify.request-error-rate
    • compatibility.subjects.versions.verify.response-rate
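
For anyone trying to reproduce this, the attributes above can be read directly over JMX. The following is a minimal sketch, not the reporter's actual tooling: the class name `JerseyMetricsProbe`, the helper `read`, and the JMX address `localhost:9999` are assumptions for illustration, and it presumes remote JMX has been enabled on the Schema Registry container.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JerseyMetricsProbe {
    // Attribute names taken from the list in this issue.
    static final String[] ATTRS = {
        "compatibility.subjects.versions.verify.request-rate",
        "compatibility.subjects.versions.verify.request-error-rate",
        "compatibility.subjects.versions.verify.response-rate"
    };

    // Reads the named attributes from one MBean and returns them in order.
    static Map<String, Object> read(MBeanServerConnection conn, ObjectName bean,
                                    String... attrs) throws Exception {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String attr : attrs) {
            out.put(attr, conn.getAttribute(bean, attr));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: JMX is exposed on localhost:9999; adjust to your deployment.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ObjectName bean = new ObjectName("kafka.schema.registry:type=jersey-metrics");
            read(conn, bean, ATTRS).forEach((k, v) -> System.out.println(k + " = " + v));
        }
    }
}
```

Running it before and after issuing a few compatibility requests shows whether the values move at all.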

However we do have the following metrics reporting values as expected:

  • bean: kafka.schema.registry:type=master-slave-role
    • master-slave-role
  • bean: kafka.schema.registry:type=jetty-metrics
    • connections-active
  • bean: kafka.schema.registry:type=jersey-metrics
    • subjects.versions.register.request-rate
    • subjects.versions.register.response-rate

(We cannot be certain as to the state of subjects.versions.register.request-error-rate thus have not included it on the list.)

We caught this inaccuracy because clients interacting with the Schema Registry are receiving rejections on compatibility checks, yet the metrics show no requests or responses for those checks at all.

Please advise if we can provide any further information about this issue.

siobhansabino avatar Mar 09 '20 14:03 siobhansabino

I don't think clients actually hit the /compatibility endpoint during registration.

You'd have to expect some subset of those exceptions are coming from the /subjects/{name}/versions endpoint, which validates the request payload internally using the same methods behind /compatibility

OneCricketeer avatar Mar 13 '20 13:03 OneCricketeer

This is not during registration; this is for normal producing where the schema's compatibility is checked before any message is sent, so we'd expect the overwhelming majority of calls to the Schema Registry to be compatibility checks. As noted, we see the clients logging their requests and responses to the Schema Registry around this endpoint, but the Schema Registry itself does not report any of that.

siobhansabino avatar Mar 13 '20 13:03 siobhansabino

this is for normal producing where the schema's compatibility is checked before any message is sent

Can you please point me to the serializer line that calls the compatibility method? I'm only seeing register :confused:

https://github.com/confluentinc/schema-registry/blob/master/schema-serializer/src/main/java/io/confluent/kafka/serializers/AbstractKafkaSchemaSerDe.java

OneCricketeer avatar Mar 13 '20 13:03 OneCricketeer

We do not have any JVM producers so instead directly call the Schema Registry compatibility API ourselves.

siobhansabino avatar Mar 13 '20 13:03 siobhansabino

I see. Sorry, I should have asked you to clarify "normal producing", since no Confluent-provided serializer calls the compatibility endpoint, to my knowledge.

Regarding the metrics: they are all set up the same way using annotations, so they should work.

https://github.com/confluentinc/schema-registry/blob/master/core/src/main/java/io/confluent/kafka/schemaregistry/rest/resources/CompatibilityResource.java#L83

OneCricketeer avatar Mar 13 '20 14:03 OneCricketeer

@siobhansabino Did you find a fix for the same? We are on confluentinc/cp-schema-registry:6.2.0 docker image and facing the same issue.

yesemsanthoshkumar avatar May 05 '22 12:05 yesemsanthoshkumar

If you manually call the /compatibility API endpoints, do they remain zero in the metrics?
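
A manual call can be sketched as follows, assuming the documented `POST /compatibility/subjects/{subject}/versions/{version}` endpoint. The class name `CompatibilityCall`, the base URL `http://localhost:8081`, the subject `my-topic-value`, and the sample schema are placeholders for illustration, not values from this thread.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class CompatibilityCall {
    // Wraps a schema string as a JSON string literal.
    // Only backslashes and double quotes are escaped; enough for a compact schema.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Builds the compatibility-check request without sending it.
    static HttpRequest build(String baseUrl, String subject, String version, String schema) {
        String encoded = URLEncoder.encode(subject, StandardCharsets.UTF_8).replace("+", "%20");
        String body = "{\"schema\": " + quote(schema) + "}";
        return HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + "/compatibility/subjects/" + encoded + "/versions/" + version))
            .header("Content-Type", "application/vnd.schemaregistry.v1+json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumptions: registry address, subject name, and schema are placeholders.
        HttpRequest request = build("http://localhost:8081", "my-topic-value", "latest",
                                    "{\"type\":\"string\"}");
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

Issuing this a handful of times and then re-reading the `compatibility.subjects.versions.verify.*` attributes would show whether the counters react to direct calls.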

OneCricketeer avatar May 05 '22 15:05 OneCricketeer

@OneCricketeer We don't have any custom producers yet, so we can't say anything about the compatibility API.

But we do have Debezium and Hudi interacting with Schema Registry. Even when onboarding new tables in Debezium (which registers schemas with Schema Registry) and running Hudi jobs (which read schemas from Schema Registry), the request error rate remains zero all the time. We do see failed 5xx responses from Schema Registry in our job logs, but they are not reflected in the Schema Registry JMX metrics. Even the 2xx responses are not reflected. This behaviour is consistent across all the endpoint metrics.

yesemsanthoshkumar avatar May 07 '22 06:05 yesemsanthoshkumar