confluent-kafka-python
JSONSerializer unnecessary schema validation on every call
Description
Schema validation is coupled with object validation in JSONSerializer.__call__(). Every time an object is serialized, the schema (JSONSerializer._parsed_schema) is validated again alongside the object dict: https://github.com/confluentinc/confluent-kafka-python/blob/baf71ea0ed54c71948208bfc5c352f4ee57054dd/src/confluent_kafka/schema_registry/json_schema.py#L267
This is due to the use of jsonschema.validators.validate as the validation method, which validates the schema before validating the object on every call:
if cls is None:
    cls = validator_for(schema)  # Determines the best validator
cls.check_schema(schema)  # Uses the validator's meta-schema to validate the schema
validator = cls(schema, *args, **kwargs)  # Initializes a new validator
error = exceptions.best_match(validator.iter_errors(instance))  # Validates the object
As a result, JSON serialization is too slow to be usable in its current state.
JSONDeserializer suffers from exactly the same problem.
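One possible way to decouple the two steps, sketched here outside of confluent-kafka-python, is to check the schema once and keep the resulting validator for all later calls. This is only an illustration of the idea and not a patch for the library: SCHEMA, build_validator and is_valid are hypothetical names, and only validator_for, check_schema and Validator.validate come from jsonschema.

```python
from jsonschema import ValidationError
from jsonschema.validators import validator_for

# Hypothetical schema, used only for illustration.
SCHEMA = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    "required": ["id", "name"],
}


def build_validator(schema):
    """Check the schema once and return a validator that can be reused."""
    cls = validator_for(schema)  # pick the validator class for this schema
    cls.check_schema(schema)     # meta-schema validation, done a single time
    return cls(schema)


# Built once, e.g. where a serializer would be constructed.
validator = build_validator(SCHEMA)


def is_valid(instance):
    """Validate one object without re-checking the schema."""
    try:
        validator.validate(instance)
        return True
    except ValidationError:
        return False
```

With this shape, the per-object cost is limited to validating the instance itself; the schema check happens only when the validator is built.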
How to reproduce
- Create an instance of JSONSerializer with any valid parameters
- Call it to serialize 10,000 random objects
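from confluent_kafka.serialization import MessageField, SerializationContext

# json_serializer, TOPIC and DummyObject are defined elsewhere (not shown in this report).
def test():
    ctx = SerializationContext(topic=TOPIC, field=MessageField.VALUE)
    for _ in range(10_000):
        obj = DummyObject.random_obj()
        json_serializer(obj, ctx)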
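The overhead can also be observed without a Schema Registry or broker by timing jsonschema directly. The following is a rough sketch under that assumption: SCHEMA and time_validation are hypothetical names, the objects are simple stand-ins for DummyObject, and the absolute numbers will depend on the machine and the jsonschema version.

```python
import time

import jsonschema
from jsonschema.validators import validator_for

# Hypothetical schema, used only for illustration.
SCHEMA = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    "required": ["id", "name"],
}


def time_validation(n=10_000):
    """Return (seconds with jsonschema.validate, seconds with a reused validator)."""
    instances = [{"id": i, "name": f"item-{i}"} for i in range(n)]

    # Path 1: jsonschema.validate checks the schema again on every call.
    start = time.perf_counter()
    for obj in instances:
        jsonschema.validate(obj, SCHEMA)
    per_call = time.perf_counter() - start

    # Path 2: check the schema once, then reuse the same validator.
    cls = validator_for(SCHEMA)
    cls.check_schema(SCHEMA)
    validator = cls(SCHEMA)
    start = time.perf_counter()
    for obj in instances:
        validator.validate(obj)
    reused = time.perf_counter() - start

    return per_call, reused


if __name__ == "__main__":
    per_call, reused = time_validation()
    print(f"validate() per call: {per_call:.3f}s, reused validator: {reused:.3f}s")
```

Comparing the two timings gives a rough idea of how much of the serialization cost comes from the repeated schema check rather than from validating the objects.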
Checklist
Please provide the following information:
- [X] confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()): version: ('2.1.0', 33619968); libversion: ('2.1.0', 33620223)
- [ ] Apache Kafka broker version: NA
- [ ] Client configuration: {...} NA
- [X] Operating system: Win10-x64
- [ ] Provide client logs (with 'debug': '..' as necessary): NA
- [ ] Provide broker log excerpts: NA
- [ ] Critical issue: Not critical