avro_turf
A way to improve encoding performance for message batches when using AvroTurf::Messaging
Hello there,
We have a case where we encode over 1000 messages at a time, and I noticed a bottleneck where AvroTurf takes over 60% of the CPU time. I profiled the code and found that the schema is parsed for every message, as shown in the code below. Is this intentional? Is there any way to improve it?
Note: even in fetch_schema_by_id, nothing is ever stored in @schemas_by_id.
I'd appreciate any help, thank you :)
# Providing subject and version to determine the schema,
# which skips the auto registration of schema on the schema registry.
# Fetch the schema from registry with the provided subject name and version.
def fetch_schema(subject:, version: 'latest')
  schema_data = @registry.subject_version(subject, version)
  schema_id = schema_data.fetch('id')
  schema = Avro::Schema.parse(schema_data.fetch('schema'))
  [schema, schema_id]
end
# Fetch the schema from registry with the provided schema_id.
def fetch_schema_by_id(schema_id)
  schema = @schemas_by_id.fetch(schema_id) do
    schema_json = @registry.fetch(schema_id)
    Avro::Schema.parse(schema_json)
  end
  [schema, schema_id]
end
Ah, I think the bug is caused by the incorrect assumption that Hash#fetch stores the block's result in the hash; it does not. It only returns the block's result, so the block runs again on every call.
Can you do a PR?
I think the solution is to do this instead:
schema = @schemas_by_id[schema_id] ||= begin
  schema_json = @registry.fetch(schema_id)
  Avro::Schema.parse(schema_json)
end
This will actually assign the computed value to the hash while retaining the laziness.
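To illustrate the difference, here is a minimal standalone sketch in plain Ruby. The `expensive_parse` lambda is a hypothetical stand-in for `Avro::Schema.parse` and only counts how often it is called; nothing here touches avro_turf itself.

```ruby
# Hypothetical stand-in for an expensive call such as Avro::Schema.parse.
# It counts how many times it has been invoked.
parse_calls = 0
expensive_parse = lambda do |json|
  parse_calls += 1
  "parsed(#{json})"
end

# Hash#fetch with a block: the block's result is returned but never stored.
cache = {}
3.times { cache.fetch(1) { expensive_parse.call("schema-1") } }
fetch_calls = parse_calls      # the block ran on every call
fetch_cache_size = cache.size  # the hash is still empty

# ||= stores the computed value, so the right-hand side runs only on a miss.
parse_calls = 0
cache = {}
3.times { cache[1] ||= expensive_parse.call("schema-1") }
memo_calls = parse_calls
memo_cache_size = cache.size

puts "fetch: #{fetch_calls} parses, #{fetch_cache_size} cached entries"
puts "||=:   #{memo_calls} parses, #{memo_cache_size} cached entries"
```

With `fetch`, the stand-in parser runs on all three lookups and the hash stays empty; with `||=`, it runs once and the hash keeps the entry. One caveat of `||=` is that a stored `nil` or `false` would be recomputed, which should not matter for parsed schema objects.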
Thanks @dasch, I went ahead and patched avro_turf in production using the code below.
I can do a PR. Are the changes below acceptable? I'll do it when I get some free time.
class AvroTurf
  class Messaging
    def fetch_schema(subject:, version: 'latest')
      schema_data = @registry.subject_version(subject, version)
      schema_id = schema_data.fetch('id')
      schema = @schemas_by_id[schema_id] ||= begin
        Avro::Schema.parse(schema_data.fetch('schema'))
      end
      [schema, schema_id]
    end

    def fetch_schema_by_id(schema_id)
      schema = @schemas_by_id[schema_id] ||= begin
        schema_json = @registry.fetch(schema_id)
        Avro::Schema.parse(schema_json)
      end
      [schema, schema_id]
    end
  end
end
Yeah, that looks good :+1:
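For anyone who wants to sanity-check the effect on a batch without a real registry, here is a rough simulation. `FakeRegistry` and `SchemaCache` are hypothetical classes written only for this sketch (they are not part of avro_turf), and `JSON.parse` stands in for `Avro::Schema.parse`; only the memoization pattern is taken from the patch above.

```ruby
require "json"

# Hypothetical stand-in for the schema registry client; not part of avro_turf.
class FakeRegistry
  attr_reader :fetch_count

  def initialize(schemas)
    @schemas = schemas
    @fetch_count = 0
  end

  # Returns the schema JSON for an id and counts how often it is asked.
  def fetch(schema_id)
    @fetch_count += 1
    @schemas.fetch(schema_id)
  end
end

# Hypothetical cache that mirrors the patched fetch_schema_by_id.
class SchemaCache
  attr_reader :parse_count

  def initialize(registry)
    @registry = registry
    @schemas_by_id = {}
    @parse_count = 0
  end

  def fetch_schema_by_id(schema_id)
    schema = @schemas_by_id[schema_id] ||= begin
      schema_json = @registry.fetch(schema_id)
      @parse_count += 1
      JSON.parse(schema_json) # stand-in for Avro::Schema.parse
    end
    [schema, schema_id]
  end
end

registry = FakeRegistry.new({ 1 => '{"type":"record","name":"person","fields":[]}' })
cache = SchemaCache.new(registry)

# Simulate a batch of 1000 messages that all use schema id 1.
1000.times { cache.fetch_schema_by_id(1) }

puts "registry fetches: #{registry.fetch_count}, parses: #{cache.parse_count}"
```

In this simulation the registry is queried and the schema is parsed once for the whole batch. Note that the patched fetch_schema above still calls the registry for the subject and version on every invocation; only the parsing step is memoized there.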