avro_turf

A way to improve encoding performance for message batches when using AvroTurf::Messaging

Open Diyaa1 opened this issue 1 year ago • 3 comments

Hello there,

We have a case where we encode over 1000 messages at a time, and I noticed a bottleneck: AvroTurf takes over 60% of the CPU time. I profiled the code and found that the schema is parsed for every message, as shown in the code below. Is this intentional? Is there any way to improve it?

Note: even in fetch_schema_by_id, the @schemas_by_id hash is never written to anywhere.

I'd appreciate any help. Thank you :)

    # Providing subject and version to determine the schema,
    # which skips the auto registration of schema on the schema registry.
    # Fetch the schema from registry with the provided subject name and version.
    def fetch_schema(subject:, version: 'latest')
      schema_data = @registry.subject_version(subject, version)
      schema_id = schema_data.fetch('id')
      schema = Avro::Schema.parse(schema_data.fetch('schema'))
      [schema, schema_id]
    end
    # Fetch the schema from registry with the provided schema_id.
    def fetch_schema_by_id(schema_id)
      schema = @schemas_by_id.fetch(schema_id) do
        schema_json = @registry.fetch(schema_id)
        Avro::Schema.parse(schema_json)
      end
      [schema, schema_id]
    end

Diyaa1, Feb 18 '24 10:02

Ah, I think the bug is caused by the incorrect assumption that fetch actually sets the value to the result of the block; it does not.
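To illustrate the point about fetch, here is a minimal sketch in plain Ruby. The `cache` hash and the `"parsed schema"` string are hypothetical stand-ins for `@schemas_by_id` and the result of `Avro::Schema.parse`; nothing here touches avro_turf itself.

```ruby
# Hypothetical stand-ins: `cache` plays the role of @schemas_by_id,
# and the block plays the role of the expensive schema parse.
cache = {}
calls = 0

2.times do
  cache.fetch(1) do
    calls += 1        # count how often the "parse" runs
    "parsed schema"   # returned to the caller, but not stored in the hash
  end
end

puts calls         # 2, the block ran on both lookups
puts cache.empty?  # true, fetch never wrote to the hash
```

Hash#fetch with a block only uses the block's result as the return value for a missing key, so every lookup is a miss and the block runs again.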

Can you do a PR?

I think the solution is to do this instead:

      schema = @schemas_by_id[schema_id] ||= begin
        schema_json = @registry.fetch(schema_id)
        Avro::Schema.parse(schema_json)
      end

This will actually assign the computed value to the hash while retaining the laziness.
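As a rough sketch of that behaviour in isolation (the `schemas_by_id` hash, the `load_schema` lambda, and the id `7` are hypothetical names for illustration, standing in for the registry fetch plus `Avro::Schema.parse`):

```ruby
schemas_by_id = {}
parse_calls = 0

# Hypothetical stand-in for fetching from the registry and parsing the schema.
load_schema = lambda do |schema_id|
  parse_calls += 1
  "schema-#{schema_id}"
end

3.times do
  # The right-hand side is evaluated only while the stored value is nil or false.
  schemas_by_id[7] ||= load_schema.call(7)
end

puts parse_calls       # 1, only the first lookup did the work
puts schemas_by_id[7]  # schema-7
```

One thing to keep in mind with `||=` is that it recomputes whenever the stored value is nil or false. That should not matter here as long as the parsed schema is always a truthy object.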

dasch, Feb 19 '24 12:02

Thanks @dasch, I went ahead and patched avro_turf in production using the code below.

I can do a PR. Are the changes below acceptable? I'll do it when I get some free time.

class AvroTurf
  class Messaging
    def fetch_schema(subject:, version: 'latest')
      schema_data = @registry.subject_version(subject, version)
      schema_id = schema_data.fetch('id')
      schema = @schemas_by_id[schema_id] ||= begin
        Avro::Schema.parse(schema_data.fetch('schema'))
      end
      [schema, schema_id]
    end

    def fetch_schema_by_id(schema_id)
      schema = @schemas_by_id[schema_id] ||= begin
        schema_json = @registry.fetch(schema_id)
        Avro::Schema.parse(schema_json)
      end
      [schema, schema_id]
    end
  end
end

Diyaa1, Feb 20 '24 08:02

Yeah, that looks good :+1:

dasch, Feb 26 '24 14:02