ProtoBuf.jl icon indicating copy to clipboard operation
ProtoBuf.jl copied to clipboard

Deserialization is extremely slow for messages with many small sub-messages.

Open lassepe opened this issue 3 years ago • 12 comments

First of all, thank you for making ProtoBuf available in Julia! I am currently using this package to read scenarios form the waymo-open-dataset. While this has worked reliably, I am running into quite some performance issues due the "topology" of the messages.

Some relevant details:

Each of the scenarios consists of approximately 2MB of binary data. However, each scenario contains map features which in turn consist of some meta-data and a number of MapPoints. Each map point only encodes three floats for the x, y, and z position. Thus they are quite small. However, a single scenario can contain > 100.000 MapPoints (spread over the different features of the map).

As a result, loading a single scenario takes about 150ms on my machine. This may not sound all too bad but in my current usecase I need to iterate over about 400.000 of these scenarios; potentially multiple times.

After doing some profiling, I found that 60% of the time is spent pre-allocating memory for all the small dicts that are needed for each MapPoint object. Specifically, most of the time is spent in this line of julia base.

Just to provide some comparison: The same kind of data is loaded in about 11ms from a json file, and in about 3ms from the binary format created by Serialization.serialize. Of course, there are other issues with these kinds of formats. Therefore, I would really like to use ProtoBuf here. But I cannot get it up to speed.

Are there known ways to reduce the amount of overhead that arises in this setting with many small proto sub-messages? Or is this issue inherent to the current design of the library?

lassepe avatar May 02 '21 20:05 lassepe

From what I understand, this looks like the Dict used to store members of a protobuf struct? E.g. this? Would you be able to check/put up a more detailed profiler stack to confirm this?

In that case, maybe something more lightweight than a Dict will help here. The count and names of fields is known during code generation, so one way could be to generate specialized code for each struct instead of using Dict.

tanmaykm avatar May 27 '21 02:05 tanmaykm

Yes, the bottleneck is due to the dict that is used to represent the protobuf stuct. So there is not really a quick-fix for this because it would require a fundamentally different way of generating the code.

I created a MWE that also includes some results in the profile_results subdirectory. I'm not sure what else would be a good way to share more information on this because the flame graph is not really helpful unless you interact with it (i.e. hover etc.). I think you can ignore the large chunk on the bottom right in this chart because it is just the overhead that comes from running things in the REPL. The actual workload is the part on the left.

image

lassepe avatar May 27 '21 07:05 lassepe

Hi guys, wondering what the patterns in protobuf.jl is to deserialise multiple sub messages in the one message? (not sure if this i should open a separate issue)

I'm seeing that this is not officially part of the protobuf spec https://developers.google.com/protocol-buffers/docs/techniques and couldn't see it in any of the Readme's.

hs-ye avatar Aug 08 '21 23:08 hs-ye

We have the same issue with OpenStreetMapX: https://github.com/pszufe/OpenStreetMapX.jl/pull/44#discussion_r667459615

The count and names of fields is known during code generation, so one way could be to generate specialized code for each struct instead of using Dict.

Is there any technical issue blocking the code of the struct to be generated without using Dict but fields instead?

blegat avatar Aug 11 '21 20:08 blegat

Technically it should be possible. The point is just that one would have to re-write the entire code-generation logic.

lassepe avatar Aug 12 '21 05:08 lassepe

The point is just that one would have to re-write the entire code-generation logic.

Indeed. It's best to try it out by manually doing it for one package. I tried it by manually changing the generated code of OpenStreetMapX, see here for the proof of concept: https://github.com/pszufe/OpenStreetMapX.jl/pull/52 It was quite simple since the PBF specification of OpenStreetMap does not use oneof but it should be doable for oneof as well.

blegat avatar Aug 27 '21 09:08 blegat

Hi all, we're seeing the same thing here: allocating a small Dict per message is prohibitively expensive.

We remembered that the last time we used ProtoBuf.jl, a few years ago (for JuliaPerf/PProf.jl@d6d525c), it generated a struct per proto message, instead of an object containing a Dict.

We found that this change from struct to Dict was made as part of https://github.com/juliaio/protobuf.jl/pull/141, which fixed a thread-safety issue regarding the use of meta. It seems that that PR also changed to a Dict representation, but the PR description doesn't contain any explanation about why we made that change.

@tanmaykm can you elaborate on how we came to the current design with a Dict in each object? Why did we change away from generating a struct per message? We'd be interested to help with implementing any future work here to improve the performance, and we'd like to help get rid of this allocation. 😊

Thanks so much! :)

CC: @dewilson

NHDaly avatar Jan 10 '22 19:01 NHDaly

For this example proto, here's what it looks like before and after #141 Proto:

syntax = "proto3";

message plant {
    string name = 3;
}

message seed {
    int32 cold_hours = 1;
    int32 germination_hours = 2;
    plant species = 3;
}

Here's the seed proto after #141:

mutable struct seed <: ProtoType
    __protobuf_jl_internal_meta::ProtoMeta
    __protobuf_jl_internal_values::Dict{Symbol,Any}

    function seed(; kwargs...)
        obj = new(meta(seed), Dict{Symbol,Any}())
        values = obj.__protobuf_jl_internal_values
        symdict = obj.__protobuf_jl_internal_meta.symdict
        for nv in kwargs
            fldname, fldval = nv
            fldtype = symdict[fldname].jtyp
            (fldname in keys(symdict)) || error(string(typeof(obj), " has no field with name ", fldname))
            values[fldname] = isa(fldval, fldtype) ? fldval : convert(fldtype, fldval)
        end
        obj
    end
end # mutable struct seed
const __meta_seed = Ref{ProtoMeta}()
function meta(::Type{seed})
    ProtoBuf.metalock() do
        if !isassigned(__meta_seed)
            __meta_seed[] = target = ProtoMeta(seed)
            allflds = Pair{Symbol,Union{Type,String}}[:cold_hours => Int32, :germination_hours => Int32, :species => plant]
            meta(target, seed, allflds, ProtoBuf.DEF_REQ, ProtoBuf.DEF_FNUM, ProtoBuf.DEF_VAL, ProtoBuf.DEF_PACK, ProtoBuf.DEF_WTYPES, ProtoBuf.DEF_ONEOFS, ProtoBuf.DEF_ONEOF_NAMES)
        end
        __meta_seed[]
    end
end
function Base.getproperty(obj::seed, name::Symbol)
    if name === :cold_hours
        return (obj.__protobuf_jl_internal_values[name])::Int32
    elseif name === :germination_hours
        return (obj.__protobuf_jl_internal_values[name])::Int32
    elseif name === :species
        return (obj.__protobuf_jl_internal_values[name])::plant
    else
        getfield(obj, name)
    end
end

And before:

mutable struct plant <: ProtoType
    name::AbstractString
    plant(; kwargs...) = (o=new(); fillunset(o); isempty(kwargs) || ProtoBuf._protobuild(o, kwargs); o)
end #mutable struct plant
const __fnum_plant = Int[3]
meta(t::Type{plant}) = meta(t, ProtoBuf.DEF_REQ, __fnum_plant, ProtoBuf.DEF_VAL, true, ProtoBuf.DEF_PACK, ProtoBuf.DEF_WTYPES, ProtoBuf.DEF_ONEOFS, ProtoBuf.DEF_ONEOF_NAMES, ProtoBuf.DEF_FIELD_TYPES)

mutable struct seed <: ProtoType
    cold_hours::Int32
    germination_hours::Int32
    species::plant
    seed(; kwargs...) = (o=new(); fillunset(o); isempty(kwargs) || ProtoBuf._protobuild(o, kwargs); o)
end #mutable struct seed

dewilson avatar Jan 10 '22 19:01 dewilson

:) We're wondering the same thing! I think it probably is related to trying to support protobuf reflection in a good way, or may have been related to supporting mutually recursive message definitions. Whatever the reason, we'd be happy to help with some group engineering to get to a solution we all love! 😊

NHDaly avatar Jan 10 '22 20:01 NHDaly

I was hoping that some sort of lightweight dict could be used here, but that has not happened yet unfortunately. @NHDaly has rightly pointed out reasons for switching to a dict based representation.

  • avoided the earlier global dict used to store what attributes are set in each instance of a struct, this is needed to be able to serve defaults and set/unset members
  • no need to mangle struct member names that are not Julia compatible (e.g. are keywords and such)
  • recursive struct definitions easier to handle

It will sure be great to have folks contribute to this!

tanmaykm avatar Jan 11 '22 03:01 tanmaykm

Thanks for the explanation, @tanmaykm! :) Makes sense.

Okay, I think we should be able to achieve those goals without having dictionaries in the structs. The proto implementations in other languages have all found ways to work around these, so I think we should be able to do so as well.

For the recursive struct definitions one: I think the standard way to do mutually recursive struct definitions is to introduce a type parameter for one of them.

So for example, given this proto definition:

message RecurseA {
    required string a = 1;
    optional RecurseB b = 2;
}

message RecurseB {
    required string b = 1;
    optional RecurseA a = 2;
}

You could generate julia code like this:

struct _RecurseA{__RecurseB}
    a::AbstractString
    b::__RecurseB
end
struct RecurseB
    b::AbstractString
    a::_RecurseA{RecurseB}
end
const RecurseA = _RecurseA{RecurseB}

I think we can work through the remaining issues, and go back to doing code generation while still avoiding the thread safety and recursion issues you had before! :)

Thanks Tanmay.

NHDaly avatar Jan 11 '22 16:01 NHDaly

For the field name issue, we now have this syntax:

struct foo
    var"invalid field name!!"::Int32
end

JeffBezanson avatar Jan 20 '22 22:01 JeffBezanson

Hey folks, I encourage you to take the new version (1.0) of this package for a spin, should be a bit faster. Please see the docs for some guidance on how to migrate to this new version. Closing for now as this issue is a little all over to place, but please, feel free to reopen (or better, start a new issue) if you feel the performance is still bad.

Drvi avatar Aug 18 '22 09:08 Drvi

FWIW, I returned to this recently to re-assess the feasibility of using ProtoBuf for another project. The MWE from above now only takes about 4ms on my machine --- a solid 30x improvement.

Thanks, @Drvi and team!

lassepe avatar Jan 22 '24 13:01 lassepe