
Performance issues when reading multiple packed structures

Open evetion opened this issue 7 years ago • 2 comments

I've encountered this while replacing manually written-out read and write methods with StructIO in visr/LasIO.jl#10. Although using StructIO is elegant, it's also slower.

I've put up a gist here: https://gist.github.com/evetion/2b57d6105cca39b2d3c6ef670a5cc393 with the following results for reading a thousand TwoUInt64s.

➜ julia performance.jl

Using StructIO:
  4.143 ms (2000 allocations: 62.50 KiB)

Using read_generic_array:
  6.138 ms (12000 allocations: 406.25 KiB)

Using read_generic_tuple:
  480.796 μs (8000 allocations: 265.63 KiB)

Using read_written_out:
  29.229 μs (2000 allocations: 31.25 KiB)

Using generated_read:
  29.704 μs (2000 allocations: 31.25 KiB)

The handwritten read version, which can also be generated, is roughly 140 times faster than the StructIO version.
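A minimal sketch of what such a generated field-by-field reader could look like (this is in the spirit of the gist's `generated_read`, but the name and details here are assumptions, not the gist's actual code):

```julia
# Sketch: build a `T(read(io, F1), read(io, F2), ...)` expression at
# compile time, one `read` per field, with no intermediate buffers.
@generated function generated_read(io::IO, ::Type{T}) where {T}
    reads = Expr[]
    for i in 1:fieldcount(T)
        push!(reads, :(read(io, $(fieldtype(T, i)))))
    end
    return :(T($(reads...)))
end

# Example with a plain struct (no StructIO needed):
struct PairU64
    x::UInt64
    y::UInt64
end

io = IOBuffer(zeros(UInt8, 16))
p = generated_read(io, PairU64)  # reads two UInt64 fields in order
```

Because the loop over fields happens at generation time, the compiled method is just the same straight-line sequence of `read` calls a human would write by hand.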

I know this is an unfair comparison, since StructIO has much more functionality than these simple read functions, but it seems it could be faster, especially since the allocation counts (2000) are on par with the handwritten version even though it runs two orders of magnitude slower.

Let me know if you can't duplicate these results, or if I'm missing a StructIO method for reading multiple packed structures.

evetion avatar Dec 30 '17 13:12 evetion

I updated and reduced the gist above a little:

using BenchmarkTools
using StructIO

const io = IOBuffer(zeros(UInt8, 16*1000));
abstract type TwoUInt64s end

@io struct TwoUInt64sDefault <: TwoUInt64s
    x::UInt64
    y::UInt64
end align_default

@io struct TwoUInt64sPacked <: TwoUInt64s
    x::UInt64
    y::UInt64
end align_packed

function read_written_out(io::IOBuffer, t::Type{<:TwoUInt64s})
    x = read(io, UInt64)
    y = read(io, UInt64)
    t(x, y)
end

println("Using read_written_out:")
@btime read_written_out($io, TwoUInt64sPacked) setup=seekstart($io)

println("Using StructIO Default:")
@btime unpack($io, TwoUInt64sDefault) setup=seekstart($io)

println("Using StructIO Packed:")
@btime unpack($io, TwoUInt64sPacked) setup=seekstart($io)

Which gives:

Using read_written_out:
  6.158 ns (0 allocations: 0 bytes)
Using StructIO Default:
  24.222 ns (1 allocation: 32 bytes)
Using StructIO Packed:
  1.273 μs (5 allocations: 96 bytes)

StructIO allocates here on every unpack, which makes it slower, especially when the struct is marked as packed. In this example struct there is no padding, so perhaps an idea is to fall back to the faster default unpack whenever StructIO.packed_sizeof(T) == sizeof(T), i.e. whenever the packed and default layouts coincide.
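That no-padding fast path could look roughly like this (a pure-Julia sketch without StructIO; `read_bits` is a hypothetical name, and it is only valid for padding-free isbits types):

```julia
struct TwoU  # stand-in for TwoUInt64sPacked: 16 bytes, no padding
    x::UInt64
    y::UInt64
end

# Hypothetical fast path: when the packed and default layouts coincide,
# one bulk byte read plus a reinterpret reconstructs the struct.
function read_bits(io::IO, ::Type{T}) where {T}
    @assert isbitstype(T)
    buf = read(io, sizeof(T))          # single 16-byte read
    return first(reinterpret(T, buf))  # view the bytes as a T
end

io = IOBuffer(zeros(UInt8, 16))
v = read_bits(io, TwoU)
```

A real implementation would also have to handle endianness, which this sketch ignores.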

I'm not sure the current general unpack method can easily be sped up (to zero allocations). Another way to go would be to have @io struct T define not only a packing_strategy(::Type{T}) method but also a complete unpack(::Type{T}) method, similar to @evetion's generated_read above?

visr avatar Jan 16 '19 16:01 visr

Couldn't unpack take a pre-allocated buffer parameter if needed, for use when it is called repeatedly?

And/or maybe you could have an unpack(io, T, n) method that reads up to n elements of type T into an array, so that it can do the allocations only once.
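For the isbits case, such a bulk method could be built on Base's `read!`, which fills a pre-allocated array in a single call (a sketch; `unpack_many!` is a hypothetical name, not part of StructIO's API):

```julia
struct PairU  # stand-in for a padding-free @io struct
    x::UInt64
    y::UInt64
end

# Hypothetical bulk reader: the only allocation is the caller's one-time
# Vector allocation; `read!` then fills it in one bulk read.
function unpack_many!(io::IO, dest::Vector{T}) where {T}
    @assert isbitstype(T)
    read!(io, dest)  # Base.read! does a bulk read for bits-type arrays
    return dest
end

io = IOBuffer(zeros(UInt8, 16 * 1000))
out = unpack_many!(io, Vector{PairU}(undef, 1000))
```

The same buffer can then be reused across repeated calls, matching the "pre-allocated buffer parameter" idea above.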

stevengj avatar Jan 08 '23 13:01 stevengj