
collect on LibPQ.Column allocates a lot

Open willtebbutt opened this issue 3 years ago • 1 comment

MWE (I've used a connection to a private DB in this issue, so you'll have to point it at your own DB in order to run it locally):

using
    BenchmarkTools,
    JSON3,
    LibPQ,
    Tables

function get_conn()
    return <implement-connection>
end

function get_data()
    conn = get_conn()
    result = execute(conn, "SELECT s.data::float8 FROM generate_series(0,1000000,0.1) AS s(data);"; binary_format=true)
    data = columntable(result)
    close(conn)
    return data.data
end

function my_getindex(jl_result::LibPQ.Result, row::Integer, col::Integer)
    data_ptr = LibPQ.libpq_c.PQgetvalue(jl_result.result, row - 1, col - 1)
    return convert(Float64, LibPQ.pqparse(Float64, data_ptr))
end

function get_data(c)
    jl_result = c.result
    return map(1:length(c)) do n
        return my_getindex(jl_result, n, 1)
    end
end

lmp_column = get_data()
println("Current LibPQ performance")
display(@benchmark collect($lmp_column))
println()

println("Peformance with hack")
display(@benchmark get_data($lmp_column))
println()

The above yields the following results:

Current LibPQ performance
BenchmarkTools.Trial: 1 sample with 1 evaluation.
 Single result which took 103.319 s (4.50% GC) to evaluate,
 with a memory estimate of 23.19 GiB, over 699999498 allocations.

Performance with hack
BenchmarkTools.Trial: 5 samples with 1 evaluation.
 Range (min … max):  954.222 ms …   1.141 s  ┊ GC (min … max): 0.14% … 16.14%
 Time  (median):     998.131 ms              ┊ GC (median):    4.66%
 Time  (mean ± σ):      1.034 s ± 89.276 ms  ┊ GC (mean ± σ):  7.67% ±  7.77%

  ██            █                                     █      █  
  ██▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁█ ▁
  954 ms          Histogram: frequency by time          1.14 s <

 Memory estimate: 762.94 MiB, allocs estimate: 6.

Currently, LibPQ generates about 7 allocations per call to getindex (to which collect falls back). It certainly appears possible to eliminate essentially all of these allocations, as shown by the benchmark results for get_data(c), which is a quick hack I used to bypass some of LibPQ's internals. This change yields roughly a 100x performance improvement.
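To see the per-element cost in isolation, one can benchmark a single getindex call on the column (a small addition to the MWE above; it assumes lmp_column is still in scope):

# One getindex call on the LibPQ column; collect pays roughly this cost once per element.
display(@benchmark $lmp_column[1])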

It looks like the construction of a LibPQ.PQValue is at least partially responsible. Evidence:

julia> @benchmark LibPQ.PQValue{$lmp_column.oid}($lmp_column.result, 1, 1)

# BenchmarkTools.Trial: 10000 samples with 214 evaluations.
#  Range (min … max):  347.556 ns …  39.722 μs  ┊ GC (min … max): 0.00% … 98.59%
#  Time  (median):     358.355 ns               ┊ GC (median):    0.00%
#  Time  (mean ± σ):   393.463 ns ± 935.449 ns  ┊ GC (mean ± σ):  6.27% ±  2.61%

#   ▄██▇▆▅▄▄▅▇▇▅▃▂▃▃▂▁▁ ▁▁▁▂▂▂▁▁▁▂▁▁▁                    ▁▂       ▃
#   █████████████████████████████████▇██▇▆▆▃▆▅▅▄▁▃▄▄▄▅▄▆████▆▅▇▇█ █
#   348 ns        Histogram: log(frequency) by time        490 ns <

#  Memory estimate: 176 bytes, allocs estimate: 4.

This doesn't look like it's quite enough to account for all of the allocations though, so presumably there's something else going on as well.

willtebbutt · Jul 26 '22 19:07

One possible approach would be to switch from the per-value "PQValue" typed wrapper to a typed column wrapper, which would only need to be constructed once per column (and in most cases, when parsing the whole column, the dynamic dispatch would also only need to happen once). Another might be to generate a function at runtime when constructing the Result that avoids constructing PQValue and doing dynamic dispatch internally. A rough sketch of the first idea is below.
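For illustration only, here is what such a typed column wrapper could look like (TypedColumn is a hypothetical name, not an existing LibPQ type; the PQgetvalue/pqparse calls simply mirror the hack in the MWE above):

using LibPQ

# Hypothetical typed column wrapper: the element type T is fixed at construction,
# so materializing the column needs no per-element PQValue wrapper and, in most
# cases, no per-element dynamic dispatch.
struct TypedColumn{T} <: AbstractVector{T}
    result::LibPQ.Result
    col::Int
    nrows::Int
end

Base.size(c::TypedColumn) = (c.nrows,)

function Base.getindex(c::TypedColumn{T}, row::Integer) where {T}
    data_ptr = LibPQ.libpq_c.PQgetvalue(c.result.result, row - 1, c.col - 1)
    return LibPQ.pqparse(T, data_ptr)
end

Something like collect(TypedColumn{Float64}(lmp_column.result, 1, length(lmp_column))) would then take roughly the fast path of the hack above.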

We might also consider splitting the LibPQ Result into eager and lazy versions, where the eager version performs parsing before returning and the lazy version is just a very thin wrapper around the PGresult that doesn't know anything about types or parsing.
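Sketched very roughly (hypothetical types, not an existing LibPQ API), that split might look like:

# Lazy: a thin handle around the PGresult pointer; knows nothing about types or parsing.
struct LazyResult
    pgresult::Ptr{Cvoid}
end

# Eager: all columns parsed up front into plain Julia vectors, so later access is
# ordinary array indexing with no per-element parsing or allocation.
struct EagerResult
    names::Vector{Symbol}
    columns::Vector{AbstractVector}
end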

Either way it sounds like we need to move away from using PQValue dispatch as the way we add new parsing methods.

I probably do not have time to implement any fixes, but I am happy to help anyone who does have the time to learn the code and implement one.

iamed2 · Jul 26 '22 21:07