LibPQ.jl
LibPQ.jl copied to clipboard
Add CopyOut! interface for >5x faster data queries
Hi! I tend to do SELECT
queries that return a large dataset for which the standard method takes a few seconds:
using LibPQ, BenchmarkTools, DataFrames
conn = LibPQ.Connection("dbname=postgres user=postgres")
result = execute(conn, """
CREATE TEMPORARY TABLE libpqjl_test (
no_nulls varchar(10),
yes_nulls varchar(10)
);
""")
result = execute(conn, """
insert into libpqjl_test (no_nulls, yes_nulls)
select
left(md5(i::text), 10),
left(md5(random()::text), 10)
from generate_series(1, 1000000) s(i);
""")
@benchmark execute(conn, "SELECT * FROM libpqjl_test;") |> DataFrame
Time (median): 2.977 s ┊ GC (median): 13.03%
PostgreSQL supports the COPY TO
API and I found it to be faster, so I implemented it! Use ] add https://github.com/cmey/LibPQ.jl#copy-out
to try it out!
(we already support the COPY FROM
API for fast ingestion which I took heavy inspiration from)
using CSV
@benchmark begin
databuf = IOBuffer()
copyout = LibPQ.CopyOut!(databuf, "COPY (SELECT * FROM libpqjl_test) TO STDOUT (FORMAT CSV, HEADER);")
result = execute(conn, copyout)
databuf |> CSV.File |> DataFrame
end
Time (median): 590.818 ms ┊ GC (median): 9.10%
Both methods return equal outputs:
isequal(execute(conn, "SELECT * FROM libpqjl_test;") |> DataFrame,
begin databuf = IOBuffer()
copyout = LibPQ.CopyOut!(databuf, "COPY (SELECT * FROM libpqjl_test) TO STDOUT (FORMAT CSV, HEADER);")
result = execute(conn, copyout)
databuf |> CSV.File |> DataFrame
end)
true
The new API expects the user to pass a "string builder" type of construct in the form of a IOBuffer
that, during execution, gets filled with the text of the CSV returned by the database.
I found this didn't fit well with the table-style of Result
, since this is just a text blob.
Note that CSV API changed from CSV version 0.8 to 0.9, and that Julia <1.3.0-1 loads the old CSV, that's why there is the VERSION check in the test code, to support the test run on Julia 1.0.
I hope others will find this helpful! I certainly found this helped me, as I see >10x speed improvement in real-life use cases.
@iamed2 Hi! I'm not sure how to fix the failing formatting CI step, it flags about all source files in the repo, including many I haven't touched :open_mouth: