LibPQ.jl

Parse many strings as InlineStrings

Open oxinabox opened this issue 4 years ago • 3 comments
We should match the logic that CSV.jl and Arrow.jl use to decide whether they produce Strings or InlineStrings. Once we have https://github.com/JuliaData/InlineStrings.jl/issues/8, doing this should be as fast or faster, since it avoids allocations.

Looking at parse.jl I don't think the change would be too hard.

It would need benchmarking, but I have high hopes.

(Related: https://github.com/invenia/LibPQ.jl/issues/207)

oxinabox avatar Oct 05 '21 17:10 oxinabox

Should this just replace #207? Were you thinking of also applying this to variable-length string columns?

iamed2 avatar Oct 05 '21 21:10 iamed2

Yes, this one is about variable-length string columns. If it is good enough for Arrow/CSV, it is good enough for us, IMO. It's a much more practical concern, since varchar columns are actually commonly used.

OTOH #207 is only about fixed-size strings, which have an even stronger argument for this, since we know the size in advance.

The same PR might close both at once, though.

oxinabox avatar Oct 05 '21 21:10 oxinabox

LibPQ currently does not consider columns as a whole when parsing; CSV does. Without a large refactor, LibPQ would therefore not have a consistent type for the column, which would prevent the strings from being stored inline (the whole point of InlineStrings). It would be easy, though, to add an option to parse strings this way, or to parse them as a specific InlineString type of a given size.

iamed2 avatar Oct 06 '21 21:10 iamed2
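
To illustrate the column-consistency point above, here is a minimal sketch, assuming InlineStrings.jl is installed. The sample values are hypothetical and this is not LibPQ code. It contrasts picking the smallest fitting type per value with picking one fixed-width type for the whole column.

```julia
using InlineStrings  # assumed to be installed; not part of LibPQ itself

# Hypothetical values coming back from a single varchar column.
raw = ["a", "hello", "a longer value"]

# Per-value choice: each value gets the smallest InlineString type that
# fits it, so the types differ from one row to the next.
per_value_types = [typeof(InlineString(s)) for s in raw]

# Column-level choice: one type wide enough for the longest value.
# String15 is chosen by hand here because the longest value is 14 bytes.
column = String15.(raw)

# A single concrete isbits element type is what lets the vector store
# the strings inline instead of as pointers to heap-allocated Strings.
println(per_value_types)
println(eltype(column), " ", isbitstype(eltype(column)))
```

With per-value parsing there is no single concrete element type to give the column, which is why the last comment suggests letting the caller request a specific InlineString type instead.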