osm2pgsql

Using binary format for COPY

Open joto opened this issue 9 months ago • 1 comments

As part of #2110 we found that using the binary format for COPY promises some performance benefits. Let's think about what changes we need for that.

The binary format seems simple enough to implement and it appears to be pretty stable. But there is no documentation covering all the types; instead, the PostgreSQL docs say you should look into the source code. There are probably also some client implementations out there that we could look at.

Text fields are just length + content, and ints are ints in network byte order. But I don't know about more complex types such as arrays and JSON. This could be especially problematic for tables that the user creates, because they can use any type they want, even one where we might not know the internal representation (see also #2274). But maybe we can still send the text representation and have PostgreSQL convert it on import, as it does with the COPY text format? If not, we probably can't use the binary format for user tables, but we could still use it for middle tables.
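
To make the framing concrete, here is a minimal sketch of what a binary COPY stream looks like for a two-column table, written in Python only because it is quick to check (it is not a proposal for the actual osm2pgsql code). The columns (an `int4` id and a `text` name), the helper names, and the UTF-8 client encoding are assumptions for illustration. Only the simple cases mentioned above are covered: header, per-row field count, length-prefixed fields with -1 for NULL, and the trailer.

```python
import struct

# Fixed 11-byte signature that starts every binary COPY stream.
SIGNATURE = b"PGCOPY\n\xff\r\n\x00"


def copy_header() -> bytes:
    # Signature, then a 32-bit flags field (0) and a 32-bit
    # header extension length (0), both in network byte order.
    return SIGNATURE + struct.pack("!ii", 0, 0)


def field_int4(value: int) -> bytes:
    # int4: 32-bit length (always 4) followed by the big-endian value.
    return struct.pack("!ii", 4, value)


def field_text(value) -> bytes:
    # NULL is a length of -1 with no data bytes.
    if value is None:
        return struct.pack("!i", -1)
    # Assumption: client encoding is UTF-8.
    data = value.encode("utf-8")
    return struct.pack("!i", len(data)) + data


def row(*fields: bytes) -> bytes:
    # Each row starts with a 16-bit count of the fields that follow.
    return struct.pack("!h", len(fields)) + b"".join(fields)


def copy_trailer() -> bytes:
    # The stream ends with a 16-bit -1 in place of a field count.
    return struct.pack("!h", -1)


def encode_rows(rows) -> bytes:
    # rows: iterable of (id, name) tuples for the hypothetical table.
    out = [copy_header()]
    for node_id, name in rows:
        out.append(row(field_int4(node_id), field_text(name)))
    out.append(copy_trailer())
    return b"".join(out)


payload = encode_rows([(1, "abc"), (2, None)])
```

The resulting bytes would be sent with `COPY ... FROM STDIN (FORMAT binary)`. Anything beyond ints and text (arrays, JSON, geometry, user-defined types) needs its own field encoder, which is exactly the open question above.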

joto, Jan 06 '25 08:01