pgsql icon indicating copy to clipboard operation
pgsql copied to clipboard

Single command to initialize, send and end a COPY?

Open odo opened this issue 10 years ago • 8 comments

Hi,

in my code I do a copy like this (Connection is the PID of a pgsql_connection process):

copy_to(TableName, Data, Connection) ->
    {copy_in, [text]} = pgsql_connection:simple_query(["COPY \"", TableName, "\"(time, key, value) FROM STDIN"], Connection),
    ok                = pgsql_connection:send_copy_data(Data, Connection),
    {copy, Count}     = pgsql_connection:send_copy_end(Connection),
    Count.

Since the Connection is shared by several client processes, there can be some interference here, resulting in errors like "unexpected message type 0x51".

Would it make sense to wrap this three calls into one so they are always executed in sequence? I can implement this, just wanted to check if this makes sense or if I am using it incorrectly.

odo avatar Apr 17 '15 08:04 odo

It seems a little odd to me, although I've only used COPY when I'm streaming data through so this use-case hadn't occurred to me outside of tests.

In my code, I always use pgsql_connection:open/1 to get an exclusive connection for myself (actually, I generally use episcina to pool connections and get it from there, but it's still exclusive while I'm holding it).

I don't think you should share a connection among multiple processes simultaneously.

waisbrot avatar May 07 '15 18:05 waisbrot

It seems my use case is a bit different:

I have a pool of established connections. When picking one I try to avoid the overhead of acquiring a lock as episcina does. Instead I rely on the ability of pgsql_connection being a gen_server to serialize access to the resource/the database.

In this context it seems odd that in order to use copy you have to issue a command (simple_query("COPY...")) which will put the server in a state where its API becomes meaningless unless you continue with the right sequence of commands (send_copy_data/2, send_copy_end/1).

I think sharing a connection could easily made possible.

odo avatar May 08 '15 09:05 odo

You get the same behavior with transactions, and COPY is pretty much a special-case transaction.

If your code did

pgsql_connection:simple_query("BEGIN", Connection),
pgsql_connection:extended_query(Insert, Bindings, Connection),
pgsql_connection:extended_query(Insert2, Bindings2, Connection),
pgsql_connection:simple_query("COMMIT", Connection),

You would find that it was unreliable because another process might come in in the middle and issue an error-generating command which would abort your transaction.

But I think I understand: you're thinking of COPY in terms of how it works in the psql program (which is a single client not sharing its connection, but it does make it look like COPY is a single command). In that context, your suggestion does make sense to me.

What about the delay, though? Does it seem OK to you that sometimes a simple SQL command will block for a long time (because it's stuck in line behind a huge COPY) and other times it will respond instantly?

waisbrot avatar May 08 '15 10:05 waisbrot

@odo ping. Any opinion on my question about delays?

waisbrot avatar Aug 24 '15 15:08 waisbrot

The example with the transaction makes perfect sense. From my perspective this use case would also be better served with a pgsql_connection:transaction/2 command taking the connection and a list of queries as an argument.

Regarding delays I think also a simple query like "SELECT COUNT(*) FROM..." could potentially take considerable longer than a small COPY. In my case all the queries are very similar so round-robin works well.

I you consider a scenario where you have many fast queries and some long running ones it make sense to acquire a lock and have exclusive access to the connection.

Considering this I think the way it works currently makes sense for most and I will just continue with my fork 8).

odo avatar Aug 25 '15 12:08 odo

Unfortunately, transaction probably isn't the best name for such a function as the word has a very specific meaning, unless it encloses calls with a BEGIN/COMMIT pair or ROLLBACK if something fails (and therefore wouldn't work for COPY). Also, after mnesia:transaction/1, it probably should take a function instead of a list of queries to execute.

pguyot avatar Aug 25 '15 12:08 pguyot

I was thinking exactly this, a bunch of queries that should be enclosed in BEGIN...COMMIT. And COPY implicitly opens a transaction anyway as far as I remember.

That way copy and transaction would be single calls to the pgsql_connection process.

odo avatar Aug 25 '15 12:08 odo

COPY does work within a proper transaction, so that could be a good solution. It does not itself implicitly open a transaction. Rather, it puts the current connection into a "copy" state where the only acceptable actions are sending data or ending (one way or another) the copy.

waisbrot avatar Aug 25 '15 13:08 waisbrot