pgwire icon indicating copy to clipboard operation
pgwire copied to clipboard

Support for COPY in pgwire

Open osawyerr opened this issue 1 year ago • 8 comments

Hi there.

Is the COPY command supported by pgwire?

I saw it as ticked off in the to-do on the README but I can't find any example or API for it.

osawyerr avatar Jun 18 '23 01:06 osawyerr

I just finished codecs for COPY family but the API layer is still pending. Do you have a use-case for COPY commands? Let me know if you have a design for its API.

sunng87 avatar Jun 18 '23 03:06 sunng87

Ah ok. Yes we have a use case for COPY to replicate postgres' functionality of loading data into tables "fast". Don't you guys use it in https://github.com/GreptimeTeam ? I thought that would be a requirement for anyone wanting to migrate.

osawyerr avatar Jun 18 '23 10:06 osawyerr

At Greptime we haven't supported postgres COPY protocol. We do have copy statement but it's designed to tell GreptimeDB to load from or dump to s3/file system. So the actual data doesn't flow from the wire.

sunng87 avatar Jun 18 '23 13:06 sunng87

Oh I see. I'm not very familiar with the COPY protocol so not sure what the API should look like. As a first stab can the pgwire API present a byte stream from the client, with some details (file type, etc). The backend implementations using pgwire will probably load the file into their DB in some proprietary way like you mentioned above. i.e. they would just receive the bytesteam, possibly save it and load it in some proprietary way to their database, like you do in Greptime. Thoughts?

osawyerr avatar Jun 20 '23 00:06 osawyerr

Been looking at the tokio_postgres::client::Client and it seems they implement something similar to what I describe above:

pub async fn copy_in<T, U>(&self, statement: &T) -> Result<CopyInSink<U>, Error>
pub async fn copy_out<T>(&self, statement: &T) -> Result<CopyOutStream, Error>

osawyerr avatar Jun 20 '23 19:06 osawyerr

I think the stream doesn't need to carry information about file. It's just plain row data and the receiver will need to find out how to store the data. The server API in pgwire should be pretty similar to the client.

How copy works in postgres: https://www.postgresql.org/docs/current/protocol-flow.html#PROTOCOL-COPY

sunng87 avatar Jun 21 '23 03:06 sunng87

After checking docs again, I'm going to design the server APIs for copy as:

/// handler for copy messages
#[async_trait]
pub trait CopyHandler {
    async fn on_copy_in<C, S>(&self, client: &mut C, copy_data_stream: S) -> PgWireResult<()>
    where
        C: ClientInfo,
        S: Stream<Item = CopyData>;

    async fn on_copy_out<C, S>(&self, client: &mut C) -> PgWireResult<S>
    where
        C: ClientInfo,
        S: Stream<Item = CopyData>;

    async fn on_copy_both<C, S1, S2>(
        &self,
        client: &mut C,
        copy_data_stream: S1,
    ) -> PgWireResult<S2>
    where
        C: ClientInfo,
        S1: Stream<Item = CopyData>,
        S2: Stream<Item = CopyData>;
}

sunng87 avatar Jun 24 '23 09:06 sunng87

Should be actual COPY command be included as well, so the server can ascertain the delimiter, format, etc of the file as specified by the uploader? Or will the server be able to easily parse the CopyData?

osawyerr avatar Jun 25 '23 11:06 osawyerr

The copy support is just merged.

sunng87 avatar Jun 13 '24 17:06 sunng87