
COPY FROM STDIN support for Postgres compatibility

Open SuperFluffy opened this issue 4 years ago • 1 comment

Is your feature request related to a problem? Please describe.

I need to import a 43 MB CSV file with a timestamp column that follows the pattern yyyy-MM-ddTHH:mm:ss.SSS for most of its rows, but where some rows follow the pattern yyyy-MM-ddTHH:mm:ss (note the missing milliseconds field). QuestDB's REST API does not handle these special cases (when supplied with the pattern yyyy-MM-ddTHH:mm:ss.SSS), and instead bails on the values that don't follow the prescribed pattern.

To fix the problem, I am reading the CSV myself and constructing a proper timestamp. The logic looks something like this (using the Rust libraries chrono, csv, init_array, and tokio_postgres):

    let statement = client.prepare(INSERT_INTO_TRADES).await?;
    let mut trades_reader = csv::Reader::from_path(path)?;
    let trades_records = trades_reader.deserialize();

    for (i, result) in trades_records.enumerate() {
        let record: TradeRecord = result.map_err(|e| {
            eprintln!("Failed reading record {:?}", i);
            e
        })?;

        // Parse the following format: yyyy-MM-ddTHH:mm:ss.SSS
        // This also works for entries that lack the `.SSS` part.
        let timestamp = NaiveDateTime::parse_from_str(
            &record.timestamp,
            "%Y-%m-%dT%H:%M:%S%.f",
        )?;

        // Create a stack-allocated array of ToSql trait objects. We can't use
        // the `Chain` iterator directly, because `Chain` does not implement the
        // `ExactSizeIterator` that `Client::execute_raw` requires.
        let fields: [&dyn ToSql; 20] = init_array::init_array(|i|
            std::iter::once(&timestamp as &dyn ToSql)
                .chain(record.iter().map(|v| v as &dyn ToSql))
                .skip(i)
                .next()
                .unwrap()
        );
        client.execute_raw(
            &statement,
            std::array::IntoIter::new(fields),
        ).await?;
    }

Problem: this is very slow.

Describe the solution you'd like

I would love for QuestDB to support the Postgres COPY FROM STDIN API, so that I can use binary format COPY queries, e.g. via the postgres_binary_copy library.
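For illustration, here is a rough client-side sketch of what such support would enable, using tokio-postgres's binary_copy module (a modern alternative to the postgres_binary_copy crate). This is an assumption-laden sketch, not working QuestDB code: the `trades(ts, price)` table, its column types, and the use of chrono's `NaiveDateTime` (which needs tokio-postgres's `with-chrono-0_4` feature) are all made up for the example, and it only runs against a server that actually implements `COPY ... FROM STDIN (FORMAT binary)`.

```rust
// Hypothetical sketch: assumes a server accepting binary COPY and a
// `trades(ts, price)` table. Not runnable against QuestDB today.
use futures_util::pin_mut;
use tokio_postgres::binary_copy::BinaryCopyInWriter;
use tokio_postgres::types::{ToSql, Type};
use tokio_postgres::{Client, Error};

async fn copy_trades(
    client: &Client,
    rows: &[(chrono::NaiveDateTime, f64)],
) -> Result<u64, Error> {
    // copy_in returns a sink for raw COPY data; BinaryCopyInWriter
    // frames each row in the binary COPY format for us.
    let sink = client
        .copy_in("COPY trades (ts, price) FROM STDIN (FORMAT binary)")
        .await?;
    let writer = BinaryCopyInWriter::new(sink, &[Type::TIMESTAMP, Type::FLOAT8]);
    pin_mut!(writer);
    for (ts, price) in rows {
        let row: [&(dyn ToSql + Sync); 2] = [ts, price];
        writer.as_mut().write(&row).await?;
    }
    // finish() writes the binary trailer and returns the row count.
    writer.finish().await
}
```

Compared with the per-row `execute_raw` loop above, this sends one streamed COPY message instead of one round trip per record, which is where the speedup comes from.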

Describe alternatives you've considered

  • Preparing a new CSV with the fixed timestamps, which is then fed to the REST API: I'd like to avoid another round of filesystem IO.
  • Using a transaction instead of calling execute_raw directly, which is then committed: this is noticeably faster, but still slow.
  • Using the InfluxDB wire protocol: unfortunately, Rust support for InfluxDB is limited and not on the level of Postgres support.

SuperFluffy avatar Jun 07 '21 16:06 SuperFluffy

This could be a really cool feature. I use FORMAT binary both for imports and for queries/exports (on PostgreSQL). For the import, I encode the buffer data so that it looks native to Postgres, which is much faster than sending the data in text form and having it parsed (in my case, about 10x faster).

This is the import part: COPY ts.point(serie_id, datetime, value, version_ts, scope_type, scope_id) FROM STDIN (FORMAT binary);
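For context, the framing behind "encoding the data the way it looks native to Postgres" is the documented PGCOPY binary format: an 11-byte signature plus two 32-bit header fields, then per row a 16-bit field count followed by length-prefixed field values in network byte order, and a 16-bit -1 trailer. A minimal stdlib-only Rust sketch of that framing (the single int8 column is just an illustration, not the ts.point schema):

```rust
// Build the fixed PGCOPY header: signature, flags, header-extension length.
fn pgcopy_header() -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(b"PGCOPY\n\xff\r\n\0"); // 11-byte signature
    buf.extend_from_slice(&0i32.to_be_bytes());   // flags field
    buf.extend_from_slice(&0i32.to_be_bytes());   // header extension length
    buf
}

// Frame one row: 16-bit field count, then each field as a 32-bit
// big-endian length followed by the raw field bytes.
fn pgcopy_row(fields: &[&[u8]]) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(&(fields.len() as i16).to_be_bytes());
    for f in fields {
        buf.extend_from_slice(&(f.len() as i32).to_be_bytes());
        buf.extend_from_slice(f);
    }
    buf
}

// The stream ends with a field count of -1.
fn pgcopy_trailer() -> [u8; 2] {
    (-1i16).to_be_bytes()
}

fn main() {
    // One row with a single int8 (bigint) column holding 42.
    let mut msg = pgcopy_header();
    msg.extend_from_slice(&pgcopy_row(&[&42i64.to_be_bytes()]));
    msg.extend_from_slice(&pgcopy_trailer());
    // 19-byte header + (2 + 4 + 8)-byte row + 2-byte trailer
    assert_eq!(msg.len(), 35);
    println!("{} bytes", msg.len());
}
```

In practice a driver-level writer (or the server, for COPY in text mode) does this framing for you; the point is that the binary form skips text parsing entirely, which is where the ~10x comes from.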

This is the query/export part (somehow also much faster than a regular query/select):

COPY (
    SELECT  datetime,
            value,
            version_ts,
            scope_type,
            scope_id
    FROM ts.point
    WHERE
        -- condition
) TO STDOUT (FORMAT binary);

stalek71 avatar May 19 '22 12:05 stalek71