COPY FROM STDIN support for Postgres compatibility
Is your feature request related to a problem? Please describe.
I need to import a 43 MB CSV file that has a timestamp column following the pattern yyyy-MM-ddTHH:mm:ss.SSS for most of its rows, but where some rows follow the pattern yyyy-MM-ddTHH:mm:ss (note the missing milliseconds field). QuestDB's REST API does not handle these special cases (when supplied with the pattern yyyy-MM-ddTHH:mm:ss.SSS), and instead bails on the values that don't follow the prescribed pattern.
To fix the problem, I am reading the CSV myself to construct a proper timestamp. The logic looks something like this (using the Rust crates chrono, csv, init_array, and tokio_postgres):
let statement = client.prepare(INSERT_INTO_TRADES).await?;
let mut trades_reader = csv::Reader::from_path(path)?;
let trades_records = trades_reader.deserialize();
for (i, result) in trades_records.enumerate() {
    let record: TradeRecord = result.map_err(|e| {
        eprintln!("Failed reading record {:?}", i);
        e
    })?;
    // Parse the following format: yyyy-MM-ddTHH:mm:ss.SSS
    // This also works for entries that lack the `.SSS` part.
    let timestamp = NaiveDateTime::parse_from_str(
        &record.timestamp,
        "%Y-%m-%dT%H:%M:%S%.f",
    )?;
    // Create a stack-allocated array of ToSql trait objects. We can't use the Chain
    // iterator directly, because `Chain` does not implement the `ExactSizeIterator`
    // that `Client::execute_raw` requires.
    let fields: [&dyn ToSql; 20] = init_array::init_array(|i| {
        std::iter::once(&timestamp as &dyn ToSql)
            .chain(record.iter().map(|v| v as &dyn ToSql))
            .nth(i)
            .unwrap()
    });
    client.execute_raw(
        &statement,
        std::array::IntoIter::new(fields),
    ).await?;
}
Problem: this is very slow.
Describe the solution you'd like
I would love for QuestDB to support the Postgres COPY FROM STDIN API, so that I can use binary format COPY queries, e.g. via the postgres_binary_copy library.
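On the client side, tokio_postgres already exposes binary COPY through its binary_copy module, so if QuestDB supported COPY ... FROM STDIN, the per-row INSERT loop above could collapse into a single streamed COPY. The sketch below shows the shape of that code; the table name, column types, and row layout are illustrative, and it assumes tokio-postgres is built with its chrono integration so NaiveDateTime implements ToSql:

```rust
use futures::pin_mut;
use tokio_postgres::binary_copy::BinaryCopyInWriter;
use tokio_postgres::types::Type;

// Illustrative row shape: (timestamp, price).
async fn copy_trades(
    client: &tokio_postgres::Client,
    rows: &[(chrono::NaiveDateTime, f64)],
) -> Result<(), tokio_postgres::Error> {
    // Start a binary COPY; the server then consumes a row stream
    // instead of parsing one SQL statement per row.
    let sink = client
        .copy_in("COPY trades (ts, price) FROM STDIN (FORMAT binary)")
        .await?;
    let writer = BinaryCopyInWriter::new(sink, &[Type::TIMESTAMP, Type::FLOAT8]);
    pin_mut!(writer);
    for (ts, price) in rows {
        writer
            .as_mut()
            .write(&[ts as &(dyn tokio_postgres::types::ToSql + Sync), price])
            .await?;
    }
    // finish() sends the COPY trailer and waits for the server's confirmation.
    writer.finish().await?;
    Ok(())
}
```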
Describe alternatives you've considered
- Preparing a new CSV with the fixed timestamp, which is then fed to the REST API: I'd like to avoid another filesystem IO.
- Using a transaction instead of calling execute_raw directly, which is then committed: this is noticeably faster, but still slow.
- Using the InfluxDB wire protocol: unfortunately, Rust support for InfluxDB is limited and not on the level of Postgres.
It would be a really cool feature. I use FORMAT binary for imports and for queries/exports (on PostgreSQL).
For imports I encode the buffer data so that it looks native to Postgres, which is much faster than sending the data in text form and having it parsed (in my case, about 10x faster).
This is the import part:
COPY ts.point(serie_id, datetime, value, version_ts, scope_type, scope_id) FROM STDIN (FORMAT binary);
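The "native" framing behind FORMAT binary is fixed by the PostgreSQL COPY specification: a 19-byte header (11-byte signature, 32-bit flags, 32-bit extension length), then per-row tuples of a 16-bit field count followed by big-endian length-prefixed field values, then a 16-bit -1 trailer. A minimal, stdlib-only Rust sketch of that framing (the single int8 column is made up for illustration; real field encodings depend on the column types):

```rust
// Build the fixed COPY BINARY header.
fn copy_header() -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(b"PGCOPY\n\xff\r\n\0"); // 11-byte signature
    buf.extend_from_slice(&0i32.to_be_bytes()); // flags field
    buf.extend_from_slice(&0i32.to_be_bytes()); // header extension length
    buf
}

// Append one tuple: a 16-bit field count, then each field as a
// 32-bit big-endian length followed by its already-encoded bytes.
fn write_row(buf: &mut Vec<u8>, fields: &[&[u8]]) {
    buf.extend_from_slice(&(fields.len() as i16).to_be_bytes());
    for f in fields {
        buf.extend_from_slice(&(f.len() as i32).to_be_bytes());
        buf.extend_from_slice(f);
    }
}

// Append the end-of-data marker.
fn copy_trailer(buf: &mut Vec<u8>) {
    buf.extend_from_slice(&(-1i16).to_be_bytes());
}

fn main() {
    let mut buf = copy_header();
    // One row with a single int8 column holding the value 42
    // (a Postgres int8 is 8 big-endian bytes).
    write_row(&mut buf, &[&42i64.to_be_bytes()]);
    copy_trailer(&mut buf);
    // 19 (header) + 2 (count) + 4 (length) + 8 (value) + 2 (trailer)
    println!("{}", buf.len()); // 35
}
```

Since no per-value text parsing happens server-side, this is where the roughly 10x speedup over text-form inserts comes from.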
This is the query/export part (somehow also much faster than a regular query/select):
COPY (
SELECT datetime,
value,
version_ts,
scope_type,
scope_id
FROM ts.POINT
WHERE
-- condition
) TO STDOUT (FORMAT binary);