airflow-clickhouse-plugin icon indicating copy to clipboard operation
airflow-clickhouse-plugin copied to clipboard

ClickHouseOperator multi-statements support

Open sergylog opened this issue 3 years ago • 2 comments

Hi, I've just realised that I cannot run sql files with multi-statements using ClickHouseOperator DB::Exception: Syntax error (Multi-statements are not allowed)

For instance, drop table if exists table_name; create table table_name as ...

Are there any plans or thoughts about solving this issue? PostgresOperator does support this functionality

sergylog avatar May 25 '22 10:05 sergylog

No plans for it yet. You may propose a PR for this. But keep in mind that SQL queries may be parameterized, so an important concern is to design an intuitive way to propagate parameters: should parameters be propagated to all the queries in the same way? What if a query does not have a parameter?

I believe a common parameters dict for all queries should work intuitively but a quick research on approaches in other libraries is appreciated. How does PostgresOperator do it?

bryzgaloff avatar May 25 '22 10:05 bryzgaloff

@sergylog, we have a similar case and we use clickhouse-client and bash operator because clickhouse-driver required splitting scenario on single queries

was-av avatar Sep 21 '22 10:09 was-av

Hi guys, ClickHouseOperator (and ClickHouseSensor, and ClickHouseHook) supports multiple statements in sql parameter. They are executed one by one. parameters are applied to all the queries. Result of the last query is returned.

I am closing the issue since the functionality is present.

bryzgaloff avatar Sep 03 '23 11:09 bryzgaloff

@bryzgaloff Thank you for the new feature of supporting multiple statements.

I think it will be more interesting if we can execute a single .sql file(or a string) that contains multiple statements inside.

zli06160 avatar Sep 20 '23 14:09 zli06160

Airflow ClickHouse Plugin is intended to be a wrapper around clickhoude-driver which supports execution of a single query only. You may submit a feature request to clickhouse-driver to support multiple statements.

Some workarounds:

  • You may execute them as multiple files: sql parameter should accept a list of *.sql files, I believe they all will be templated, though requires a check. If not, your feature request is limited by the underlying tools: Airflow (if can have a single sql file) and clickhouse-driver (cannot execution multiple statement at once). So, you will have to implement a workaround yourself: it is not the plugin's responsibility.
  • Also, you may use Airflow mapped operators: split the input file with SQL statements by ; and execute them individually using ClickHouseOperator.expand(…) functionality (inherited from base Airflow operator). Keep in mind that a naive splitting by ; might work incorrectly in case of string literals, e.g. WHERE someField like '%;%' — this should not be split or it will result in invalid queries.
  • Read the input SQL file with multiple statements, split by ; and produce individual operators from that list. Dynamic mapping is preferred as it avoids re-reading the file on each DAG processing.

bryzgaloff avatar Sep 20 '23 16:09 bryzgaloff