airflow-clickhouse-plugin
airflow-clickhouse-plugin copied to clipboard
ClickHouseOperator multi-statements support
Hi, I've just realised that I cannot run sql files with multi-statements using ClickHouseOperator
DB::Exception: Syntax error (Multi-statements are not allowed)
For instance,
drop table if exists table_name; create table table_name as ...
Are there any plans or thoughts about solving this issue? PostgresOperator does support this functionality
No plans for it yet. You may propose a PR for this. But keep in mind that SQL queries may be parameterized, so an important concern is to design an intuitive way to propagate parameters: should parameters be propagated to all the queries in the same way? What if a query does not have a parameter?
I believe a common parameters dict for all queries should work intuitively but a quick research on approaches in other libraries is appreciated. How does PostgresOperator do it?
@sergylog, we have a similar case and we use clickhouse-client and bash operator because clickhouse-driver required splitting scenario on single queries
Hi guys, ClickHouseOperator (and ClickHouseSensor, and ClickHouseHook) supports multiple statements in sql parameter. They are executed one by one. parameters are applied to all the queries. Result of the last query is returned.
I am closing the issue since the functionality is present.
@bryzgaloff Thank you for the new feature of supporting multiple statements.
I think it will be more interesting if we can execute a single .sql file(or a string) that contains multiple statements inside.
Airflow ClickHouse Plugin is intended to be a wrapper around clickhoude-driver which supports execution of a single query only. You may submit a feature request to clickhouse-driver to support multiple statements.
Some workarounds:
- You may execute them as multiple files:
sqlparameter should accept a list of*.sqlfiles, I believe they all will be templated, though requires a check. If not, your feature request is limited by the underlying tools: Airflow (if can have a singlesqlfile) and clickhouse-driver (cannot execution multiple statement at once). So, you will have to implement a workaround yourself: it is not the plugin's responsibility. - Also, you may use Airflow mapped operators: split the input file with SQL statements by
;and execute them individually usingClickHouseOperator.expand(…)functionality (inherited from base Airflow operator). Keep in mind that a naive splitting by;might work incorrectly in case of string literals, e.g.WHERE someField like '%;%'— this should not be split or it will result in invalid queries. - Read the input SQL file with multiple statements, split by
;and produce individual operators from that list. Dynamic mapping is preferred as it avoids re-reading the file on each DAG processing.