clickhouse-java
Pass ClickHouse Skip Index parameters while creating Table in ClickHouse from Spark
What I am doing
I am currently writing a dataframe from PySpark into ClickHouse using the ClickHouse JDBC driver. The table does not exist beforehand; it is created dynamically while the dataframe is pushed to ClickHouse.
Issue
While writing the dataframe into ClickHouse, I want to pass the following skip-index definition:
INDEX modifiedon_9854 modifiedon TYPE minmax GRANULARITY 4
The following code snippet does not work:
df.write \
    .format("jdbc") \
    .mode("overwrite") \
    .option("driver", "com.github.housepower.jdbc.ClickHouseDriver") \
    .option("url", self.parameters["connection_properties"]["url"]) \
    .option("createTableOptions", "index modifiedon_9854 modifiedon type minmax granularity 4"
            "engine=MergeTree()"
            "primary key (prospectid)"
            " order by (prospectid,toYYYYMM(createdon), toYYYYMM(modifiedon))"
            " partition by (prospectstage)") \
    .option("user", "...") \
    .option("password", "...") \
    .option("dbtable", "...") \
    .option("batchsize", "20000") \
    .option("truncate", "true") \
    .save()
Error Thrown
Caused by: com.github.housepower.exception.ClickHouseSQLException: DB::ExceptionDB::Exception: Syntax error: failed at position 237 ('index'): index modifiedon_9854 modifiedon type minmax granularity 4engine=MergeTree()primary key (prospectid) order by (prospectid,toYYYYMM(createdon), toYYYYMM(modified. Expected one of: storage definition, ENGINE, AS. Stack trace:
...
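For context: Spark appends the createTableOptions string after the closing parenthesis of the generated column list, and at that position ClickHouse only accepts a storage definition (ENGINE, ORDER BY, PARTITION BY, ...), which is exactly what "Expected one of: storage definition, ENGINE, AS" is saying. A skip index, by contrast, must be declared inside the column-definition parentheses. (The concatenated Python strings above are also missing separating spaces, hence "granularity 4engine=MergeTree()" in the error, but the error position at 'index' suggests fixing the spacing alone would not help.) A minimal sketch of the DDL ClickHouse does accept — the table name and column names/types here are illustrative guesses based on the options above:

# Hypothetical DDL sketch: what matters is that INDEX sits inside the
# column list, before ENGINE; names and types are assumed, not from the issue.
ddl = """
CREATE TABLE my_table
(
    prospectid    Int64,
    prospectstage String,
    createdon     DateTime,
    modifiedon    DateTime,
    INDEX modifiedon_9854 modifiedon TYPE minmax GRANULARITY 4
)
ENGINE = MergeTree()
PRIMARY KEY (prospectid)
ORDER BY (prospectid, toYYYYMM(createdon), toYYYYMM(modifiedon))
PARTITION BY (prospectstage)
"""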
Working Code Snippet So Far
df.write \
    .format("jdbc") \
    .mode("overwrite") \
    .option("driver", "com.github.housepower.jdbc.ClickHouseDriver") \
    .option("url", self.parameters["connection_properties"]["url"]) \
    .option("createTableOptions", "engine=MergeTree()"
            "primary key (prospectid)"
            " order by (prospectid,toYYYYMM(createdon), toYYYYMM(modifiedon))"
            " partition by (prospectstage)") \
    .option("user", "...") \
    .option("password", "...") \
    .option("dbtable", "...") \
    .option("batchsize", "20000") \
    .option("truncate", "true") \
    .save()
Does anyone know how we can achieve this from Spark?
A workaround I used is to create the table up front with the clickhouse-driver Python library and then load the data into it from Spark; a sketch of that approach follows.
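A minimal sketch of that workaround, reusing the hypothetical ddl string from above with placeholder connection details; writing with mode("append") keeps Spark from issuing its own CREATE TABLE against the pre-built table:

from clickhouse_driver import Client

# Create the table up front so the skip index can be declared inline.
# clickhouse-driver speaks the native TCP protocol (port 9000 by default),
# not the JDBC/HTTP endpoint; host and credentials are placeholders.
client = Client(host="clickhouse-host", user="...", password="...")
client.execute("DROP TABLE IF EXISTS my_table")
client.execute(ddl)

# Load the dataframe into the pre-created table; "append" avoids the
# dynamic CREATE TABLE path entirely, so createTableOptions is not needed.
df.write \
    .format("jdbc") \
    .mode("append") \
    .option("driver", "com.github.housepower.jdbc.ClickHouseDriver") \
    .option("url", self.parameters["connection_properties"]["url"]) \
    .option("user", "...") \
    .option("password", "...") \
    .option("dbtable", "my_table") \
    .option("batchsize", "20000") \
    .save()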