Added support for ClickHouse
ClickHouse adapter implementation (resolves https://github.com/ankane/blazer/issues/378).
Requires click_house Ruby driver to interact with clickhouse-server via HTTP interface.
Adapter methods:
- run_statement
- tables
- schema
- preview_statement
- explain
Readme instructions also added.
Hey @foxy-eyed, thanks for the PR! Overall, looks great.
Have you tried clickhouse-activerecord? It looks like they both have ~60k downloads, and I think we could use the SQL adapter if it works.
Hi @ankane, thank you for feedback!
I saw clickhouse_activerecord but I prefer not to choose it)
My understanding is, that ClickHouse is made for (and good for) processing big analytical data, like user activity, events etc into reports. In real-time. ActiveRecord feels like unnecessary abstraction that does not fit scenarios because:
- singe events are not as interesting (with all its numerous attributes) as chains of events and aggregated data; we read large number of rows, but only a small subset of columns;
- in most cases we need some kind of aggregated report => we have to build custom SQL including subexpressions, functions, specific response format etc => dsl does not fit;
-
#createmethod mostly useless because data is inserted in large batches (otherwise it's inefficient); - we don't have relations;
- usually data is not mutable after is is added to the DB.
Also support of different response format (https://clickhouse.com/docs/en/interfaces/formats/) would be handy, but clickhouse_activerecord processes only JSON. In other cases it returns response body as a string (btw, clickhouse_activerecord also uses HTTP interface under the hood but with Net::Http instead of Faraday).
But if someone already uses clickhouse_activerecord we can take it, right?)
I added detection of driver dependency and created classes for both drivers with common interface, blazer adapter now can work with both. Please take a look, what do you think? Should we keep both?
The main benefit of clickhouse-activerecord is we wouldn't need to maintain a separate adapter (assuming it works with SqlAdapter).
It looks like queries work with SqlAdapter.
data_sources:
my_source:
url: clickhouse://playground:[email protected]:8443/datasets?ssl=true
However, information_schema isn't available until 21.11.2.2-stable (https://github.com/ClickHouse/ClickHouse/pull/28691), so the queries for tables and the schema would need to be updated to support earlier versions.
Also, it looks like neither library typecasts date and time columns, which is needed for charting, and FixedString(N) types aren't showing up properly (SELECT * FROM visits_v1 LIMIT 10 in data source above).
With that additional info, I'm leaning towards clickhouse-activerecord still being a better fit for Blazer.
and FixedString(N) types aren't showing up properly (SELECT * FROM visits_v1 LIMIT 10 in data source above).
It looks like dataset issue, not ruby drivers. I checked the same query with curl, with Tabix and with RubyMine database tool (with official clickhouse jdbc driver) — same result.
$ curl "https://play-api.clickhouse.com:8443/?query=SELECT+ClientIP6+FROM+datasets.visits_v1+LIMIT+3;&user=playground&password=clickhouse&database=datasets"
{=�������bn�|I
{=�������bn�|I
{=�������bn�|I
Also, it looks like neither library typecasts date and time columns, which is needed for charting
My bad, forgot to check charting.
I found out that click_house lib correctly typecasts query result in JSON format if one of Clickhouse::Extend::ConnectionSelective methods was called.
You can check the demo with fix:
https://clickhouse-blazer-demo.herokuapp.com/blazer/queries/1
https://clickhouse-blazer-demo.herokuapp.com/blazer/queries/2
Unfortunately, I couldn't make it work with clickhouse_activerecord.
Now it looks like click_house is better option.
Now I am little confused) I'll try to summarize:
-
clickhouse_activerecord+ sql-adapter: + no need to maintain a separate adapter - growing complexity of sql-adapter (need to add some 'if's to support clickhouse < 21.11.2.2-stable) - issues with date/time columns (I guess we could handle it on our own, but => even more complexity) - force to installclickhouse_activerecordfor Blazer even ifclick_housealready used in project -
click_house+ separate clickhouse adapter + keep the code separate and simple + date/time columns + charts based on them works - one more class to maintain - force to installclick_housefor Blazer even ifclickhouse_activerecordalready used in project -
Support both libs + separate clickhouse adapter + no need to install extra gem for Blazer (there's a high probability that one of them already installed) + separate adapter still keeps the code quite simple - one more class to maintain +/- ability of charting depends on the choice of lib.
@ankane what do you think?
Let me think on this for a bit. I don't really want to support multiple adapters, so it'll likely be 1 or 2.
@ankane hi!) Have you come to any conclusion?
Let's go with option 2 since date/time columns already work. A few other notes:
- If
DateTimeobjects are converted toTimein the adapter, they'll appear inBlazer.time_zonein the UI - In the
schemamethod, think we can useWHERE database = currentDatabase() - In the
configmethod, let's only remove the starting/from the path for the database, like this adapter - Would be good to support the
ssl_verifyoption in the YAML config
@ankane hi!
If DateTime objects are converted to Time in the adapter, they'll appear in Blazer.time_zone in the UI
I'm not sure I got it right, but as I see none of ClickHouse types (Date, DateTime, DateTime64) are converted by click_house driver to ruby Time objects:
- CH
Dateparses as RubyDate - CH
DateTime,DateTime64— as RubyDateTime
DateTime#to_s produces strings like 2022-02-06T01:10:11+00:00 (with original tz offset) in the UI.
All other comments — done ✅ Hope it is ok now)
Looks great! For times, we'll want Blazer to display them in Blazer.time_zone to be consistent with other adapters (rather than the original tz offset), but to do that, they'll need to be converted to Time objects (value.to_time) when building rows.
Hi @ankane, sorry for bothering you. Do you have any other concerns about this PR? Can it be merged?
UPD: after I wrote this I found a bug, sorry. Fixed. Now it looks like everything is ok. Updated demo.
Hi @ankane, this adapter would be extremely useful as clickhouse is a very popular column store often utilized as analytics data store. Are there any blockers for this pr? Thanks.
I also have a PR open and e-mailed @ankane about this a while back. It seems like for now the priorities are not with this gem, which is perfectly fine of course. So I'd recommend forking Blazer if you want to keep updating it.
The downside is missing out on various PRs, so maybe it is an idea to create a maintained fork and to start pulling in PRs from here if @ankane is no longer able or interested in developing this gem. If there isn't such a fork already of course.
@abuisman I've forked the project and added Clickhouse support from this branch (including a small bugfix which handles the situation when schema doesn't have any tables).
My plan is to selectively merge some PRs from blazer and rewrite a couple components to make it easier extending the project (for example, I'm using bootstrap 5 and stimulus for interactions). It's still a work in progress, but feel free to share your thoughts, here is the fork