blazer
blazer copied to clipboard
Added support for ClickHouse
ClickHouse adapter implementation (resolves https://github.com/ankane/blazer/issues/378).
Requires click_house
Ruby driver to interact with clickhouse-server via HTTP interface.
Adapter methods:
- run_statement
- tables
- schema
- preview_statement
- explain
Readme instructions also added.
Hey @foxy-eyed, thanks for the PR! Overall, looks great.
Have you tried clickhouse-activerecord? It looks like they both have ~60k downloads, and I think we could use the SQL adapter if it works.
Hi @ankane, thank you for feedback!
I saw clickhouse_activerecord
but I prefer not to choose it)
My understanding is, that ClickHouse is made for (and good for) processing big analytical data, like user activity, events etc into reports. In real-time. ActiveRecord feels like unnecessary abstraction that does not fit scenarios because:
- singe events are not as interesting (with all its numerous attributes) as chains of events and aggregated data; we read large number of rows, but only a small subset of columns;
- in most cases we need some kind of aggregated report => we have to build custom SQL including subexpressions, functions, specific response format etc => dsl does not fit;
-
#create
method mostly useless because data is inserted in large batches (otherwise it's inefficient); - we don't have relations;
- usually data is not mutable after is is added to the DB.
Also support of different response format (https://clickhouse.com/docs/en/interfaces/formats/) would be handy, but clickhouse_activerecord
processes only JSON. In other cases it returns response body as a string (btw, clickhouse_activerecord
also uses HTTP interface under the hood but with Net::Http
instead of Faraday
).
But if someone already uses clickhouse_activerecord
we can take it, right?)
I added detection of driver dependency and created classes for both drivers with common interface, blazer adapter now can work with both. Please take a look, what do you think? Should we keep both?
The main benefit of clickhouse-activerecord
is we wouldn't need to maintain a separate adapter (assuming it works with SqlAdapter
).
It looks like queries work with SqlAdapter
.
data_sources:
my_source:
url: clickhouse://playground:[email protected]:8443/datasets?ssl=true
However, information_schema
isn't available until 21.11.2.2-stable (https://github.com/ClickHouse/ClickHouse/pull/28691), so the queries for tables and the schema would need to be updated to support earlier versions.
Also, it looks like neither library typecasts date and time columns, which is needed for charting, and FixedString(N)
types aren't showing up properly (SELECT * FROM visits_v1 LIMIT 10
in data source above).
With that additional info, I'm leaning towards clickhouse-activerecord
still being a better fit for Blazer.
and FixedString(N) types aren't showing up properly (SELECT * FROM visits_v1 LIMIT 10 in data source above).
It looks like dataset issue, not ruby drivers. I checked the same query with curl, with Tabix and with RubyMine database tool (with official clickhouse jdbc driver) — same result.
$ curl "https://play-api.clickhouse.com:8443/?query=SELECT+ClientIP6+FROM+datasets.visits_v1+LIMIT+3;&user=playground&password=clickhouse&database=datasets"
{=�������bn�|I
{=�������bn�|I
{=�������bn�|I
Also, it looks like neither library typecasts date and time columns, which is needed for charting
My bad, forgot to check charting.
I found out that click_house
lib correctly typecasts query result in JSON format if one of Clickhouse::Extend::ConnectionSelective methods was called.
You can check the demo with fix:
https://clickhouse-blazer-demo.herokuapp.com/blazer/queries/1
https://clickhouse-blazer-demo.herokuapp.com/blazer/queries/2
Unfortunately, I couldn't make it work with clickhouse_activerecord
.
Now it looks like click_house
is better option.
Now I am little confused) I'll try to summarize:
-
clickhouse_activerecord
+ sql-adapter: + no need to maintain a separate adapter - growing complexity of sql-adapter (need to add some 'if's to support clickhouse < 21.11.2.2-stable) - issues with date/time columns (I guess we could handle it on our own, but => even more complexity) - force to installclickhouse_activerecord
for Blazer even ifclick_house
already used in project -
click_house
+ separate clickhouse adapter + keep the code separate and simple + date/time columns + charts based on them works - one more class to maintain - force to installclick_house
for Blazer even ifclickhouse_activerecord
already used in project -
Support both libs + separate clickhouse adapter + no need to install extra gem for Blazer (there's a high probability that one of them already installed) + separate adapter still keeps the code quite simple - one more class to maintain +/- ability of charting depends on the choice of lib.
@ankane what do you think?
Let me think on this for a bit. I don't really want to support multiple adapters, so it'll likely be 1 or 2.
@ankane hi!) Have you come to any conclusion?
Let's go with option 2 since date/time columns already work. A few other notes:
- If
DateTime
objects are converted toTime
in the adapter, they'll appear inBlazer.time_zone
in the UI - In the
schema
method, think we can useWHERE database = currentDatabase()
- In the
config
method, let's only remove the starting/
from the path for the database, like this adapter - Would be good to support the
ssl_verify
option in the YAML config
@ankane hi!
If DateTime objects are converted to Time in the adapter, they'll appear in Blazer.time_zone in the UI
I'm not sure I got it right, but as I see none of ClickHouse types (Date
, DateTime
, DateTime64
) are converted by click_house
driver to ruby Time
objects:
- CH
Date
parses as RubyDate
- CH
DateTime
,DateTime64
— as RubyDateTime
DateTime#to_s
produces strings like 2022-02-06T01:10:11+00:00
(with original tz offset) in the UI.
All other comments — done ✅ Hope it is ok now)
Looks great! For times, we'll want Blazer to display them in Blazer.time_zone
to be consistent with other adapters (rather than the original tz offset), but to do that, they'll need to be converted to Time
objects (value.to_time
) when building rows
.
Hi @ankane, sorry for bothering you. Do you have any other concerns about this PR? Can it be merged?
UPD: after I wrote this I found a bug, sorry. Fixed. Now it looks like everything is ok. Updated demo.
Hi @ankane, this adapter would be extremely useful as clickhouse is a very popular column store often utilized as analytics data store. Are there any blockers for this pr? Thanks.
I also have a PR open and e-mailed @ankane about this a while back. It seems like for now the priorities are not with this gem, which is perfectly fine of course. So I'd recommend forking Blazer if you want to keep updating it.
The downside is missing out on various PRs, so maybe it is an idea to create a maintained fork and to start pulling in PRs from here if @ankane is no longer able or interested in developing this gem. If there isn't such a fork already of course.
@abuisman I've forked the project and added Clickhouse support from this branch (including a small bugfix which handles the situation when schema doesn't have any tables).
My plan is to selectively merge some PRs from blazer and rewrite a couple components to make it easier extending the project (for example, I'm using bootstrap 5 and stimulus for interactions). It's still a work in progress, but feel free to share your thoughts, here is the fork