blazer icon indicating copy to clipboard operation
blazer copied to clipboard

Added support for ClickHouse

Open foxy-eyed opened this issue 3 years ago • 15 comments

ClickHouse adapter implementation (resolves https://github.com/ankane/blazer/issues/378).

Requires click_house Ruby driver to interact with clickhouse-server via HTTP interface.

Adapter methods:

  • run_statement
  • tables
  • schema
  • preview_statement
  • explain

Readme instructions also added.

foxy-eyed avatar Jan 17 '22 07:01 foxy-eyed

Hey @foxy-eyed, thanks for the PR! Overall, looks great.

Have you tried clickhouse-activerecord? It looks like they both have ~60k downloads, and I think we could use the SQL adapter if it works.

ankane avatar Jan 20 '22 03:01 ankane

Hi @ankane, thank you for feedback!

I saw clickhouse_activerecord but I prefer not to choose it)

My understanding is, that ClickHouse is made for (and good for) processing big analytical data, like user activity, events etc into reports. In real-time. ActiveRecord feels like unnecessary abstraction that does not fit scenarios because:

  • singe events are not as interesting (with all its numerous attributes) as chains of events and aggregated data; we read large number of rows, but only a small subset of columns;
  • in most cases we need some kind of aggregated report => we have to build custom SQL including subexpressions, functions, specific response format etc => dsl does not fit;
  • #create method mostly useless because data is inserted in large batches (otherwise it's inefficient);
  • we don't have relations;
  • usually data is not mutable after is is added to the DB.

Also support of different response format (https://clickhouse.com/docs/en/interfaces/formats/) would be handy, but clickhouse_activerecord processes only JSON. In other cases it returns response body as a string (btw, clickhouse_activerecord also uses HTTP interface under the hood but with Net::Http instead of Faraday).

But if someone already uses clickhouse_activerecord we can take it, right?)

I added detection of driver dependency and created classes for both drivers with common interface, blazer adapter now can work with both. Please take a look, what do you think? Should we keep both?

foxy-eyed avatar Jan 20 '22 20:01 foxy-eyed

The main benefit of clickhouse-activerecord is we wouldn't need to maintain a separate adapter (assuming it works with SqlAdapter).

ankane avatar Jan 22 '22 21:01 ankane

It looks like queries work with SqlAdapter.

data_sources:
  my_source:
    url: clickhouse://playground:[email protected]:8443/datasets?ssl=true

However, information_schema isn't available until 21.11.2.2-stable (https://github.com/ClickHouse/ClickHouse/pull/28691), so the queries for tables and the schema would need to be updated to support earlier versions.

Also, it looks like neither library typecasts date and time columns, which is needed for charting, and FixedString(N) types aren't showing up properly (SELECT * FROM visits_v1 LIMIT 10 in data source above).

With that additional info, I'm leaning towards clickhouse-activerecord still being a better fit for Blazer.

ankane avatar Jan 22 '22 23:01 ankane

and FixedString(N) types aren't showing up properly (SELECT * FROM visits_v1 LIMIT 10 in data source above).

It looks like dataset issue, not ruby drivers. I checked the same query with curl, with Tabix and with RubyMine database tool (with official clickhouse jdbc driver) — same result.

$ curl "https://play-api.clickhouse.com:8443/?query=SELECT+ClientIP6+FROM+datasets.visits_v1+LIMIT+3;&user=playground&password=clickhouse&database=datasets"
{=�������bn�|I
{=�������bn�|I
{=�������bn�|I

Also, it looks like neither library typecasts date and time columns, which is needed for charting

My bad, forgot to check charting. I found out that click_house lib correctly typecasts query result in JSON format if one of Clickhouse::Extend::ConnectionSelective methods was called. You can check the demo with fix: https://clickhouse-blazer-demo.herokuapp.com/blazer/queries/1 https://clickhouse-blazer-demo.herokuapp.com/blazer/queries/2

Unfortunately, I couldn't make it work with clickhouse_activerecord.

Now it looks like click_house is better option.

foxy-eyed avatar Jan 23 '22 20:01 foxy-eyed

Now I am little confused) I'll try to summarize:

  1. clickhouse_activerecord + sql-adapter: + no need to maintain a separate adapter - growing complexity of sql-adapter (need to add some 'if's to support clickhouse < 21.11.2.2-stable) - issues with date/time columns (I guess we could handle it on our own, but => even more complexity) - force to install clickhouse_activerecord for Blazer even if click_house already used in project

  2. click_house + separate clickhouse adapter + keep the code separate and simple + date/time columns + charts based on them works - one more class to maintain - force to install click_house for Blazer even if clickhouse_activerecord already used in project

  3. Support both libs + separate clickhouse adapter + no need to install extra gem for Blazer (there's a high probability that one of them already installed) + separate adapter still keeps the code quite simple - one more class to maintain +/- ability of charting depends on the choice of lib.

@ankane what do you think?

foxy-eyed avatar Jan 23 '22 20:01 foxy-eyed

Let me think on this for a bit. I don't really want to support multiple adapters, so it'll likely be 1 or 2.

ankane avatar Jan 25 '22 19:01 ankane

@ankane hi!) Have you come to any conclusion?

foxy-eyed avatar Feb 03 '22 17:02 foxy-eyed

Let's go with option 2 since date/time columns already work. A few other notes:

  1. If DateTime objects are converted to Time in the adapter, they'll appear in Blazer.time_zone in the UI
  2. In the schema method, think we can use WHERE database = currentDatabase()
  3. In the config method, let's only remove the starting / from the path for the database, like this adapter
  4. Would be good to support the ssl_verify option in the YAML config

ankane avatar Feb 06 '22 01:02 ankane

@ankane hi!

If DateTime objects are converted to Time in the adapter, they'll appear in Blazer.time_zone in the UI

I'm not sure I got it right, but as I see none of ClickHouse types (Date, DateTime, DateTime64) are converted by click_house driver to ruby Time objects:

  • CH Date parses as Ruby Date
  • CH DateTime, DateTime64 — as Ruby DateTime

DateTime#to_s produces strings like 2022-02-06T01:10:11+00:00 (with original tz offset) in the UI.

All other comments — done ✅ Hope it is ok now)

foxy-eyed avatar Feb 06 '22 18:02 foxy-eyed

Looks great! For times, we'll want Blazer to display them in Blazer.time_zone to be consistent with other adapters (rather than the original tz offset), but to do that, they'll need to be converted to Time objects (value.to_time) when building rows.

ankane avatar Feb 07 '22 07:02 ankane

Hi @ankane, sorry for bothering you. Do you have any other concerns about this PR? Can it be merged?

UPD: after I wrote this I found a bug, sorry. Fixed. Now it looks like everything is ok. Updated demo.

foxy-eyed avatar Feb 19 '22 17:02 foxy-eyed

Hi @ankane, this adapter would be extremely useful as clickhouse is a very popular column store often utilized as analytics data store. Are there any blockers for this pr? Thanks.

okdas avatar May 18 '22 05:05 okdas

I also have a PR open and e-mailed @ankane about this a while back. It seems like for now the priorities are not with this gem, which is perfectly fine of course. So I'd recommend forking Blazer if you want to keep updating it.

The downside is missing out on various PRs, so maybe it is an idea to create a maintained fork and to start pulling in PRs from here if @ankane is no longer able or interested in developing this gem. If there isn't such a fork already of course.

abuisman avatar Sep 12 '23 11:09 abuisman

@abuisman I've forked the project and added Clickhouse support from this branch (including a small bugfix which handles the situation when schema doesn't have any tables).

My plan is to selectively merge some PRs from blazer and rewrite a couple components to make it easier extending the project (for example, I'm using bootstrap 5 and stimulus for interactions). It's still a work in progress, but feel free to share your thoughts, here is the fork

khasinski avatar Oct 28 '23 20:10 khasinski