ibis icon indicating copy to clipboard operation
ibis copied to clipboard

feat(clickhouse): use http API

Open cpcloud opened this issue 3 years ago • 2 comments

cpcloud avatar Sep 02 '22 19:09 cpcloud

Codecov Report

Merging #4455 (e429970) into master (c59e94e) will decrease coverage by 2.30%. The diff coverage is 19.80%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4455      +/-   ##
==========================================
- Coverage   92.68%   90.38%   -2.31%     
==========================================
  Files         181      181              
  Lines       20707    20837     +130     
  Branches     2963     2974      +11     
==========================================
- Hits        19192    18833     -359     
- Misses       1142     1630     +488     
- Partials      373      374       +1     
Impacted Files Coverage Δ
ibis/backends/clickhouse/__init__.py 35.82% <17.20%> (-54.61%) :arrow_down:
ibis/backends/clickhouse/client.py 56.66% <20.00%> (-29.45%) :arrow_down:
ibis/backends/clickhouse/datatypes.py 20.80% <100.00%> (-79.21%) :arrow_down:
ibis/common/grounds.py 96.73% <100.00%> (ø)
ibis/backends/clickhouse/registry.py 26.85% <0.00%> (-63.98%) :arrow_down:
ibis/backends/clickhouse/identifiers.py 60.00% <0.00%> (-40.00%) :arrow_down:
ibis/backends/clickhouse/compiler.py 67.85% <0.00%> (-19.65%) :arrow_down:
... and 15 more

codecov[bot] avatar Sep 02 '22 19:09 codecov[bot]

Test Results

       34 files         34 suites   1h 14m 40s :stopwatch:   9 575 tests   6 788 :heavy_check_mark: 1 885 :zzz: 433 :x: 469 :fire: 34 365 runs  25 559 :heavy_check_mark: 7 002 :zzz: 866 :x: 938 :fire:

For more details on these failures and errors, see this check.

Results for commit e4299702.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Sep 02 '22 20:09 github-actions[bot]

Ok, I've just rebased this PR and want to recap why things are currently implemented the way they are:

We have to use JSONEachRow as the output format

Unfortunately we can't use Arrow/Parquet/ORC or anything else that ClickHouse outputs for IPC that pyarrow can handle. ClickHouse doesn't support converting its own Enum type to Arrow or Parquet.

The clickhouse playground is chock full of enum columns (as an example of an incredibly useful collection of tables), so "just not supporting them" isn't really a viable option.

Even if we could use ArrowStream/Arrow/Parquet we can't really

The main blocker here is that the setting to turn ClickHouse strings into Arrow/Parquet strings just doesn't work at all if the server is running in read-only mode. This makes it really hard to justify using the HTTP interface despite it being better in almost every other way that matters to us than clickhouse-driver, because we now have to handle strings coming back as binary, which is going to introduce a bunch of whack-a-mole style bugs or UX problems that we currently do not have.

This did prompt me to make HTTP 500s actually report the server error, so that's a win I guess.

There's still the pesky problem of columns with strings in their names not working

See this issue for more detail: https://github.com/ClickHouse/ClickHouse/issues/41112.

cpcloud avatar Nov 18 '22 18:11 cpcloud