ibis
ibis copied to clipboard
feat(clickhouse): use http API
Codecov Report
Merging #4455 (e429970) into master (c59e94e) will decrease coverage by
2.30%. The diff coverage is19.80%.
@@ Coverage Diff @@
## master #4455 +/- ##
==========================================
- Coverage 92.68% 90.38% -2.31%
==========================================
Files 181 181
Lines 20707 20837 +130
Branches 2963 2974 +11
==========================================
- Hits 19192 18833 -359
- Misses 1142 1630 +488
- Partials 373 374 +1
| Impacted Files | Coverage Δ | |
|---|---|---|
| ibis/backends/clickhouse/__init__.py | 35.82% <17.20%> (-54.61%) |
:arrow_down: |
| ibis/backends/clickhouse/client.py | 56.66% <20.00%> (-29.45%) |
:arrow_down: |
| ibis/backends/clickhouse/datatypes.py | 20.80% <100.00%> (-79.21%) |
:arrow_down: |
| ibis/common/grounds.py | 96.73% <100.00%> (ø) |
|
| ibis/backends/clickhouse/registry.py | 26.85% <0.00%> (-63.98%) |
:arrow_down: |
| ibis/backends/clickhouse/identifiers.py | 60.00% <0.00%> (-40.00%) |
:arrow_down: |
| ibis/backends/clickhouse/compiler.py | 67.85% <0.00%> (-19.65%) |
:arrow_down: |
| ... and 15 more |
Test Results
34 files 34 suites 1h 14m 40s :stopwatch: 9 575 tests 6 788 :heavy_check_mark: 1 885 :zzz: 433 :x: 469 :fire: 34 365 runs 25 559 :heavy_check_mark: 7 002 :zzz: 866 :x: 938 :fire:
For more details on these failures and errors, see this check.
Results for commit e4299702.
:recycle: This comment has been updated with latest results.
Ok, I've just rebased this PR and want to recap why things are currently implemented the way they are:
We have to use JSONEachRow as the output format
Unfortunately we can't use Arrow/Parquet/ORC or anything else that ClickHouse outputs for IPC that pyarrow can handle. ClickHouse doesn't support converting its own Enum type to Arrow or Parquet.
The clickhouse playground is chock full of enum columns (as an example of an incredibly useful collection of tables), so "just not supporting them" isn't really a viable option.
Even if we could use ArrowStream/Arrow/Parquet we can't really
The main blocker here is that the setting to turn ClickHouse strings into Arrow/Parquet strings just doesn't work at all if the server is running in read-only mode. This makes it really hard to justify using the HTTP interface despite it being better in almost every other way that matters to us than clickhouse-driver, because we now have to handle strings coming back as binary, which is going to introduce a bunch of whack-a-mole style bugs or UX problems that we currently do not have.
This did prompt me to make HTTP 500s actually report the server error, so that's a win I guess.
There's still the pesky problem of columns with strings in their names not working
See this issue for more detail: https://github.com/ClickHouse/ClickHouse/issues/41112.