feat: add ADBC backend
This adds the skeleton of a very minimal ADBC backend. Most tests do not yet pass. The main thing is that ADBC is really a database API abstraction akin to PEP 249, JDBC, or ODBC, so this backend eventually needs to be able to let the user specify the SQL dialect to compile to (or: to use Substrait instead).
Also, I haven't yet updated the dependencies, CI, etc. This is mostly to prove that it (generally) is possible.
ADBC doesn't yet have packages/releases, so getting this up and working takes some work. Basically:
- Add cython to shell.nix
- Get the ADBC C SQLite driver built (I did this outside of Nix): https://github.com/apache/arrow-adbc/blob/main/CONTRIBUTING.md#cc
- Inside the Nix dev environment, get the Python package built: https://github.com/apache/arrow-adbc/blob/main/CONTRIBUTING.md#python (this only worked if I made a virtualenv inside the Nix shell)
-
export LD_LIBRARY_PATH=path/to/folder/with/libadbc_driver_sqlite.so -
export PYTHONPATH=$PYTHONPATH:path/to/folder/with/adbc_driver_manager/package -
python -m pytest -m adbc
Test Results
35 files 35 suites 1h 9m 19s :stopwatch: 8 873 tests 7 094 :heavy_check_mark: 1 779 :zzz: 0 :x: 32 504 runs 25 728 :heavy_check_mark: 6 776 :zzz: 0 :x:
Results for commit 5d301ee8.
:recycle: This comment has been updated with latest results.
Codecov Report
Merging #4267 (5d301ee) into master (8d0cf80) will decrease coverage by
0.58%. The diff coverage is4.44%.
:exclamation: Current head 5d301ee differs from pull request most recent head 02d8a7e. Consider uploading reports for the commit 02d8a7e to get more accurate results
@@ Coverage Diff @@
## master #4267 +/- ##
==========================================
- Coverage 92.60% 92.02% -0.59%
==========================================
Files 179 180 +1
Lines 20240 20372 +132
Branches 2893 2914 +21
==========================================
+ Hits 18743 18747 +4
- Misses 1132 1260 +128
Partials 365 365
| Impacted Files | Coverage Δ | |
|---|---|---|
| ibis/backends/adbc/__init__.py | 0.00% <0.00%> (ø) |
|
| ibis/backends/pyarrow/datatypes.py | 77.77% <ø> (ø) |
|
| ibis/backends/conftest.py | 89.59% <62.50%> (-0.91%) |
:arrow_down: |
| ibis/backends/base/sql/alchemy/query_builder.py | 92.46% <100.00%> (ø) |
|
| ibis/backends/impala/__init__.py | 86.14% <0.00%> (+0.21%) |
:arrow_up: |
Hello @lidavidm! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
Comment last updated at 2022-08-08 18:17:29 UTC
On Phillip's suggestion, now the ADBC package implements DBAPI, and this backend just defers to SQLAlchemy (except for fetching results, which gets overridden). I also added a bit of pytest magic here (thanks to Phillip again for explaining how it works).
Current status:
91 failed, 260 passed, 2 skipped, 8681 deselected, 192 xfailed in 50.52s
Most of the remaining failures are because of arithmetic UDFs, which need manual skips. The driver has no way of registering Python UDFs; perhaps the driver could optionally register all the ones we need here, though. Part of the pytest magic is to auto-skip anything already skipped for SQLite.
Next up, I'd like to try reintegrating the Substrait backend
What I ended up doing was futzing around so that the driver can also be distributed as a package: https://github.com/apache/arrow-adbc/pull/57
Once that gets updated, then instead of having to build the driver and such here, we can just depend on the (prerelease) packages.
ADBC should be able to be used by various backends so it's not a backend on its own.
I think ibis will need another abstraction layer which is responsible for the interfacing between various databases.
I guess then we need to decide what specific backend(s) to target here? (Since this won't be as featureful as the existing SQLite driver, without being able to register UDFs.) Possibly Postgres and Acero?
In that case though, instead of trying to add all the pytest marks to get everything to pass, I'll focus on looking more into Substrait integration
@lidavidm Should we keep this PR open?
Sorry, I've been letting this sit. I'll close it for now, and I'll follow up with a new PR once I get (nightly) wheel releases for the ADBC packages set up; that'll make development easier.