ibis icon indicating copy to clipboard operation
ibis copied to clipboard

feat: add ADBC backend

Open lidavidm opened this issue 3 years ago • 8 comments

This adds the skeleton of a very minimal ADBC backend. Most tests do not yet pass. The main thing is that ADBC is really a database API abstraction akin to PEP 249, JDBC, or ODBC, so this backend eventually needs to be able to let the user specify the SQL dialect to compile to (or: to use Substrait instead).

Also, I haven't yet updated the dependencies, CI, etc. This is mostly to prove that it (generally) is possible.

ADBC doesn't yet have packages/releases, so getting this up and working takes some work. Basically:

  1. Add cython to shell.nix
  2. Get the ADBC C SQLite driver built (I did this outside of Nix): https://github.com/apache/arrow-adbc/blob/main/CONTRIBUTING.md#cc
  3. Inside the Nix dev environment, get the Python package built: https://github.com/apache/arrow-adbc/blob/main/CONTRIBUTING.md#python (this only worked if I made a virtualenv inside the Nix shell)
  4. export LD_LIBRARY_PATH=path/to/folder/with/libadbc_driver_sqlite.so
  5. export PYTHONPATH=$PYTHONPATH:path/to/folder/with/adbc_driver_manager/package
  6. python -m pytest -m adbc

lidavidm avatar Jul 22 '22 17:07 lidavidm

Test Results

       35 files         35 suites   1h 9m 19s :stopwatch:   8 873 tests   7 094 :heavy_check_mark: 1 779 :zzz: 0 :x: 32 504 runs  25 728 :heavy_check_mark: 6 776 :zzz: 0 :x:

Results for commit 5d301ee8.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Jul 22 '22 17:07 github-actions[bot]

Codecov Report

Merging #4267 (5d301ee) into master (8d0cf80) will decrease coverage by 0.58%. The diff coverage is 4.44%.

:exclamation: Current head 5d301ee differs from pull request most recent head 02d8a7e. Consider uploading reports for the commit 02d8a7e to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4267      +/-   ##
==========================================
- Coverage   92.60%   92.02%   -0.59%     
==========================================
  Files         179      180       +1     
  Lines       20240    20372     +132     
  Branches     2893     2914      +21     
==========================================
+ Hits        18743    18747       +4     
- Misses       1132     1260     +128     
  Partials      365      365              
Impacted Files Coverage Δ
ibis/backends/adbc/__init__.py 0.00% <0.00%> (ø)
ibis/backends/pyarrow/datatypes.py 77.77% <ø> (ø)
ibis/backends/conftest.py 89.59% <62.50%> (-0.91%) :arrow_down:
ibis/backends/base/sql/alchemy/query_builder.py 92.46% <100.00%> (ø)
ibis/backends/impala/__init__.py 86.14% <0.00%> (+0.21%) :arrow_up:

codecov[bot] avatar Jul 22 '22 17:07 codecov[bot]

Hello @lidavidm! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:

Comment last updated at 2022-08-08 18:17:29 UTC

pep8speaks avatar Aug 02 '22 21:08 pep8speaks

On Phillip's suggestion, now the ADBC package implements DBAPI, and this backend just defers to SQLAlchemy (except for fetching results, which gets overridden). I also added a bit of pytest magic here (thanks to Phillip again for explaining how it works).

Current status:

91 failed, 260 passed, 2 skipped, 8681 deselected, 192 xfailed in 50.52s

Most of the remaining failures are because of arithmetic UDFs, which need manual skips. The driver has no way of registering Python UDFs; perhaps the driver could optionally register all the ones we need here, though. Part of the pytest magic is to auto-skip anything already skipped for SQLite.

Next up, I'd like to try reintegrating the Substrait backend

lidavidm avatar Aug 08 '22 18:08 lidavidm

What I ended up doing was futzing around so that the driver can also be distributed as a package: https://github.com/apache/arrow-adbc/pull/57

Once that gets updated, then instead of having to build the driver and such here, we can just depend on the (prerelease) packages.

lidavidm avatar Aug 08 '22 22:08 lidavidm

ADBC should be able to be used by various backends so it's not a backend on its own.

I think ibis will need another abstraction layer which is responsible for the interfacing between various databases.

kszucs avatar Aug 09 '22 16:08 kszucs

I guess then we need to decide what specific backend(s) to target here? (Since this won't be as featureful as the existing SQLite driver, without being able to register UDFs.) Possibly Postgres and Acero?

lidavidm avatar Aug 09 '22 16:08 lidavidm

In that case though, instead of trying to add all the pytest marks to get everything to pass, I'll focus on looking more into Substrait integration

lidavidm avatar Aug 09 '22 16:08 lidavidm

@lidavidm Should we keep this PR open?

cpcloud avatar Oct 06 '22 20:10 cpcloud

Sorry, I've been letting this sit. I'll close it for now, and I'll follow up with a new PR once I get (nightly) wheel releases for the ADBC packages set up; that'll make development easier.

lidavidm avatar Oct 06 '22 20:10 lidavidm