feat(databricks): add the databricks backend
Description of changes
Add support for the databricks backend.
Notes
- The PySpark compiler is almost entirely reused. Naturally there are a couple of cases where things differ, and those get overridden in the databricks compiler.
- Databricks seems to be aggressive about turning SQL `NULL`s into `NaN`s, which defeats a number of array and map tests that expect `None` in the output from `to_pandas`/`execute`.
- Naturally, databricks pins pyarrow to <17 (one version behind the latest) and numpy to <2. It's not as bad as it was with the snowflake connector, but we shouldn't merge this until we can figure out a sane workaround to avoid the pin in CI.
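To illustrate why the `NULL`-to-`NaN` conversion trips up tests that expect `None`, here is a minimal sketch in plain Python. The helper `nan_to_none` is hypothetical: it is not part of ibis or of this PR, and only shows one way a nested result could be normalized before comparison.

```python
import math


def nan_to_none(value):
    """Hypothetical helper: recursively replace float NaN with None
    inside nested lists and dicts, leaving everything else untouched."""
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, list):
        return [nan_to_none(v) for v in value]
    if isinstance(value, dict):
        return {k: nan_to_none(v) for k, v in value.items()}
    return value


nan = float("nan")

# NaN is not equal to None, so a result holding NaN where the test
# expects None fails an equality check.
assert [1.0, nan] != [1.0, None]

# After normalization, the array and map shapes compare as expected.
assert nan_to_none([1.0, nan]) == [1.0, None]
assert nan_to_none({"a": nan, "b": [nan, 2.0]}) == {"a": None, "b": [None, 2.0]}
```

Whether normalizing results or marking the affected tests as expected failures is the better trade-off is a judgment call; the sketch only makes the mismatch concrete.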
Issues closed
Resolves #9248.
Tests won't run until merge due to cloudiness.
I'll post the results from a local run, and then fix the CI (if needed) once this is merged.
This is passing locally:
…/ibis on databricks is 📦 v9.5.0 via 🐍 v3.12.5 via ❄️ impure (ibis-3.12.5-env)
❯ pytest -m databricks -n auto --dist loadgroup --snapshot-update -q
bringing up nodes...
xx.xxx.xsssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssx...x.sssssssssssssssssssss.........x...................x.........................x [ 9%]
......x.............................x.......................................x.......................x..x.................x...............................................x..............xxx... [ 19%]
xxxxxxxxxxxxxxxxxxx.x.x.xxxxx...............x.........................x..........x.....x...........xx.....x................x..........................xx............................x....x.... [ 29%]
..........x.................x................x..x......x...x.............................xx..x...x...................x.x.........x...............x.......x.....x.x.x......xx.....xx....x...... [ 39%]
.........xx..x..x....xx.......x.x...x........xx.x.......x....................x.......x...................x..x.........x.....x.........x..sx.x....x.xs..x..s.....x..x......sx.................. [ 49%]
..................s..xx.x.....................x........x...................s...........s......................................x.s..................xx........x..x.x........x..........x....x.. [ 59%]
xx.x....x..x......x.......x.xxx.........x...x.x.xx.......................x.......x....xx.....x.....x.x.X.x....x...xX..x..xxx.x.xx......xxxx..xxx.....x...xx.x.................x..x............ [ 69%]
................................................................................................x..x..............xx..............x..xx.x......xx..........x......x....................x...... [ 79%]
.....x.....x..x.x.......................x.......x...x...xx....x.x......x.......x...x....x....x.x..................x.x.x..x......x................x............................x............... [ 89%]
..................s...........x.................s......................................................................................................................................x....x. [ 99%]
..x........ [100%]
1569 passed, 130 skipped, 210 xfailed, 2 xpassed in 273.60s (0:04:33)
We probably need a policy in place for what happens to backends with no local CI option if financial sponsorship for those CI runs goes away.
The only reasonable policy I can think of right now is that if funding dries up, the backend goes into backend purgatory, where the CI stops running and support becomes best effort.
Thoughts?
That seems reasonable to me. And I think, if that happens, we add badges to that effect in the README and on the backend docs page, effectively severing the backend from any semantic versioning guarantees (while still making a best effort to maintain those guarantees).
@gforsyth Where should we put this policy?
How about a page in the docs in the Backends dropdown alongside the operations support matrix?
@gforsyth Finally got around to adding that policy page.
Databricks backend passes the backend test suite:
❯ pytest -m databricks -n auto --dist loadgroup --snapshot-update -q
bringing up nodes...
x...x.x..x...sssssssssssssssssssss..........xx.x..sssssssssssssss.ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssxx...x............xx......x........x.x.......x..x [ 10%]
x........x...x.....xx.x.x.x.........x................xx...............x.........x..x..s...........s......s...........................................................................x................. [ 20%]
........................................................................................................x....x...................x.x.....x......x.x.............x........x...................x......... [ 30%]
..x....................xx............................x.....x...........................................x..x....................................x....xxxxxxxxxxxx.........xx.x..xxx..xxx.x..x.x......... [ 41%]
................x...........x......x.......x...................x................................x....x..............x.....................x.........x..............xx......x.....xx..............x..... [ 51%]
...x.......x....x........xxxxx.xxx.xxxxx..xxxxx...xx...x..xx....x.......x............x..x.........ss....................xx..x.....x..s.s......x....x........x........x....x...........x.........x...xxx [ 61%]
xx......s.....x....x.x.x.........x...............x.x......x...x..x...x.x...X.xx.....xx........xx.x..........x..xxx........x..xx........x.x..x.....xx........X...x..............x...xx...x...........x.. [ 72%]
.............x..........................x..x....................x....x........x......x.....x.....x......x........................................x.............................xx.x....x............... [ 82%]
.....................x.....x.........x..................x...x.....x...x........x.x.x.....x..........x..........x...x...................x.....x.................................x.................s..s.. [ 92%]
...........................x................................................................................x............x......xx.x.... [100%]
1584 passed, 130 skipped, 211 xfailed, 2 xpassed in 209.01s (0:03:29)
Snowflake and BigQuery both passing:
…/ibis on databricks is 📦 v9.5.0 via 🐍 v3.12.7 via ❄️ impure (ibis-3.12-env)
❯ pytest -m snowflake -n 8 --dist loadgroup --snapshot-update -q && pytest -m bigquery -n auto --dist loadgroup --snapshot-update -q
bringing up nodes...
x...........s.......................s........s............................................x.x.x..xxxxxxxxx.xx.............xx............xx.x.x..........xx.......x...............x.............x..x.x.. [ 9%]
.....x...x..x....xx...........x...........x..............x...........x...x.x..x...x.....x.......xx..xx.xxx..x...x.....x....x.......................s................................................... [ 19%]
...........xxxxxxxxxxx...x.............x..x............x.xxx..xxxx.xx....xxx.x......x....x..xxx.......xx..xx.x.xx.xxx.xx...xxxxxxxxxx.xx....x.x.xx..xxx..x.xx..x.xx...xx.x.....xx.x....x........x...... [ 29%]
..............x.............xx......................x..................x...........x........x..s.xx...x...x..x....x.......xxx..x......x...x............................x.......xx............x......... [ 39%]
.....................x............x....x........x.....x..............................xx....x....x....x....x...x...............xx..x........x........................................................... [ 49%]
.......................xx......................x...............................................x.x..................................x..s..x..x....x...................x...x......................x...x. [ 59%]
......................x.............................x...............x.......................x.....x...x.........................x....x.................x...........x....x......................xx...... [ 68%]
........x.x........x.............x..............x..................x....x...........x.....x.....x........................................................x....x...................x.................... [ 78%]
......................x.....................x...................................x...................x........x.........x.x..xx........x.........x..........x...x....x...x.s.s.x......s................. [ 88%]
s.......................................................................................................................................................................................x.............. [ 98%]
............................... [100%]
1785 passed, 10 skipped, 226 xfailed in 232.21s (0:03:52)
bringing up nodes...
........................s....sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss.ssssssssssssssssssssssss....s......sssssssssssssssssssss.................s...x......x..x...... [ 8%]
................................................s....x.....................x...x...........x....xxx..x..x...x..xxx.x.....x.xx......x.........x....................................................x.... [ 17%]
x......................................x.............................................x............................x.x...x...x.....x...........s..s.x...s.x...x..x..sx...............x.......xx......... [ 26%]
.....x..x.x.xx........x............Xxxx.xx........x..x.....x..................x.xxx...x.xxx.x.XxXx..x............x.x....x...x.xx....X.............xx.xx....xx..xxx..xxxx.xx..x.xx.....xx...x......x...x [ 35%]
x...x..x.....x..x.x.x..xx.....x........x.........xxxx......x....x...xxx.............x..x..x...x....x......x....x.x.x.......xx....x..x...x................xx.......x..............x.......xx............ [ 44%]
.....x.....x...x......x......x............x.x..........x...x.x..x.......x..............xx........x......xx..x..........x.............x.........x.xx.x.x..x.xxx.xxx.xxxxxxx.xxxxxxxxxxxxx.xxx..xxxxx..xx [ 53%]
.xxxxx.xx..xxx.xxxxxxxxxxxxxxxxxxxxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.xxxxxxxxxxxxxx.x..x...xxxxx....x..xxx.....x.......xx...xxx..x......x.....................x...x....x.......... [ 61%]
...............x....x............x.............x..x.x......x........................x............xx.......x..........x.....x..x.............x..x.x........x...x.....x.....x.x.......x........x......... [ 70%]
...........x..x...........................x..................x...........................x....x..x................x...........x...................x....................x............................... [ 79%]
....................................................................................................................x.................................................................................. [ 88%]
.................................................x.............................................................................................................x.s....s......xs........................ [ 97%]
....s.......................x.x............................. [100%]
1757 passed, 132 skipped, 356 xfailed, 4 xpassed in 388.78s (0:06:28)
I'll try to take a look over this tomorrow!