feat(databricks): add the databricks backend

Open cpcloud opened this issue 1 year ago • 7 comments

Description of changes

Add support for the databricks backend.

Notes

  • The PySpark compiler is almost entirely reused. Naturally there are a couple of cases where things differ, and they get overridden in the databricks compiler.
  • Databricks seems to be aggressive about turning SQL NULLs into NaNs, which breaks a number of array and map tests that expect None in the output from to_pandas/execute.
  • Naturally, databricks pins pyarrow to <17 (one version behind the latest) and numpy to <2. It's not as bad as it was with the snowflake connector, but we shouldn't merge this until we can figure out a sane workaround to avoid the pin in CI.
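
To illustrate the NULL/NaN mismatch from the second note: a minimal sketch, assuming nested results arrive as plain Python lists and dicts with float NaN where None was expected. The helper name `nan_to_none` is hypothetical and is not part of this PR or of ibis.

```python
import math


def nan_to_none(value):
    """Recursively replace float NaN with None inside lists and dicts."""
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, list):
        return [nan_to_none(v) for v in value]
    if isinstance(value, dict):
        return {k: nan_to_none(v) for k, v in value.items()}
    return value


# Hypothetical array and map values as they might come back from execute().
array_value = [1.0, float("nan"), 3.0]
map_value = {"a": float("nan"), "b": 2.0}

normalized_array = nan_to_none(array_value)
normalized_map = nan_to_none(map_value)
```

A test could apply this kind of normalization to the result before comparing against an expected value that contains None, though whether that is the right fix depends on where the NaNs are introduced.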

Issues closed

Resolves #9248.

cpcloud avatar Sep 25 '24 14:09 cpcloud

Tests won't run in CI until this is merged, due to cloudiness (the backend has no local CI option).

I'll post the results from a local run, and then fix the CI (if needed) once this is merged.
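
For anyone reproducing the local run: a minimal sketch of a credentials check, assuming the connection details are supplied through environment variables. The variable names below and the helper `missing_credentials` are assumptions for illustration, not something this PR confirms.

```python
import os

# Assumed environment variable names; check the backend docs for the real ones.
REQUIRED_VARS = (
    "DATABRICKS_SERVER_HOSTNAME",
    "DATABRICKS_HTTP_PATH",
    "DATABRICKS_TOKEN",
)


def missing_credentials(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("cannot run databricks tests, missing:", ", ".join(missing))
    else:
        print("credentials found")
```

In a test suite, a non-empty result could be passed to `pytest.skip` in a session fixture so the cloud tests are skipped rather than failed when credentials are unavailable.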

cpcloud avatar Sep 25 '24 14:09 cpcloud

This is passing locally:

…/ibis on  databricks is 📦 v9.5.0 via 🐍 v3.12.5 via ❄️  impure (ibis-3.12.5-env)
❯ pytest -m databricks -n auto --dist loadgroup --snapshot-update -q
bringing up nodes...
xx.xxx.xsssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssx...x.sssssssssssssssssssss.........x...................x.........................x [  9%]
......x.............................x.......................................x.......................x..x.................x...............................................x..............xxx... [ 19%]
xxxxxxxxxxxxxxxxxxx.x.x.xxxxx...............x.........................x..........x.....x...........xx.....x................x..........................xx............................x....x.... [ 29%]
..........x.................x................x..x......x...x.............................xx..x...x...................x.x.........x...............x.......x.....x.x.x......xx.....xx....x...... [ 39%]
.........xx..x..x....xx.......x.x...x........xx.x.......x....................x.......x...................x..x.........x.....x.........x..sx.x....x.xs..x..s.....x..x......sx.................. [ 49%]
..................s..xx.x.....................x........x...................s...........s......................................x.s..................xx........x..x.x........x..........x....x.. [ 59%]
xx.x....x..x......x.......x.xxx.........x...x.x.xx.......................x.......x....xx.....x.....x.x.X.x....x...xX..x..xxx.x.xx......xxxx..xxx.....x...xx.x.................x..x............ [ 69%]
................................................................................................x..x..............xx..............x..xx.x......xx..........x......x....................x...... [ 79%]
.....x.....x..x.x.......................x.......x...x...xx....x.x......x.......x...x....x....x.x..................x.x.x..x......x................x............................x............... [ 89%]
..................s...........x.................s......................................................................................................................................x....x. [ 99%]
..x........                                                                                                                                                                                    [100%]
1569 passed, 130 skipped, 210 xfailed, 2 xpassed in 273.60s (0:04:33)

cpcloud avatar Oct 07 '24 14:10 cpcloud

We probably need a policy in place for what happens to backends with no local CI option if financial sponsorship for those CI runs goes away.

gforsyth avatar Oct 07 '24 15:10 gforsyth

The only reasonable policy I can think of right now is that if funding dries up, the backend goes into backend purgatory, where the CI stops running and support becomes best effort.

Thoughts?

cpcloud avatar Oct 08 '24 11:10 cpcloud

That seems reasonable to me. If that happens, I think we should add badges to that effect in the README and on the backend docs page, effectively severing the backend from any semantic versioning guarantees (while still making a best effort to maintain them).

gforsyth avatar Oct 08 '24 11:10 gforsyth

@gforsyth Where should we put this policy?

cpcloud avatar Oct 17 '24 11:10 cpcloud

How about a page in the docs in the Backends dropdown alongside the operations support matrix?

gforsyth avatar Oct 17 '24 15:10 gforsyth

@gforsyth Finally got around to adding that policy page.

cpcloud avatar Oct 31 '24 13:10 cpcloud

Databricks backend passes the backend test suite:

❯ pytest -m databricks -n auto --dist loadgroup --snapshot-update -q
bringing up nodes...
x...x.x..x...sssssssssssssssssssss..........xx.x..sssssssssssssss.ssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssxx...x............xx......x........x.x.......x..x [ 10%]
x........x...x.....xx.x.x.x.........x................xx...............x.........x..x..s...........s......s...........................................................................x................. [ 20%]
........................................................................................................x....x...................x.x.....x......x.x.............x........x...................x......... [ 30%]
..x....................xx............................x.....x...........................................x..x....................................x....xxxxxxxxxxxx.........xx.x..xxx..xxx.x..x.x......... [ 41%]
................x...........x......x.......x...................x................................x....x..............x.....................x.........x..............xx......x.....xx..............x..... [ 51%]
...x.......x....x........xxxxx.xxx.xxxxx..xxxxx...xx...x..xx....x.......x............x..x.........ss....................xx..x.....x..s.s......x....x........x........x....x...........x.........x...xxx [ 61%]
xx......s.....x....x.x.x.........x...............x.x......x...x..x...x.x...X.xx.....xx........xx.x..........x..xxx........x..xx........x.x..x.....xx........X...x..............x...xx...x...........x.. [ 72%]
.............x..........................x..x....................x....x........x......x.....x.....x......x........................................x.............................xx.x....x............... [ 82%]
.....................x.....x.........x..................x...x.....x...x........x.x.x.....x..........x..........x...x...................x.....x.................................x.................s..s.. [ 92%]
...........................x................................................................................x............x......xx.x....                                                                [100%]
1584 passed, 130 skipped, 211 xfailed, 2 xpassed in 209.01s (0:03:29)

cpcloud avatar Oct 31 '24 15:10 cpcloud

Snowflake and BigQuery both passing:

…/ibis on  databricks is 📦 v9.5.0 via 🐍 v3.12.7 via ❄️  impure (ibis-3.12-env)
❯ pytest -m snowflake -n 8 --dist loadgroup --snapshot-update -q && pytest -m bigquery -n auto --dist loadgroup --snapshot-update -q
bringing up nodes...
x...........s.......................s........s............................................x.x.x..xxxxxxxxx.xx.............xx............xx.x.x..........xx.......x...............x.............x..x.x.. [  9%]
.....x...x..x....xx...........x...........x..............x...........x...x.x..x...x.....x.......xx..xx.xxx..x...x.....x....x.......................s................................................... [ 19%]
...........xxxxxxxxxxx...x.............x..x............x.xxx..xxxx.xx....xxx.x......x....x..xxx.......xx..xx.x.xx.xxx.xx...xxxxxxxxxx.xx....x.x.xx..xxx..x.xx..x.xx...xx.x.....xx.x....x........x...... [ 29%]
..............x.............xx......................x..................x...........x........x..s.xx...x...x..x....x.......xxx..x......x...x............................x.......xx............x......... [ 39%]
.....................x............x....x........x.....x..............................xx....x....x....x....x...x...............xx..x........x........................................................... [ 49%]
.......................xx......................x...............................................x.x..................................x..s..x..x....x...................x...x......................x...x. [ 59%]
......................x.............................x...............x.......................x.....x...x.........................x....x.................x...........x....x......................xx...... [ 68%]
........x.x........x.............x..............x..................x....x...........x.....x.....x........................................................x....x...................x.................... [ 78%]
......................x.....................x...................................x...................x........x.........x.x..xx........x.........x..........x...x....x...x.s.s.x......s................. [ 88%]
s.......................................................................................................................................................................................x.............. [ 98%]
...............................                                                                                                                                                                         [100%]
1785 passed, 10 skipped, 226 xfailed in 232.21s (0:03:52)
bringing up nodes...
........................s....sssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssssss.ssssssssssssssssssssssss....s......sssssssssssssssssssss.................s...x......x..x...... [  8%]
................................................s....x.....................x...x...........x....xxx..x..x...x..xxx.x.....x.xx......x.........x....................................................x.... [ 17%]
x......................................x.............................................x............................x.x...x...x.....x...........s..s.x...s.x...x..x..sx...............x.......xx......... [ 26%]
.....x..x.x.xx........x............Xxxx.xx........x..x.....x..................x.xxx...x.xxx.x.XxXx..x............x.x....x...x.xx....X.............xx.xx....xx..xxx..xxxx.xx..x.xx.....xx...x......x...x [ 35%]
x...x..x.....x..x.x.x..xx.....x........x.........xxxx......x....x...xxx.............x..x..x...x....x......x....x.x.x.......xx....x..x...x................xx.......x..............x.......xx............ [ 44%]
.....x.....x...x......x......x............x.x..........x...x.x..x.......x..............xx........x......xx..x..........x.............x.........x.xx.x.x..x.xxx.xxx.xxxxxxx.xxxxxxxxxxxxx.xxx..xxxxx..xx [ 53%]
.xxxxx.xx..xxx.xxxxxxxxxxxxxxxxxxxxxxx.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.xxxxxxxxxxxxxx.x..x...xxxxx....x..xxx.....x.......xx...xxx..x......x.....................x...x....x.......... [ 61%]
...............x....x............x.............x..x.x......x........................x............xx.......x..........x.....x..x.............x..x.x........x...x.....x.....x.x.......x........x......... [ 70%]
...........x..x...........................x..................x...........................x....x..x................x...........x...................x....................x............................... [ 79%]
....................................................................................................................x.................................................................................. [ 88%]
.................................................x.............................................................................................................x.s....s......xs........................ [ 97%]
....s.......................x.x.............................                                                                                                                                            [100%]
1757 passed, 132 skipped, 356 xfailed, 4 xpassed in 388.78s (0:06:28)

cpcloud avatar Oct 31 '24 16:10 cpcloud

I'll try to take a look over this tomorrow!

gforsyth avatar Oct 31 '24 19:10 gforsyth