feat(api): add `FileTable`
This models a "table" that is actually a collection of files (local or remote), a pattern that is more common in systems that look like "query engines" (e.g. Substrait, Acero, Pandas).
Substrait integration is the specific purpose, but the definition here is much simpler than Substrait's. In particular, globs are not modeled, and all files are assumed to be of the same type.
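As a rough illustration of the simplified model described above, a minimal Python sketch might look like the following. The names `FileTableSketch`, `paths`, and `file_format` are hypothetical and are not the API added in this PR; the sketch only encodes the two stated simplifications (no globs, one file type for all files).

```python
from dataclasses import dataclass
from typing import Tuple

# Characters treated as glob syntax in this sketch (an assumption, not a spec).
_GLOB_CHARS = frozenset("*?[")


@dataclass(frozen=True)
class FileTableSketch:
    """Hypothetical stand-in for a table backed by a fixed list of files."""

    paths: Tuple[str, ...]  # explicit local paths or remote URIs
    file_format: str  # one format shared by every file, e.g. "parquet"

    def __post_init__(self) -> None:
        if not self.paths:
            raise ValueError("at least one file is required")
        for path in self.paths:
            # Globs are not modeled: every entry must name exactly one file.
            if _GLOB_CHARS & set(path):
                raise ValueError(f"globs are not modeled: {path!r}")
```

Keeping a single `file_format` on the table, rather than one per file, mirrors the "all files are of the same type" assumption.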
Would it be useful to implement this in a backend (e.g. DuckDB)? Effectively it would differ by not defining a view, instead inlining the table definition into the final query (so not really a difference to the user).
Test Results
6 files, 6 suites, 3m 14s :stopwatch:
3 121 tests: 3 047 :heavy_check_mark:, 74 :zzz:, 0 :x:
18 726 runs: 18 282 :heavy_check_mark:, 444 :zzz:, 0 :x:
Results for commit c37beb5f.
Codecov Report
Merging #4293 (c37beb5) into master (3fe3fd8) will increase coverage by 10.94%. The diff coverage is 70.37%.
@@ Coverage Diff @@
## master #4293 +/- ##
===========================================
+ Coverage 81.59% 92.54% +10.94%
===========================================
Files 180 180
Lines 20352 20433 +81
Branches 2905 2927 +22
===========================================
+ Hits 16606 18909 +2303
+ Misses 3345 1149 -2196
+ Partials 401 375 -26
| Impacted Files | Coverage Δ | |
|---|---|---|
| ibis/backends/pandas/execution/generic.py | 89.12% <39.28%> (-2.24%) | :arrow_down: |
| ibis/backends/pandas/__init__.py | 80.50% <66.66%> (-2.02%) | :arrow_down: |
| ibis/expr/operations/relations.py | 97.43% <91.66%> (+6.17%) | :arrow_up: |
| ibis/expr/format.py | 92.97% <100.00%> (+3.39%) | :arrow_up: |
| ibis/expr/rules.py | 90.42% <100.00%> (+14.51%) | :arrow_up: |
| ibis/backends/base/sql/alchemy/registry.py | 94.05% <0.00%> (+0.69%) | :arrow_up: |
| ibis/expr/operations/generic.py | 95.12% <0.00%> (+0.81%) | :arrow_up: |
| ibis/backends/base/__init__.py | 83.41% <0.00%> (+1.00%) | :arrow_up: |
| ibis/expr/types/strings.py | 92.71% <0.00%> (+1.98%) | :arrow_up: |
| ibis/expr/types/numeric.py | 99.19% <0.00%> (+2.40%) | :arrow_up: |
| ... and 55 more | | |
> Would it be useful to implement this in a backend (e.g. DuckDB)? Effectively it would differ by not defining a view, instead inlining the table definition into the final query (so not really a difference to the user).
We should definitely try to integrate it with at least one of the file-based backends to see how well it fits. DuckDB is one good candidate, but DataFusion and pandas should support these operations as well.
I don't have time to push this forward right now. I'll reopen once I do.