Automatically register tables if env var specified
Which issue does this PR close?
Closes #500.
Rationale for this change
Described in issue.
What changes are included in this PR?
- upgrade of datafusion & arrow
- config plumbing
- conversion of
default_session_builderto a supserset of functionality (now with state) - default table factories
- a bunch of flight sql work for database introspection (more required in future PRs)
Are there any user-facing changes?
@andygrove this allows users to see tables in ballista from DataGrip (and probably tableau):

@yahoNanJing if you have a minute, I'd appreciate your feedback as well.
I'm excited about this feature! I will test this out later this week.

We can now see columns
@stuartcarnie you might be interested in this as well. I can start copying you on FlightSQL related PRs if you like.
@avantgardnerio great, thank you!
I tried this out and did run into an issue when connecting from DataGrip:
INTERNAL: Failed to create SessionContext: Context("Could not create table for file:///mnt/bigdata/tpch/sf10-parquet/partsupp.parquet at /home/andy/.cargo/git/checkouts/arrow-datafusion-71ae82d9dec9a01c/7b5842b/datafusion/core/src/catalog/listing_schema.rs:116", ObjectStore(Generic { store: "LocalFileSystem", source: UnableToReadBytes { source: Os { code: 21, kind: IsADirectory, message: "Is a directory" }, path: "/mnt/bigdata/tpch/sf10-parquet/partsupp.parquet" } })).
The code assumes that files in DATAFUSION_CATALOG_LOCATION will be single files and not directories containing files.
I tried this out
Thank you so much for your review! I realize you cloned and did manual testing and that probably took a fair bit of time, which I really appreciate :)
did run into an issue
It definitely works with csvs with headers, and there's a test for that. I'll look into parquet directory support tomorrow. If it's not a big deal, let's get it in this PR. If it isn't trivial though, I'd prefer to document it in another issue and get this PR merged - with the logic that getting the framework in place for auto-registration is the bigger deal, and once it's in we can extend it for all the permutations of parameters.
great feature! 👍
Is it possible to support custom CatalogProvider in the next step?
Thank you so much for your review! I realize you cloned and did manual testing and that probably took a fair bit of time, which I really appreciate :)
Yes, this is one of the challenges with this project. Many PRs need to be tested end-to-end and this does take time, which is why I mostly only review larger PRs at weekends.
support custom
CatalogProvider
@r4ntix could you explain your use-case? It may already be covered. For example, when this gets merged, deltatables should work out of the box, as long as there are TableProviderFactorys registered for them.
support custom
CatalogProvider@r4ntix could you explain your use-case? It may already be covered. For example, when this gets merged, deltatables should work out of the box, as long as there are
TableProviderFactorys registered for them.
@avantgardnerio according to the current implementation, the automatically registered tables are put into the datafusion.default.{table} datafusion schema.
In my case, for tenant isolation, I need to be able to dynamically register tables with different schema or catalog, eg: datafusion.{tenant}.{table} or {tenant}.default.{table}.
So is it possible to support CatalogProvider in SessionBuilder to support such case?
is it possible to support CatalogProvider in SessionBuilder
I don't see why not. You could follow the patterns in this PR but with a CatalogProvider instead.
is it possible to support CatalogProvider in SessionBuilder
I don't see why not. You could follow the patterns in this PR but with a CatalogProvider instead.
Ok, I will try submit a RP for this, thx