Dynamic schema for custom TableProvider
Is your feature request related to a problem or challenge?
Hi team,
We have been exploring DataFusion to provide unified access to all our various data sources. For cases where we implement a custom TableProvider that sources a generic web service that fetches data through multiple tables/databases depending on the GraphQL request, how can we make the schema generic as well so users can select any fields from all tables the said API has?
Describe the solution you'd like
No response
Describe alternatives you've considered
One alternative I thought is to simply create one custom TableProvider per table using the same API, but only that each will have different request patterns to reach out to separate table sources. However, this requires additional overhead of having to define all tables the generic API has.
Additional context
No response
how can we make the schema generic as well so users can select any fields from all tables the said API has?
One way you might be able to do this would be creating a SchemaProvider that dynamically configured and passed back different TableProvider instances based on what API was requested 🤔
@alamb - Thanks for responding. This sounds interesting. Could you elaborate on how this can be achieved?
- What information gets passed down to
SchemaProviderthat I can use to determine whichTableProviderinstance to pass? - Using a simple example request, "ctx.sql('SELECT * FROM table_a')", how can
SchemaProviderreturn the rightTableProviderbased on this user request?
@alamb - Thanks for responding. This sounds interesting. Could you elaborate on how this can be achieved?
What information gets passed down to SchemaProvider that I can use to determine which TableProvider instance to pass?
According to https://docs.rs/datafusion/latest/datafusion/catalog/schema/trait.SchemaProvider.html it gets a table name.
Using a simple example request, "ctx.sql('SELECT * FROM table_a')", how can SchemaProvider return the right TableProvider based on this user request?
I don't fully understand your usecase
a generic web service that fetches data through multiple tables/databases depending on the GraphQL request
Perhaps you could make something like
let request = ...; // request with graphql context
// register a table provider that has access to the context
let table = MyTableProvider::new(request)
// register `table_a` for sql
ctx.register_table("table_a", Arc::new(table))
// now accessing table_a from SQL will access a `MyTableProvider` that has access to the request
ctx.sql('SELECT * FROM table_a')...
You could potentially use a schema provider to avoid having to register all tables for each request, and only instantiate the table providers when they are required
Thanks @alamb. I think what you proposed in your example is something we could leverage. We essentially needed a way to register tables dynamically based on what the user requested instead of registering all tables available statically upfront, i.e during service startup.