trino
Add new table handle interface to expose common datalake-type info
The proposal is to add a new interface, DataLakeTableHandle, which extends ConnectorTableHandle and would be implemented by the Iceberg, Delta Lake, Hudi, and Hive table handles. This new interface would expose common shared concepts such as:
SchemaTableName getSchemaTableName();

Set<String> getPredicateColumns();

default boolean supportsPartitioning()
{
    return false;
}

default Set<String> getPartitionColumns()
{
    return Set.of();
}

default boolean supportsSorting()
{
    return false;
}

default Set<String> getSortColumns()
{
    return Set.of();
}

default boolean supportsStats()
{
    return false;
}

default TableStatistics getTableStats()
{
    return TableStatistics.empty();
}
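To illustrate how a connector might opt in, here is a minimal, self-contained sketch. It assumes stand-in definitions for ConnectorTableHandle and SchemaTableName so that it compiles without the Trino SPI on the classpath, leaves out the stats methods for brevity, and uses a hypothetical ExamplePartitionedTableHandle that is not an existing Trino class. Real table handles carry much more state, so treat this only as a picture of the override pattern.

```java
import java.util.Set;

// Stand-ins for the Trino SPI types so the sketch compiles on its own.
// In Trino these would come from io.trino.spi.connector.
interface ConnectorTableHandle {}

record SchemaTableName(String schemaName, String tableName) {}

// The proposed interface, reduced to the partitioning and sorting accessors.
interface DataLakeTableHandle
        extends ConnectorTableHandle
{
    SchemaTableName getSchemaTableName();

    Set<String> getPredicateColumns();

    default boolean supportsPartitioning()
    {
        return false;
    }

    default Set<String> getPartitionColumns()
    {
        return Set.of();
    }

    default boolean supportsSorting()
    {
        return false;
    }

    default Set<String> getSortColumns()
    {
        return Set.of();
    }
}

// Hypothetical handle for a partitioned, unsorted table (assumption, for illustration only).
record ExamplePartitionedTableHandle(
        SchemaTableName schemaTableName,
        Set<String> predicateColumns,
        Set<String> partitionColumns)
        implements DataLakeTableHandle
{
    @Override
    public SchemaTableName getSchemaTableName()
    {
        return schemaTableName;
    }

    @Override
    public Set<String> getPredicateColumns()
    {
        return predicateColumns;
    }

    // Only the partitioning pair is overridden; sorting keeps the interface defaults.
    @Override
    public boolean supportsPartitioning()
    {
        return true;
    }

    @Override
    public Set<String> getPartitionColumns()
    {
        return partitionColumns;
    }
}
```

Because every capability has a default, a connector only overrides the pairs it actually supports, and callers can check the supports* method before reading the corresponding set.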
One use case is to include some of this information in QueryCompletedEvent/TableInfo, which would make it possible to run more in-depth offline analyses and provide recommendations to users (e.g. comparing the most frequently used predicate columns against the partitioning/sorting scheme).
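As a rough sketch of the kind of offline analysis this could enable, the snippet below counts how often each column appears in predicates and reports the frequently filtered columns that are neither partition nor sort columns. The TableUsage record and the uncoveredPredicateColumns helper are hypothetical names introduced for illustration; they are not part of TableInfo or any existing Trino API, and the sketch assumes all records passed in describe the same table.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class PredicateLayoutAnalysis
{
    private PredicateLayoutAnalysis() {}

    // Hypothetical per-query record, standing in for what an event listener
    // might extract from completed-query events (assumption).
    record TableUsage(
            String table,
            Set<String> predicateColumns,
            Set<String> partitionColumns,
            Set<String> sortColumns) {}

    // Returns columns used in predicates by at least minQueries queries that are
    // not covered by the table's partitioning or sorting scheme.
    // Assumes every usage record refers to the same table.
    static Set<String> uncoveredPredicateColumns(List<TableUsage> usages, int minQueries)
    {
        Map<String, Integer> predicateCounts = new HashMap<>();
        Set<String> layoutColumns = new HashSet<>();
        for (TableUsage usage : usages) {
            layoutColumns.addAll(usage.partitionColumns());
            layoutColumns.addAll(usage.sortColumns());
            for (String column : usage.predicateColumns()) {
                predicateCounts.merge(column, 1, Integer::sum);
            }
        }

        // TreeSet keeps the result in a stable, alphabetical order.
        Set<String> candidates = new TreeSet<>();
        predicateCounts.forEach((column, count) -> {
            if (count >= minQueries && !layoutColumns.contains(column)) {
                candidates.add(column);
            }
        });
        return candidates;
    }
}
```

A column returned by this helper is only a candidate for repartitioning or sorting; whether changing the layout pays off would still depend on cardinality, data volume, and write patterns.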