
Add new table handle interface to expose common datalake-type info

Open marton-bod opened this issue 9 months ago • 4 comments

The proposal is to add a new interface, DataLakeTableHandle, which extends ConnectorTableHandle and would be implemented by the Iceberg, Delta, Hudi, and Hive table handles. It would expose concepts shared across these connectors, such as:

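    import io.trino.spi.connector.ConnectorTableHandle;
    import io.trino.spi.connector.SchemaTableName;
    import io.trino.spi.statistics.TableStatistics;

    import java.util.Set;

    public interface DataLakeTableHandle
            extends ConnectorTableHandle
    {
        // Schema-qualified name of the table this handle refers to
        SchemaTableName getSchemaTableName();

        // Columns used in predicates against this table
        Set<String> getPredicateColumns();

        // Partitioning: connectors that support it override both methods
        default boolean supportsPartitioning()
        {
            return false;
        }

        default Set<String> getPartitionColumns()
        {
            return Set.of();
        }

        // Sorting: same opt-in pattern as partitioning
        default boolean supportsSorting()
        {
            return false;
        }

        default Set<String> getSortColumns()
        {
            return Set.of();
        }

        // Statistics: empty unless the connector provides them
        default boolean supportsStats()
        {
            return false;
        }

        default TableStatistics getTableStats()
        {
            return TableStatistics.empty();
        }
    }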
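To illustrate how the default methods would let each connector opt in only to what it supports, here is a minimal self-contained sketch. It does not use the Trino SPI: the nested `DataLakeTableHandle` is a trimmed stand-in for the proposed interface (with a plain `String` table name instead of `SchemaTableName`), and `PartitionedHandle` and `PlainHandle` are hypothetical names, not existing connector classes.

```java
import java.util.Set;

public class HandleSketch
{
    // Stand-in for the proposed interface, trimmed to the partitioning methods.
    // Assumption: a String replaces SchemaTableName to keep the sketch dependency-free.
    public interface DataLakeTableHandle
    {
        String getTableName();

        default boolean supportsPartitioning()
        {
            return false;
        }

        default Set<String> getPartitionColumns()
        {
            return Set.of();
        }
    }

    // Hypothetical handle for a partition-aware format: overrides both defaults.
    public record PartitionedHandle(String tableName, Set<String> partitionColumns)
            implements DataLakeTableHandle
    {
        @Override
        public String getTableName()
        {
            return tableName;
        }

        @Override
        public boolean supportsPartitioning()
        {
            return true;
        }

        @Override
        public Set<String> getPartitionColumns()
        {
            return partitionColumns;
        }
    }

    // Hypothetical handle that exposes no partitioning info: inherits the defaults.
    public record PlainHandle(String tableName)
            implements DataLakeTableHandle
    {
        @Override
        public String getTableName()
        {
            return tableName;
        }
    }

    public static void main(String[] args)
    {
        DataLakeTableHandle orders = new PartitionedHandle("orders", Set.of("order_date"));
        DataLakeTableHandle events = new PlainHandle("events");
        System.out.println(orders.getTableName() + " partitioned: " + orders.supportsPartitioning());
        System.out.println(events.getTableName() + " partitioned: " + events.supportsPartitioning());
    }
}
```

Callers can then treat every handle uniformly and check the `supports…` flag before reading the corresponding column set.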

One use case is to include some of this information in QueryCompletedEvent/TableInfo, which would enable more in-depth offline analysis and recommendations to users (e.g. comparing the most frequently used predicate columns against the partitioning/sorting scheme).
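As a rough sketch of the kind of offline analysis this could enable, the following self-contained example flags columns that are filtered on often but are covered by neither the partitioning nor the sorting scheme. Everything here is hypothetical: `LayoutAdvisor`, `TableUsage`, and the `minQueries` threshold are illustrative names, and the per-column counts are assumed to have been aggregated from query events beforehand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LayoutAdvisor
{
    // Hypothetical per-table summary aggregated offline from query events.
    public record TableUsage(
            String tableName,
            Map<String, Long> predicateColumnCounts, // column -> number of queries filtering on it
            Set<String> partitionColumns,
            Set<String> sortColumns) {}

    // Returns columns filtered on in at least minQueries queries that are neither
    // partition nor sort columns, most frequently used first (ties broken by name).
    public static List<String> uncoveredPredicateColumns(TableUsage usage, long minQueries)
    {
        List<Map.Entry<String, Long>> candidates = new ArrayList<>();
        for (Map.Entry<String, Long> entry : usage.predicateColumnCounts().entrySet()) {
            boolean covered = usage.partitionColumns().contains(entry.getKey())
                    || usage.sortColumns().contains(entry.getKey());
            if (!covered && entry.getValue() >= minQueries) {
                candidates.add(entry);
            }
        }
        candidates.sort(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()));

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : candidates) {
            result.add(entry.getKey());
        }
        return result;
    }

    public static void main(String[] args)
    {
        TableUsage usage = new TableUsage(
                "sales.orders",
                Map.of("order_date", 300L, "customer_id", 120L, "status", 5L),
                Set.of("order_date"),
                Set.of());
        // order_date is already a partition column and status is rarely filtered on
        System.out.println(uncoveredPredicateColumns(usage, 10)); // prints [customer_id]
    }
}
```

A column returned here is only a candidate for a layout change; whether it suits partitioning or sorting still depends on its cardinality and data volume.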

marton-bod, May 23 '24 10:05