[FEATURE] trino connector support more Iceberg partitions

Open FANNG1 opened this issue 1 year ago • 5 comments

Describe the feature

The Trino connector should support more Iceberg partition transforms.

Motivation

For a table created with a non-identity partition transform, Trino fails to query the data.

create table abcd(a int, b int) partitioned by (bucket(2,a))  TBLPROPERTIES ('format-version'='2', 'write.merge.mode'='merge-on-read', 'write.delete.mode'='merge-on-read');

Query 20240829_032202_00093_fk3q7 failed: class org.apache.gravitino.rel.expressions.transforms.Transforms$BucketTransform cannot be cast to class org.apache.gravitino.rel.expressions.transforms.Transform$SingleFieldTransform (org.apache.gravitino.rel.expressions.transforms.Transforms$BucketTransform and org.apache.gravitino.rel.expressions.transforms.Transform$SingleFieldTransform are in unnamed module of loader io.trino.server.PluginClassLoader @366fd3cb)
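The message suggests that the connector casts every partition transform to Transform.SingleFieldTransform, which a bucket transform is not. Below is a minimal sketch of that failure mode and of a type-checked alternative. The types are hypothetical stand-ins that only mirror the names in the error message; they are not the real Gravitino classes, and the single-column bucket is a simplification.

```java
// Hypothetical stand-ins, NOT the real Gravitino API. They only mirror the
// relationship implied by the error: a bucket transform is a Transform but
// not a SingleFieldTransform.
interface Transform {
    String name();
}

interface SingleFieldTransform extends Transform {
    String fieldName();
}

final class IdentityTransform implements SingleFieldTransform {
    private final String field;

    IdentityTransform(String field) {
        this.field = field;
    }

    public String name() {
        return "identity";
    }

    public String fieldName() {
        return field;
    }
}

final class BucketTransform implements Transform {
    final int numBuckets;
    final String field; // assumption: one bucketed column, for brevity

    BucketTransform(int numBuckets, String field) {
        this.numBuckets = numBuckets;
        this.field = field;
    }

    public String name() {
        return "bucket";
    }
}

final class PartitionFields {
    // Unchecked variant: throws ClassCastException for a BucketTransform.
    static String fieldOfUnchecked(Transform t) {
        return ((SingleFieldTransform) t).fieldName();
    }

    // Checked variant: inspect the runtime type before casting and fail
    // with a descriptive error for transforms that are not handled yet.
    static String fieldOf(Transform t) {
        if (t instanceof SingleFieldTransform) {
            return ((SingleFieldTransform) t).fieldName();
        }
        if (t instanceof BucketTransform) {
            return ((BucketTransform) t).field;
        }
        throw new IllegalArgumentException("Unsupported transform: " + t.name());
    }
}
```

With these stand-ins, PartitionFields.fieldOfUnchecked(new BucketTransform(2, "a")) throws a ClassCastException, while PartitionFields.fieldOf handles the same value.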

Describe the solution

No response

Additional context

No response

FANNG1, Aug 29 '24 03:08

The table's metadata is:

{
    "code": 0,
    "table":
    {
        "name": "abcd",
        "columns":
        [
            {
                "name": "a",
                "type": "integer",
                "nullable": true,
                "autoIncrement": false
            },
            {
                "name": "b",
                "type": "integer",
                "nullable": true,
                "autoIncrement": false
            }
        ],
        "properties":
        {
            "owner": "root",
            "write.merge.mode": "merge-on-read",
            "current-snapshot-id": "3663013567918800433",
            "write.delete.mode": "merge-on-read",
            "provider": "iceberg",
            "write.parquet.compression-codec": "zstd",
            "format": "iceberg/parquet",
            "format-version": "2",
            "location": "hdfs://10.20.31.19:9000/user/iceberg-jdbc/warehouse/mydatabase/abcd",
            "write.distribution-mode": "none"
        },
        "audit":
        {
            "creator": "anonymous",
            "createTime": "2024-08-29T03:19:55.008771798Z"
        },
        "distribution":
        {
            "strategy": "none",
            "number": 0,
            "funcArgs":
            []
        },
        "sortOrders":
        [],
        "partitioning":
        [
            {
                "strategy": "bucket",
                "numBuckets": 2,
                "fieldNames":
                [
                    [
                        "a"
                    ]
                ]
            }
        ],
        "indexes":
        []
    }
}

diqiu50, Aug 29 '24 06:08

The Trino connector only supports partitioning of the form partitioning = ARRAY['c1', 'c2']. The table can be shown through Trino's Iceberg connector catalog:

 CREATE TABLE iceberg.mydatabase.abcd (
    a integer,
    b integer
 )
 WITH (
    format = 'PARQUET',
    format_version = 2,
    location = 'hdfs://10.20.31.19:9000/user/iceberg-jdbc/warehouse/mydatabase/abcd',
    partitioning = ARRAY['bucket(a, 2)']
 )

diqiu50, Aug 29 '24 07:08

@mchades @yuqi1129 How can we make the Transform expression parser convert between the string bucket(a, 2) and a Transform?
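One possible direction, sketched here with hypothetical names: a small regex-based parser that turns a Trino-style partitioning expression such as bucket(a, 2) into a function name, a column, and an optional integer argument, and renders it back. PartitionExpr is an illustrative class, not a Gravitino or Trino type. The sketch only covers a bare column name and the name(column) and name(column, int) shapes; the Trino Iceberg connector's own parsing may differ.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical value class for illustration; not a Gravitino or Trino type.
final class PartitionExpr {
    // name(column) or name(column, int), e.g. year(ts) or bucket(a, 2)
    private static final Pattern CALL = Pattern.compile(
            "\\s*(\\w+)\\s*\\(\\s*(\\w+)\\s*(?:,\\s*(\\d+)\\s*)?\\)\\s*");
    // A bare column name, treated as an identity partition
    private static final Pattern COLUMN = Pattern.compile("\\s*(\\w+)\\s*");

    final String function;  // "identity" for a bare column name
    final String column;
    final Integer argument; // bucket count or truncate width, else null

    PartitionExpr(String function, String column, Integer argument) {
        this.function = function;
        this.column = column;
        this.argument = argument;
    }

    // Parse one element of the partitioning array into its parts.
    static PartitionExpr parse(String text) {
        Matcher call = CALL.matcher(text);
        if (call.matches()) {
            Integer arg = call.group(3) == null ? null : Integer.valueOf(call.group(3));
            return new PartitionExpr(
                    call.group(1).toLowerCase(Locale.ROOT), call.group(2), arg);
        }
        Matcher column = COLUMN.matcher(text);
        if (column.matches()) {
            return new PartitionExpr("identity", column.group(1), null);
        }
        throw new IllegalArgumentException(
                "Unrecognized partition expression: " + text);
    }

    // Render back to the string form used in the partitioning property.
    String render() {
        if (function.equals("identity")) {
            return column;
        }
        return argument == null
                ? function + "(" + column + ")"
                : function + "(" + column + ", " + argument + ")";
    }
}
```

For example, PartitionExpr.parse("bucket(a, 2)").render() gives back bucket(a, 2). The Spark-style argument order bucket(2, a) is deliberately rejected with an IllegalArgumentException, since the sketch only targets the Trino form.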

diqiu50, Aug 29 '24 08:08

At the very least, we need to fix the query failure.

diqiu50, Aug 29 '24 08:08

> @mchades @yuqi1129 How can we make the Transform expression parser convert between the string bucket(a, 2) and a Transform?

The Trino Iceberg connector supports this; I think you can refer to its code. Here is the doc: https://trino.io/docs/current/connector/iceberg.html#partitioned-tables

mchades, Aug 29 '24 10:08