
[WIP] Introduce RowTypeProjection for ReadBuilder

Open Zouxxyy opened this issue 1 year ago • 2 comments

Purpose

Public API

public interface ReadBuilder extends Serializable {
    ReadBuilder withRowTypeProjection(RowTypeProjection rowTypeProjection);
}

public class RowTypeProjection {
    public static RowTypeProjection from(RowType rowType);
}

Internal API

public class RowTypeProjection {  
    public int[] toTopLevelProjection(RowType rowType);

    // skipProjectTopLevel is introduced for compatibility with the existing withProjection
    public RowType project(RowType rowType, boolean skipProjectTopLevel);
}

How to use

RowType writeType =
        DataTypes.ROW(
                DataTypes.FIELD(0, "pt", DataTypes.INT()),
                DataTypes.FIELD(1, "a", DataTypes.INT()),
                DataTypes.FIELD(2, "f0", DataTypes.INT()),
                DataTypes.FIELD(
                        3,
                        "f1",
                        DataTypes.ROW(
                                DataTypes.FIELD(4, "f0", DataTypes.INT()),
                                DataTypes.FIELD(5, "f1", DataTypes.INT()),
                                DataTypes.FIELD(6, "f2", DataTypes.INT()))));
// write
// GenericRow.of(0, 0, 0, GenericRow.of(10, 11, 12))


RowType readType =
        DataTypes.ROW(
                DataTypes.FIELD(
                        3,
                        "f1",
                        DataTypes.ROW(
                                DataTypes.FIELD(4, "f0", DataTypes.INT()),
                                DataTypes.FIELD(6, "f2", DataTypes.INT()))));

RowTypeProjection rowTypeProjection = RowTypeProjection.from(readType);

ReadBuilder readBuilder = table.newReadBuilder().withRowTypeProjection(rowTypeProjection);

// read
// GenericRow.of(GenericRow.of(10, 12))
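
To make the expected result above concrete, here is a minimal, self-contained sketch of pruning a nested row by field id. It does not use Paimon classes: `SketchField`, `NestedPruningSketch`, `positions`, and `pruneRow` are hypothetical stand-ins invented for illustration, and rows are modelled as plain `List` values. It only shows why a read type holding field 3 with children 4 and 6 yields `(10, 12)` from the written row, and it is not the implementation in this PR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a field: id, name, and nested children (empty for a leaf). */
final class SketchField {
    final int id;
    final String name;
    final List<SketchField> children;

    SketchField(int id, String name, SketchField... children) {
        this.id = id;
        this.name = name;
        this.children = Arrays.asList(children);
    }
}

final class NestedPruningSketch {

    /** Position of the field with the given id, or an exception if it is absent. */
    static int indexOfId(List<SketchField> fields, int id) {
        for (int i = 0; i < fields.size(); i++) {
            if (fields.get(i).id == id) {
                return i;
            }
        }
        throw new IllegalArgumentException("Unknown field id: " + id);
    }

    /** Top-level positions in writeFields of the fields requested by readFields. */
    static int[] positions(List<SketchField> writeFields, List<SketchField> readFields) {
        int[] result = new int[readFields.size()];
        for (int i = 0; i < readFields.size(); i++) {
            result[i] = indexOfId(writeFields, readFields.get(i).id);
        }
        return result;
    }

    /** Recursively keeps only the values whose field ids appear in readFields. */
    static List<Object> pruneRow(
            List<SketchField> writeFields, List<SketchField> readFields, List<?> row) {
        List<Object> out = new ArrayList<>();
        for (SketchField wanted : readFields) {
            int pos = indexOfId(writeFields, wanted.id);
            Object value = row.get(pos);
            if (wanted.children.isEmpty()) {
                out.add(value); // leaf requested: keep the value as written
            } else {
                out.add(pruneRow(writeFields.get(pos).children, wanted.children, (List<?>) value));
            }
        }
        return out;
    }

    /** Mirrors the writeType from the example: pt, a, f0, f1(f0, f1, f2). */
    static List<SketchField> writeType() {
        return Arrays.asList(
                new SketchField(0, "pt"),
                new SketchField(1, "a"),
                new SketchField(2, "f0"),
                new SketchField(3, "f1",
                        new SketchField(4, "f0"),
                        new SketchField(5, "f1"),
                        new SketchField(6, "f2")));
    }

    /** Mirrors the readType from the example: f1(f0, f2). */
    static List<SketchField> readType() {
        return Arrays.asList(
                new SketchField(3, "f1", new SketchField(4, "f0"), new SketchField(6, "f2")));
    }

    public static void main(String[] args) {
        List<Object> written = Arrays.asList(0, 0, 0, Arrays.asList(10, 11, 12));
        System.out.println(Arrays.toString(positions(writeType(), readType()))); // [3]
        System.out.println(pruneRow(writeType(), readType(), written)); // [[10, 12]]
    }
}
```

Matching by field id rather than by name is what keeps the nested `f0` (id 4) distinct from the top-level `f0` (id 2) in this sketch.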

Tests

API and Format

Documentation

Zouxxyy avatar Sep 19 '24 05:09 Zouxxyy

We just need a pruneColumns(RowType requiredSchema).

JingsongLi avatar Sep 19 '24 13:09 JingsongLi

RowType contains all the information (field name, field id, nested structure ...), so it can replace projection.

The final API will be modified to look like this:

    @Deprecated
    default ReadBuilder withProjection(int[] projection) {
        // projection -> requiredSchema
        return pruneColumns(requiredSchema);
    }

    ReadBuilder pruneColumns(RowType requiredSchema);
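
One way to read the `projection -> requiredSchema` step is as selecting top-level fields by position, in the order the projection lists them. The sketch below assumes exactly that and models a schema as a plain list of top-level field names; `ProjectionToSchemaSketch` and `toRequiredSchema` are hypothetical names, not part of Paimon, and the real conversion may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ProjectionToSchemaSketch {

    /**
     * Hypothetical conversion: picks the top-level fields at the given positions,
     * preserving the order of the projection array.
     */
    static List<String> toRequiredSchema(List<String> tableFields, int[] projection) {
        List<String> required = new ArrayList<>(projection.length);
        for (int index : projection) {
            required.add(tableFields.get(index));
        }
        return required;
    }

    public static void main(String[] args) {
        List<String> tableFields = Arrays.asList("pt", "a", "f0", "f1");
        // Projection {3, 0} selects f1 first, then pt.
        System.out.println(toRequiredSchema(tableFields, new int[] {3, 0})); // [f1, pt]
    }
}
```

Because an index array can only address top-level positions, it cannot describe the nested pruning that a required schema can, which fits the suggestion above to deprecate `withProjection` in favor of `pruneColumns`.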

Zouxxyy avatar Sep 20 '24 02:09 Zouxxyy

@JingsongLi Thanks for the review, updated.

Zouxxyy avatar Sep 24 '24 05:09 Zouxxyy