
Optimise RowData evolution

Open · aiborodin opened this issue 6 months ago · 1 comment

RowDataEvolver recomputes the Flink RowType and the field getters for every input record that must be converted to match the destination Iceberg table schema. This issue proposes caching the field getters and column converters to optimise RowData conversion.
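The cost described above comes from redoing schema work (field lookups, type decisions) per record. The following is a minimal Java sketch of the "build once, reuse per record" idea. It is not Iceberg's RowDataEvolver and does not use Flink's RowData API. Rows are modelled as `Object[]`, and `Field`, `RowConverter` and the `"int"`/`"long"` type tags are hypothetical stand-ins.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class RowConverterSketch {

    /** Hypothetical column descriptor: a name plus a simple type tag. */
    record Field(String name, String type) {}

    /**
     * Converts rows (modelled as Object[]) from a source schema to a target schema.
     * All name lookups and type decisions happen once, in the constructor,
     * so convert() only runs the precomputed per-column lambdas.
     */
    static final class RowConverter {
        private final Function<Object[], Object>[] columnConverters;

        @SuppressWarnings("unchecked")
        RowConverter(List<Field> source, List<Field> target) {
            columnConverters = new Function[target.size()];
            for (int t = 0; t < target.size(); t++) {
                Field targetField = target.get(t);
                int s = indexOf(source, targetField.name());
                if (s < 0) {
                    // Column missing from the input: fill with null.
                    columnConverters[t] = row -> null;
                } else if (source.get(s).type().equals("int") && targetField.type().equals("long")) {
                    // Widening int -> long is decided here once, not per record.
                    columnConverters[t] = row -> row[s] == null ? null : ((Integer) row[s]).longValue();
                } else {
                    // Same type: copy the value from its source position.
                    columnConverters[t] = row -> row[s];
                }
            }
        }

        Object[] convert(Object[] row) {
            Object[] out = new Object[columnConverters.length];
            for (int i = 0; i < out.length; i++) {
                out[i] = columnConverters[i].apply(row);
            }
            return out;
        }

        private static int indexOf(List<Field> fields, String name) {
            for (int i = 0; i < fields.size(); i++) {
                if (fields.get(i).name().equals(name)) {
                    return i;
                }
            }
            return -1;
        }
    }

    public static void main(String[] args) {
        List<Field> source = List.of(new Field("id", "int"), new Field("name", "string"));
        List<Field> target = List.of(
            new Field("id", "long"), new Field("name", "string"), new Field("email", "string"));

        // Built once per (source, target) schema pair, then reused for every record.
        RowConverter converter = new RowConverter(source, target);
        System.out.println(Arrays.toString(converter.convert(new Object[] {1, "a"})));
    }
}
```

Running `main` prints `[1, a, null]`: `id` is widened to a `Long`, `name` is copied, and the missing `email` column is filled with `null`. The per-record path touches no schema metadata at all.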

aiborodin, Jun 18 '25

According to the profile in my previous comment (https://github.com/apache/iceberg/pull/13340#discussion_r2156145524), caching the schema alone would not be sufficient; we also need to cache the field accessors and converters to minimise the CPU overhead. The object overhead is minimal, as each converter only stores field accessors and conversion lambdas. The cache overhead is also minimal because it is an identity cache, and the same schema objects are already cached in TableMetadataCache.
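An identity cache is cheap here because TableMetadataCache hands out the same schema instances, so a lookup only needs `System.identityHashCode` and a reference comparison instead of walking every field in `equals()`. The sketch below illustrates that idea with plain JDK types; `IdentityConverterCache` and its factory are hypothetical names, not the class used in the actual change.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class IdentityConverterCacheSketch {

    /**
     * Caches one converter per (source schema, target schema) pair, keyed by object
     * identity rather than equals(). Lookups never inspect the schemas' contents.
     */
    static final class IdentityConverterCache<S, C> {
        private final Map<S, Map<S, C>> cache = new IdentityHashMap<>();
        private final BiFunction<S, S, C> factory;

        IdentityConverterCache(BiFunction<S, S, C> factory) {
            this.factory = factory;
        }

        C get(S source, S target) {
            // Outer map keyed by source identity, inner map keyed by target identity.
            return cache
                .computeIfAbsent(source, k -> new IdentityHashMap<>())
                .computeIfAbsent(target, k -> factory.apply(source, target));
        }
    }

    public static void main(String[] args) {
        int[] builds = {0};
        IdentityConverterCache<List<String>, String> cache =
            new IdentityConverterCache<>((src, dst) -> {
                builds[0]++; // count how often a converter is actually built
                return "converter " + src + " -> " + dst;
            });

        List<String> source = List.of("id", "name");
        List<String> target = List.of("id", "name", "email");

        cache.get(source, target);                 // miss: builds a converter
        cache.get(source, target);                 // same instances: cache hit
        cache.get(List.of("id", "name"), target);  // equal but distinct instance: miss
        System.out.println("converters built: " + builds[0]);
    }
}
```

This prints `converters built: 2`. The trade-off is visible in the last call: an equal but newly allocated schema misses the cache, which is acceptable only because the upstream metadata cache keeps reusing the same schema objects. This sketch is also unbounded; a production cache would need some eviction policy.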

aiborodin, Jun 19 '25

Nice last commits 😂

mxm, Jun 26 '25

Merged to main. Thanks @aiborodin for the optimization and @mxm for the review.

@aiborodin: Could you please create a backport PR to port these changes to Flink 1.20 and 1.19? This sed command could help:

git diff HEAD~1 HEAD flink/v2.0 | sed "s/v2.0/v1.20/g" > /tmp/patch

Also, if you need to change anything beyond cleanly applying the patch, please highlight it so it is easier to review.

Thanks for all of your work on this! Happy to have you as a contributor!

pvary, Jun 26 '25

Thank you for reviewing and merging the change, @pvary! I appreciate your and @mxm's valuable feedback, and it's a pleasure to have you both as reviewers. I raised a PR to backport the changes to Flink 1.19 and 1.20: https://github.com/apache/iceberg/pull/13401.

aiborodin, Jun 27 '25