Raunaq Morarka

Results 11 issues of Raunaq Morarka

io.trino.plugin.deltalake.TestDeltaLakeCreateTableStatistics.testMultiFileTable Time elapsed: 1.888 s

test

## Description Implements verification of the file footer, row count, and column checksums. Adds a config property `parquet.optimized-writer.validation-percentage` and a session property in the Hive connector to control the percentage of written files...

cla-signed
docs
tests:hive
performance

## Description Adds metrics for page filter and projection execution time to operator metrics which are available in EXPLAIN ANALYZE VERBOSE. ## Non-technical explanation Add metrics for filter and projection...

cla-signed
tests:hive
performance

2023-08-16T11:39:56.1182633Z [ERROR] io.trino.plugin.iceberg.TestIcebergV2.testOptimizeDuringWriteOperations -- Time elapsed: 12.72 s

bug

## Description This optimizes evaluation of simple filters by applying columnar evaluation to sub-expressions within the filters. It allows the use of dictionary/RLE block-aware processing and unwrapping of lazy blocks...

cla-signed
performance
hive
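The dictionary-aware processing mentioned in the description above can be illustrated with a minimal sketch. It is not the code from the pull request: the class name, the method name, and the use of plain `int[]` arrays in place of Trino's block types are assumptions made for illustration. The idea shown is that a filter is evaluated once per distinct dictionary entry and the result is then looked up for every row by its dictionary id.

```java
import java.util.function.IntPredicate;

// Hypothetical sketch: plain arrays stand in for a dictionary-encoded block.
final class DictionaryFilterSketch {
    // Evaluates the predicate once per dictionary entry, then maps the
    // per-entry result onto each row through its dictionary id.
    static boolean[] filter(int[] dictionary, int[] ids, IntPredicate predicate) {
        boolean[] dictionaryMatches = new boolean[dictionary.length];
        for (int i = 0; i < dictionary.length; i++) {
            dictionaryMatches[i] = predicate.test(dictionary[i]);
        }
        boolean[] selected = new boolean[ids.length];
        for (int position = 0; position < ids.length; position++) {
            selected[position] = dictionaryMatches[ids[position]];
        }
        return selected;
    }
}
```

When the dictionary is much smaller than the number of rows, the predicate runs far fewer times than it would in a row-by-row evaluation.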

## Description Uses the new incubating Java Vector API to improve decoding performance for some encodings in the Parquet reader. Vectorized decoding is used only when the preferred vector bit size...

cla-signed
docs
performance
hudi
iceberg
delta-lake
hive

Implement the TODO here: https://github.com/trinodb/trino/blob/master/plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java#L542
Here's an example from Hive: https://github.com/trinodb/trino/blob/master/plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveMetadata.java#L2942
The expected benefit is that it could potentially allow the same optimisations we have today for bucketed hive tables...

performance
iceberg

## Description Optimize writing RLE runs in parquet column descriptors. Use information about the nullability of Blocks to write RLE runs for repetition and definition levels more efficiently in the Parquet writer...

cla-signed
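As a rough illustration of the idea in the description above, the sketch below computes definition-level runs for a flat optional column. It is an assumption-laden sketch, not the writer's actual code: the class and method names are hypothetical, a run is represented as an `int[] {level, length}` pair, and a null is assumed to map to `maxDefinitionLevel - 1`. When the caller knows the block has no nulls, the whole block becomes a single run without inspecting any position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch for a flat optional column; each run is {level, length}.
final class DefinitionLevelRunsSketch {
    static List<int[]> runs(boolean mayHaveNull, boolean[] isNull, int positionCount, int maxDefinitionLevel) {
        List<int[]> runs = new ArrayList<>();
        if (positionCount == 0) {
            return runs;
        }
        if (!mayHaveNull) {
            // Fast path: every value is defined, so one run covers the block.
            runs.add(new int[] {maxDefinitionLevel, positionCount});
            return runs;
        }
        int runLevel = levelOf(isNull[0], maxDefinitionLevel);
        int runLength = 1;
        for (int position = 1; position < positionCount; position++) {
            int level = levelOf(isNull[position], maxDefinitionLevel);
            if (level == runLevel) {
                runLength++;
            }
            else {
                runs.add(new int[] {runLevel, runLength});
                runLevel = level;
                runLength = 1;
            }
        }
        runs.add(new int[] {runLevel, runLength});
        return runs;
    }

    // Assumption: a null in a flat optional column sits one level below the maximum.
    private static int levelOf(boolean isNull, int maxDefinitionLevel) {
        return isNull ? maxDefinitionLevel - 1 : maxDefinitionLevel;
    }
}
```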

## Description Reduces memory usage by keeping only the required columns' metadata in memory. Look-up of ColumnChunkMetadata is now through a Map rather than repeatedly iterating over all columns' metadata...

cla-signed
performance
iceberg
delta-lake
hive
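The map-based look-up described above might look roughly like the following sketch. The `ChunkMetadata` class, its fields, and the `indexRequired` method are hypothetical stand-ins, not the reader's real types: the point is only that metadata for unneeded columns is dropped and the rest is indexed by column path, so each later look-up is a hash probe instead of a scan over all columns.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: ChunkMetadata is a stand-in for column chunk metadata.
final class ColumnMetadataIndexSketch {
    static final class ChunkMetadata {
        final String path;
        final long totalSize;

        ChunkMetadata(String path, long totalSize) {
            this.path = path;
            this.totalSize = totalSize;
        }
    }

    // Keeps only the required columns and indexes them by path.
    static Map<String, ChunkMetadata> indexRequired(List<ChunkMetadata> allColumns, Set<String> requiredPaths) {
        Map<String, ChunkMetadata> byPath = new HashMap<>();
        for (ChunkMetadata column : allColumns) {
            if (requiredPaths.contains(column.path)) {
                byPath.put(column.path, column);
            }
        }
        return byPath;
    }
}
```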

## Description Dynamic row filtering performs fine-grained filtering of rows in the scan operator, thus greatly improving performance of some queries. So far dynamic filters have been pushed into connectors...

cla-signed
performance
hive
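To make the row-level filtering described above concrete, here is a minimal sketch under stated assumptions. The class and method names are hypothetical, the dynamic filter is modelled as a plain `Set<Long>` of join keys, and the scan output is a `long[]` of keys. The actual filter representation in the pull request is not shown in the excerpt. The sketch returns the positions of the rows whose key appears in the set, so non-matching rows can be dropped inside the scan.

```java
import java.util.Set;

// Hypothetical sketch: a Set<Long> stands in for the collected dynamic filter.
final class DynamicRowFilterSketch {
    // Returns the positions of rows whose key is present in the dynamic filter.
    static int[] selectPositions(long[] scannedKeys, Set<Long> dynamicFilterKeys) {
        int[] positions = new int[scannedKeys.length];
        int selectedCount = 0;
        for (int position = 0; position < scannedKeys.length; position++) {
            if (dynamicFilterKeys.contains(scannedKeys[position])) {
                positions[selectedCount++] = position;
            }
        }
        return java.util.Arrays.copyOf(positions, selectedCount);
    }
}
```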