Raunaq Morarka
``` # ClickHouse, Druid, MariaDB, MySQL, Oracle, PostgreSQL, Redshift, SingleStore, SQL Server, Phoenix * Improved performance for queries with selective joins through pushdown of dynamic filters to the data... ```
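For illustration only (this is not the connectors' actual code), a minimal Java sketch of the idea behind dynamic filter pushdown: join key values collected while building the hash table for a selective join are turned into a predicate on the remote query, so the data source skips non-matching rows before they cross the wire. All names here (`applyDynamicFilter`, `baseQuery`) are hypothetical.

```java
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration of dynamic filter pushdown to a JDBC data source:
// build-side join key values become an IN clause on the probe-side scan query.
public class DynamicFilterPushdownSketch
{
    // Rewrites the probe-side scan query with the collected filter values.
    static String applyDynamicFilter(String baseQuery, String column, Set<Long> collectedKeys)
    {
        if (collectedKeys.isEmpty()) {
            return baseQuery;
        }
        String inList = collectedKeys.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(", "));
        return baseQuery + " WHERE " + column + " IN (" + inList + ")";
    }

    public static void main(String[] args)
    {
        // Values collected from the build side of a selective join
        Set<Long> buildSideKeys = Set.of(42L, 7L, 99L);
        System.out.println(applyDynamicFilter("SELECT * FROM orders", "custkey", buildSideKeys));
        // => SELECT * FROM orders WHERE custkey IN (...)
    }
}
```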
``` # Hive * Support Hive bucket filtering on bucketed columns of float, double, date, list, map, and bounded varchar data types. ({issue}`13553`) ``` #13553 #13472
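To make the bucket filtering idea concrete: Hive assigns each row to bucket `hash(value) % bucketCount`, so an equality predicate on the bucketing column pins the read to a subset of bucket files. The sketch below is a simplified stand-in; real Hive uses type-specific hash functions (which is why support had to be added per type: float, double, date, etc.), not Java's `hashCode`.

```java
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of how a filter on a Hive bucketed column prunes bucket files.
public class BucketFilterSketch
{
    // Simplified stand-in for Hive's type-specific bucket hashing.
    static int bucketFor(Object value, int bucketCount)
    {
        return Math.floorMod(value.hashCode(), bucketCount);
    }

    // Keep only the buckets that can contain the requested values.
    static Set<Integer> bucketsToRead(Set<?> filterValues, int bucketCount)
    {
        return filterValues.stream()
                .map(value -> bucketFor(value, bucketCount))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args)
    {
        // Table bucketed into 32 buckets on a double column;
        // the predicate col IN (1.5, 2.5) touches at most 2 of them.
        System.out.println(bucketsToRead(Set.of(1.5, 2.5), 32));
    }
}
```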
``` # Hive * Upgrade Alluxio to 2.8.1 to fix security vulnerabilities. ({issue}`13609`) ``` #13609
@sopel39 PTAL. It LGTM % comments about docs.
> Test failures are unrelated

Please rebase to latest master, the CI issues should be resolved now.
> 1. there seems to be a lot of code copied between ORC, RCFile, and Parquet write validation. It would be a lot cleaner to have it extracted to common...
[Optimized parquet writer verification inserts benchmark.pdf](https://github.com/trinodb/trino/files/9549824/Optimized.parquet.writer.verification.inserts.benchmark.pdf)

Perf impact with 5% verification (the current default) is around 2-3%. Perf impact with 100% verification would be around 45%.
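For illustration of why the 5% default stays cheap: validation can be sampled per written file, so with a 5% rate roughly 1 in 20 files is re-read and checked, keeping overhead at a few percent instead of the ~45% a full re-read costs. This is a hedged sketch of percentage-based sampling, not the writer's actual code; `shouldValidate` is a hypothetical name.

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch of percentage-based write verification sampling.
public class ValidationSamplingSketch
{
    // Decide per file whether to validate, given a rate in [0, 100].
    static boolean shouldValidate(double validationPercentage)
    {
        return ThreadLocalRandom.current().nextDouble(100) < validationPercentage;
    }

    public static void main(String[] args)
    {
        int validated = 0;
        int files = 100_000;
        for (int i = 0; i < files; i++) {
            if (shouldValidate(5.0)) {
                validated++;
            }
        }
        System.out.printf("validated %d of %d files (~5%%)%n", validated, files);
    }
}
```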
> Per [#14047 (comment)](https://github.com/trinodb/trino/issues/14047#issuecomment-1244866545)
> is this enabled in Hive connector only, and Iceberg/Delta (which also use the optimized writer), do not run the verification?

Right, this PR implements parquet...
> > The bloom_filter_offset in thrift specified the "offset" of the bloomfilter header, but it does not specify the "length" of the
> > Since you don't know length, how...
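One common way to read a structure whose offset is known but whose length is not: speculatively fetch a buffer large enough to usually cover the header, decode the header to learn the payload size, and issue a second read only if the guess fell short. The sketch below illustrates that pattern under stated assumptions; the real Parquet bloom filter header is a compact-thrift struct, and `RangeReader`, `HeaderDecoder`, and the 4 KB guess here are all hypothetical stand-ins.

```java
import java.io.IOException;
import java.nio.ByteBuffer;

// Sketch of an offset-known, length-unknown read via a speculative first fetch.
public class UnknownLengthReadSketch
{
    // Hypothetical decoded header carrying its own size and the payload size.
    record Header(int headerBytes, int payloadBytes) {}

    interface RangeReader
    {
        ByteBuffer read(long offset, int length) throws IOException;
    }

    // Hypothetical header decoder standing in for thrift deserialization.
    interface HeaderDecoder
    {
        Header decode(ByteBuffer buffer);
    }

    static ByteBuffer readPayload(RangeReader reader, HeaderDecoder decoder, long offset)
            throws IOException
    {
        int guess = 4096; // speculative read size expected to cover the header
        ByteBuffer first = reader.read(offset, guess);
        Header header = decoder.decode(first.duplicate());
        int total = header.headerBytes() + header.payloadBytes();
        if (total <= guess) {
            // Common case: the single speculative read already covers everything
            return first.position(header.headerBytes()).slice();
        }
        // Rare case: issue a second read for the remainder of the payload
        return reader.read(offset + header.headerBytes(), header.payloadBytes());
    }
}
```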
> @raunaqmorarka is there some performance penalty when doing streaming read APIs?

There is a description of the problems encountered with streaming reads in https://trino.io/blog/2019/05/06/faster-s3-reads.html