Gang Wu comments

Results: 304 comments of Gang Wu

GH-3083: Make DELTA_LENGTH_BYTE_ARRAY default encoding for binary

@raunaqmorarka You can send an email to [email protected] to subscribe. If you don't want to subscribe, you may directly send an email to [email protected]. You can see https://lists.apache.org/[email protected] for reference.

GH-43994: [C++][Parquet] Fix schema conversion from two-level encoding nested list

@emkornfield @pitrou @mapleFU Would you mind taking a look? Thanks!

GH-43994: [C++][Parquet] Fix schema conversion from two-level encoding nested list

```
optional group a (LIST) {
  repeated group array (LIST) {
    repeated int32 array;
  }
}
```

IMO, the root cause is that the current code recognizes the schema above...
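To make the two-level case concrete, here is a minimal Python sketch of the LIST backward-compatibility rules as I understand them from the Parquet LogicalTypes document, applied to the schema in the comment above. This is not Arrow's C++ code and makes no claim about what the PR changes; `Node`, `repeated_field_is_element`, and `describe` are hypothetical names made up for illustration, and later revisions of the spec may word the rules differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical, simplified stand-in for a Parquet schema node."""
    name: str
    repetition: str                      # "required", "optional" or "repeated"
    physical_type: Optional[str] = None  # None means the node is a group
    logical_type: Optional[str] = None   # e.g. "LIST"
    children: List["Node"] = field(default_factory=list)

    @property
    def is_group(self) -> bool:
        return self.physical_type is None


def repeated_field_is_element(list_node: Node) -> bool:
    """True if the repeated child is itself the element (two-level form)."""
    (repeated,) = list_node.children  # a LIST group has exactly one child
    assert repeated.repetition == "repeated"
    if not repeated.is_group:
        return True   # rule 1: repeated primitive is the element
    if len(repeated.children) > 1:
        return True   # rule 2: repeated group with several fields
    if repeated.name in ("array", list_node.name + "_tuple"):
        return True   # rule 3: legacy names used by older writers
    return False      # rule 4: standard three-level wrapper group


def describe(node: Node) -> str:
    """Render the logical shape of a node, ignoring nullability."""
    if node.is_group and node.logical_type == "LIST":
        (repeated,) = node.children
        if repeated_field_is_element(node):
            element = repeated
        else:
            element = repeated.children[0]
        return f"list<{describe(element)}>"
    if node.is_group:
        inner = ", ".join(f"{c.name}: {describe(c)}" for c in node.children)
        return f"struct<{inner}>"
    return node.physical_type


# The schema from the comment: a two-level list nested in a two-level list.
two_level_nested = Node("a", "optional", logical_type="LIST", children=[
    Node("array", "repeated", logical_type="LIST", children=[
        Node("array", "repeated", physical_type="int32"),
    ]),
])

# A standard three-level list, for comparison.
three_level = Node("b", "optional", logical_type="LIST", children=[
    Node("list", "repeated", children=[
        Node("element", "optional", physical_type="int32"),
    ]),
])

print(describe(two_level_nested))
print(describe(three_level))
```

Under these rules the inner repeated group named `array` is treated as the element itself, so the outer field resolves to a list of lists rather than a list of single-field structs.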

GH-43994: [C++][Parquet] Fix schema conversion from two-level encoding nested list

> Our `ListToSchemaField` is like this part of the code https://github.com/apache/parquet-java/blob/aec7bc64dffa373db678ab2fc8b46565b4c011a5/parquet-avro/src/main/java/org/apache/parquet/avro/AvroSchemaConverter.java#L397-L421
>
> Should we port the impl and testings in that?

I think we are just missing check of...

GH-43994: [C++][Parquet] Fix schema conversion from two-level encoding nested list

I'm using a Hive schema, which is why it is named `array`. The file can easily be produced by Spark SQL like below: ``` package org.example import org.apache.spark.sql.SparkSession object ParquetTwoLevelList { def...

GH-43994: [C++][Parquet] Fix schema conversion from two-level encoding nested list

I will try to use parquet-java to create a minimal file and add it to parquet-testing. The file created by Hudi is too large due to a file-level bloom filter...

GH-43994: [C++][Parquet] Fix schema conversion from two-level encoding nested list

Gentle ping :) @emkornfield @pitrou @mapleFU

GH-43994: [C++][Parquet] Fix schema conversion from two-level encoding nested list

@emkornfield Thanks for your review! I've rebased it and the test failure in `R / rhub/ubuntu-gcc12:latest` is unrelated (observed the same error from other PRs). I'll merge it.

GH-1452: implement Size() filter for repeated columns

Thanks for adding this! This is a large PR that I need to take some time to review. It would be good if @emkornfield @gszadovszky could take a look to...

GH-1452: implement Size() filter for repeated columns

BTW, the level histogram might not be available when max_level is 0, because there is only a single level (i.e. 0) and its count can be deduced from `num_values` of the...
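A rough Python sketch of that deduction, under stated assumptions: the comment is truncated, so where `num_values` comes from is not shown here, and I simply assume it is the number of level entries being summarized. `build_level_histogram` and `effective_histogram` are hypothetical helpers, not parquet-java or Arrow APIs.

```python
from typing import List, Optional


def build_level_histogram(levels: List[int], max_level: int) -> Optional[List[int]]:
    """Count how often each level occurs; skip the histogram when max_level is 0."""
    if max_level == 0:
        # Only level 0 can occur, so a writer may omit the histogram entirely.
        return None
    histogram = [0] * (max_level + 1)
    for level in levels:
        histogram[level] += 1
    return histogram


def effective_histogram(stored: Optional[List[int]], max_level: int,
                        num_values: int) -> Optional[List[int]]:
    """Return the stored histogram, or deduce it when max_level is 0."""
    if stored is not None:
        return list(stored)
    if max_level == 0:
        # Assumption: num_values counts every level entry, all of them at level 0.
        return [num_values]
    return None  # genuinely missing; nothing can be deduced


# A column with max_level 0: no histogram is written, but one can be recovered.
flat_levels = [0, 0, 0, 0]
stored_flat = build_level_histogram(flat_levels, max_level=0)
recovered_flat = effective_histogram(stored_flat, max_level=0,
                                     num_values=len(flat_levels))

# A column with max_level 2: the histogram is written and used as-is.
nested_levels = [0, 2, 2, 1, 2]
stored_nested = build_level_histogram(nested_levels, max_level=2)
```

A filter that relies on level histograms would therefore need to treat a missing histogram with max_level 0 differently from a missing histogram at a higher max level.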