Gang Wu

304 comments by Gang Wu

Yes, the recommended approach is to reuse a single `ColumnVectorBatch` and consume it before calling `next`. The lifecycle is bound to the column reader. Maybe we should document this better.
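To make that lifecycle concrete, here is a toy Python model of the pattern. `ColumnVectorBatch` and `RowReader` below are hypothetical stand-ins that only mimic the shape of a reader refilling one caller-owned batch in place; they are not the real ORC classes or their API.

```python
# Toy model of the "reuse one batch, consume it before next()" pattern.
# ColumnVectorBatch and RowReader are hypothetical stand-ins, not ORC classes.


class ColumnVectorBatch:
    """Fixed-capacity buffer that the reader overwrites on every next()."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = [None] * capacity  # allocated once, then reused
        self.num_elements = 0


class RowReader:
    def __init__(self, rows):
        self._rows = rows
        self._pos = 0

    def create_row_batch(self, capacity):
        # The (potentially expensive) allocation happens once.
        return ColumnVectorBatch(capacity)

    def next(self, batch):
        """Refill `batch` in place; return False when no rows remain."""
        if self._pos >= len(self._rows):
            batch.num_elements = 0
            return False
        n = min(batch.capacity, len(self._rows) - self._pos)
        for i in range(n):
            batch.values[i] = self._rows[self._pos + i]
        batch.num_elements = n
        self._pos += n
        return True


def sum_all(rows, capacity=4):
    reader = RowReader(rows)
    batch = reader.create_row_batch(capacity)
    total = 0
    while reader.next(batch):
        # Consume now: only the first num_elements slots are valid, and the
        # whole buffer is overwritten by the next call.
        total += sum(batch.values[: batch.num_elements])
    return total


def keep_without_copy(rows, capacity=2):
    reader = RowReader(rows)
    batch = reader.create_row_batch(capacity)
    reader.next(batch)
    aliased = batch.values  # refers to the reused buffer
    copied = list(batch.values[: batch.num_elements])  # safe snapshot
    reader.next(batch)
    return aliased, copied
```

`keep_without_copy([1, 2, 3, 4])` returns `([3, 4], [1, 2])`: the reference kept across `next` sees the second batch's data, while the explicit copy still holds the first. Copying is the escape hatch when data must outlive the call; otherwise, consuming in the loop body avoids reallocating a batch per call.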

It is common practice for the lifecycle of a reusable batch to be bound to the reader's most recent state. The reason is that creating a batch is usually a heavy...

Right, we don't have that yet. It is recommended to reuse the batch and consume it right away.

cc @ggershinsky @shangxinli for experts on encryption

> For BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY, the unscaled number must be encoded as two's complement using big-endian byte order (the most significant byte is the zeroth element).

This is clear per...
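As a quick illustration of the quoted rule, here is a small Python sketch using only `int.to_bytes` / `int.from_bytes`. The helper names are hypothetical. Using the minimal byte count for BYTE_ARRAY is a choice this sketch makes (it matches what Java's `BigInteger.toByteArray()` produces); FIXED_LEN_BYTE_ARRAY instead sign-extends to the column's declared length.

```python
def encode_unscaled(unscaled, length=None):
    """Encode an unscaled decimal integer as big-endian two's complement.

    length=None -> minimal byte count (one possible BYTE_ARRAY encoding).
    length=n    -> exactly n bytes, sign-extended (FIXED_LEN_BYTE_ARRAY).
    Raises OverflowError if the value does not fit in n bytes.
    """
    if length is None:
        # Bits for the magnitude plus one sign bit; ~x handles negatives.
        nbits = (unscaled if unscaled >= 0 else ~unscaled).bit_length() + 1
        length = (nbits + 7) // 8
    # Byte 0 is the most significant byte ("big").
    return unscaled.to_bytes(length, "big", signed=True)


def decode_unscaled(data):
    """Inverse of encode_unscaled."""
    return int.from_bytes(data, "big", signed=True)
```

For example, a DECIMAL with scale 2 storing `-1.23` has unscaled value `-123`, which encodes to `b"\x85"` minimally and to `b"\xff\xff\xff\x85"` in a 4-byte fixed-length column.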

Yes, those markdown files are the source of truth for the specs. The site is unfortunately out of sync, and we had a discussion about removing the spec from the site by...

I would suggest getting sufficient feedback from the community before actually doing the work :)

@kou What do you think of this change? I've tested Apache Arrow in https://github.com/apache/arrow/pull/47792

TBH, I don't think adding some random examples would really help users, because they are pretty similar to what's already in the unit tests. What I have in mind is something...

It might be that there are too many binary values in the array column. Perhaps you can tune the page size check to be more aggressive. See https://github.com/apache/parquet-java/blob/master/parquet-column/src/main/java/org/apache/parquet/column/ParquetProperties.java and look...
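Why checking more often helps can be seen in a deliberately simplified model, sketched below in Python. It assumes the writer compares buffered bytes against the target page size only once every `check_every` values; this is an illustration, not parquet-java's actual algorithm (the real knobs live in the linked `ParquetProperties.java`), and `simulate_pages` is a hypothetical helper.

```python
def simulate_pages(value_sizes, page_size, check_every):
    """Return the byte size of each page a writer would emit.

    Simplified model (an assumption, not parquet-java's logic): buffered bytes
    are compared with page_size only once every `check_every` values, and a
    page is flushed when the threshold has been reached at a check.
    """
    pages = []
    buffered = 0
    since_check = 0
    for size in value_sizes:
        buffered += size
        since_check += 1
        if since_check >= check_every:
            since_check = 0
            if buffered >= page_size:
                pages.append(buffered)
                buffered = 0
    if buffered:
        pages.append(buffered)  # final partial page
    return pages


# 1000 binary values of 10 KiB each, 1 MiB target page size.
big_values = [10_240] * 1000
sparse = simulate_pages(big_values, 1_048_576, check_every=100)
dense = simulate_pages(big_values, 1_048_576, check_every=1)
```

With a check every 100 values, each page ends up at 2,048,000 bytes, roughly twice the target, because the first check (at 1,024,000 bytes) just misses the threshold. Checking after every value caps pages at 1,054,720 bytes. Large values make the gap between checks expensive, which is why a more aggressive check helps in this case.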