Gang Wu

Results: 304 comments by Gang Wu

This would complicate the implementation and result in a large memory footprint. Does it make sense to use multiple file writers instead?

> The reason to prefer "extension types" over "third-party types" or "external types" is that at some point some of them might get standardized inside Parquet, like [Arrow does](https://arrow.apache.org/docs/format/CanonicalExtensions.html). >...

```
[INFO] Rat check: Summary over all files. Unapproved: 1, unknown: 1, generated: 0, approved: 28 licenses.
Warning: Files with unapproved licenses: BinaryProtocolExtensions.md
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
...
```

As the vote has passed, I will merge it if no objections or feedback are received before Sep 12.

> @wgtmac are we merging this? I want to make sure @pitrou is happy with the change.

IIUC, there are still some gaps before the Hadoop dependency can be removed entirely. At the very least, I have to depend on `hadoop-client-api` to make the build pass. cc @amousavigourabi @Fokko for advice.

Thanks for the clarification! I agree; I have run into the same issue. It seems that removing the Hadoop dependency is only partially implemented. I need more time to...

> I think the test failure was a blip, @Fokko can you retrigger it?

Done! Let's see if it can succeed.

I don't think there has been any spec change since the last release. Perhaps we can wait for recent discussions (IEEE 754 order, int96 stats, interval types, etc.) to close before releasing the...

Thanks for opening the issue! I think the current file size is not an issue as we have delta encoding. The problems of adding offset to row group metadata I...