Gang Wu
@dongjoon-hyun We have duplicate content at https://github.com/apache/orc-format/tree/main/specification. Is it inevitable that we maintain the two copies unless we have an orc-site repo?
@kou Could you help review this? Thanks!
The new GitHub Action doesn't run. I found the error message below: ``` Invalid workflow file: .github/workflows/build_and_test.yml#L224 The workflow is not valid. .github/workflows/build_and_test.yml (Line: 224, Col: 7): 'run' is already...
Let me know when ready to review.
> Is this ready or do we need more revision, @ffacs and @wgtmac ? My only concern is the name of the public API: https://github.com/apache/orc/pull/2269/files#r2184135226. It is better to use `getCrs()`...
@j2cms If you have an installed protobuf somewhere, perhaps you may want to set one of these environment variables before building to avoid the default one.
I'm not sure whether such a fallback makes sense, because it may introduce a slight shift in timestamp values written by a different host. I also see that the fallback exists...
The problem may come from the writer, which blindly uses UTC to serialize timestamp values. In terms of fallback, I think it is better to make the fallback value configurable...
@clairemcginty I'm not sure whether the `size` filter could leverage [SizeStatistics](https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L202), which provides histograms of the definition and repetition levels. (cc @emkornfield)
Any plan to add a PoC implementation to arrow-cpp or arrow-rs to meet the two-implementation requirement? @sfc-gh-yzou