arrow
Apache Arrow is a multi-language toolbox for accelerated data interchange and in-memory processing
**Reporter**: [Kouhei Sutou](https://issues.apache.org/jira/browse/ARROW-4919) / @kou **Note**: *This issue was originally created as [ARROW-4919](https://issues.apache.org/jira/browse/ARROW-4919). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*
Given the recent Parquet compatibility problems, we should have better testing for this. For easy testing of backwards compatibility, we could add some files (with different types) written with older...
**Describe the enhancement requested**: The `unique_ptr` versions of `parquet::arrow::FileReader::GetRecordBatchReader()` have already completed the migration: a `Result`-returning version was added, the `Status`-returning versions were deprecated, and those deprecated `Status` versions were then removed. We should do this for `shared_ptr` versions...
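The pattern being migrated can be sketched in self-contained C++. The `Status`, `Result`, and `BatchReader` types below are simplified, hypothetical stand-ins written for illustration (not Arrow's real `arrow::Status`, `arrow::Result`, or `arrow::RecordBatchReader`), and `GetReader`/`GetReaderLegacy` are hypothetical names rather than the actual `FileReader` overloads. The sketch puts the `Result`-returning form next to a deprecated `Status` form that forwards to it, which is roughly what the deprecation period looks like before the `Status` form is removed.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Hypothetical, simplified stand-in for arrow::Status (not the real class).
class Status {
 public:
  Status() = default;  // a default-constructed Status means success
  static Status OK() { return Status(); }
  static Status Invalid(std::string msg) {
    Status s;
    s.ok_ = false;
    s.msg_ = std::move(msg);
    return s;
  }
  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }

 private:
  bool ok_ = true;
  std::string msg_;
};

// Hypothetical, simplified stand-in for arrow::Result<T>: a value or an error.
template <typename T>
class Result {
 public:
  Result(T value) : storage_(std::move(value)) {}         // implicit: `return value;`
  Result(Status status) : storage_(std::move(status)) {}  // implicit: `return error;`
  bool ok() const { return std::holds_alternative<T>(storage_); }
  Status status() const { return ok() ? Status::OK() : std::get<Status>(storage_); }
  T& ValueOrDie() {
    assert(ok());
    return std::get<T>(storage_);
  }

 private:
  std::variant<Status, T> storage_;
};

// Hypothetical stand-in for a record batch reader.
struct BatchReader {
  int num_row_groups;
};

// New style: the value or the error travels in a single Result.
Result<std::shared_ptr<BatchReader>> GetReader(int num_row_groups) {
  if (num_row_groups < 0) return Status::Invalid("negative row group count");
  return std::make_shared<BatchReader>(BatchReader{num_row_groups});
}

// Old style, kept only during a deprecation period: Status return plus a
// shared_ptr out-parameter, implemented by forwarding to the Result version.
[[deprecated("use GetReader(), which returns a Result")]]
Status GetReaderLegacy(int num_row_groups, std::shared_ptr<BatchReader>* out) {
  Result<std::shared_ptr<BatchReader>> result = GetReader(num_row_groups);
  if (!result.ok()) return result.status();
  *out = std::move(result.ValueOrDie());
  return Status::OK();
}
```

A caller of the new form writes `auto result = GetReader(n);` and checks `result.ok()`, instead of declaring an out-parameter first and checking a separate `Status`.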
Currently, Arrow has no textual representation of its schema that could serve the same purpose as JSON Schema does for JSON or .proto files do for Protobuf. This issue is about adding...
Related: PARQUET-441 and PARQUET-442 **Reporter**: [Wes McKinney](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=wesm) / @wesm **Note**: *This issue was originally created as [PARQUET-443](https://issues.apache.org/jira/browse/PARQUET-443). Please see the [migration documentation](https://issues.apache.org/jira/browse/PARQUET-2502) for further details.*
This will happen significantly downstream of where we are at right now, but we should be planning ahead to facilitate scanning Parquet files with externally-defined predicates as a primary use...
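One common way to use such predicates is to compare them against per-row-group min/max column statistics and skip row groups that cannot contain a match. The sketch below assumes a single `int64` column with statistics present for every row group and a hypothetical inclusive range predicate; `ColumnStats`, `RangePredicate`, and `SelectRowGroups` are illustrative names, not parquet-cpp APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical min/max statistics for one int64 column in one row group.
struct ColumnStats {
  std::int64_t min;
  std::int64_t max;
};

// Hypothetical externally defined predicate: lo <= value <= hi (inclusive).
struct RangePredicate {
  std::int64_t lo;
  std::int64_t hi;

  // True if some value in [stats.min, stats.max] could satisfy the predicate,
  // i.e. the two closed intervals overlap.
  bool MayMatch(const ColumnStats& stats) const {
    return stats.max >= lo && stats.min <= hi;
  }
};

// Returns the indices of row groups that must still be read; every other
// row group provably contains no matching value and can be skipped.
std::vector<int> SelectRowGroups(const std::vector<ColumnStats>& stats,
                                 const RangePredicate& pred) {
  std::vector<int> selected;
  for (int i = 0; i < static_cast<int>(stats.size()); ++i) {
    if (pred.MayMatch(stats[i])) selected.push_back(i);
  }
  return selected;
}
```

Row groups that pass this check still need row-level filtering: min/max statistics can only prove that a match is impossible, not that one exists.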
Related to PARQUET-442 **Reporter**: [Wes McKinney](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=wesm) / @wesm **Note**: *This issue was originally created as [PARQUET-510](https://issues.apache.org/jira/browse/PARQUET-510). Please see the [migration documentation](https://issues.apache.org/jira/browse/PARQUET-2502) for further details.*
Currently, parquet-cpp uses 1 MB as the default data page size, as parquet-mr does. We should check with the other Parquet implementations whether this is a good value and...
The `total_byte_size` of a row group is computed redundantly. Use the `total_bytes_written_` value passed in by the writer instead. https://github.com/apache/parquet-cpp/blob/master/src/parquet/file/metadata.cc#L471 **Reporter**: [Deepak Majeti](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mdeepak) / @majetideepak **Note**: *This issue was originally created as [PARQUET-730](https://issues.apache.org/jira/browse/PARQUET-730)....
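The idea can be sketched with two hypothetical classes: the writer keeps a running total as each column chunk is written and hands it to the metadata builder when the row group is closed, so the builder never has to walk the column chunks again. `RowGroupWriter`, `RowGroupMetadataBuilder`, and `RecomputeTotal` are illustrative only and are not the actual parquet-cpp classes at the linked line.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical metadata builder: it records the total it is handed
// instead of deriving it from the column chunks a second time.
class RowGroupMetadataBuilder {
 public:
  void Finish(std::int64_t total_bytes_written) { total_byte_size_ = total_bytes_written; }
  std::int64_t total_byte_size() const { return total_byte_size_; }

 private:
  std::int64_t total_byte_size_ = 0;
};

// Hypothetical row group writer that keeps a running byte count
// as each column chunk is written.
class RowGroupWriter {
 public:
  void WriteColumnChunk(std::int64_t chunk_bytes) { total_bytes_written_ += chunk_bytes; }
  // Passes the already-known total to the metadata builder on close.
  void Close(RowGroupMetadataBuilder* metadata) const {
    metadata->Finish(total_bytes_written_);
  }

 private:
  std::int64_t total_bytes_written_ = 0;
};

// The redundant alternative: walking every column chunk again to re-sum sizes.
std::int64_t RecomputeTotal(const std::vector<std::int64_t>& chunk_sizes) {
  std::int64_t total = 0;
  for (std::int64_t size : chunk_sizes) total += size;
  return total;
}
```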
For example, if pandas has cast integer data to float, this would enable the integer data to be recovered (so long as the values fall in the ~2^53 floating point...
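A minimal sketch of the recovery check, assuming the data was cast to `float64`, that `NaN` marks missing values (as pandas does when an integer column gains nulls), and that only magnitudes up to 2^53 - 1 are accepted. Past that bound one double can stand for more than one integer (2^53 + 1 rounds to 2^53), so the original value is ambiguous. `RecoverIntegers` is a hypothetical helper, not a pandas or Arrow function.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <optional>
#include <vector>

// Largest value a double can hold that only one integer could have produced
// (2^53 - 1); 2^53 itself is also what 2^53 + 1 rounds to. Same value as
// JavaScript's Number.MAX_SAFE_INTEGER.
constexpr double kMaxSafeInteger = 9007199254740991.0;

// Hypothetical helper: tries to recover int64 values from float64 data.
// NaN is treated as a missing value. Returns std::nullopt if any non-missing
// value is fractional, infinite, or outside [-kMaxSafeInteger, kMaxSafeInteger].
std::optional<std::vector<std::optional<std::int64_t>>> RecoverIntegers(
    const std::vector<double>& values) {
  std::vector<std::optional<std::int64_t>> out;
  out.reserve(values.size());
  for (double v : values) {
    if (std::isnan(v)) {
      out.push_back(std::nullopt);  // missing value stays missing
      continue;
    }
    // std::fabs(inf) exceeds the bound, so infinities are rejected here too.
    if (std::trunc(v) != v || std::fabs(v) > kMaxSafeInteger) return std::nullopt;
    out.push_back(static_cast<std::int64_t>(v));
  }
  return out;
}
```

Rejecting the whole column on the first unsafe value, rather than converting it partially, keeps the choice simple: either every value round-trips exactly or the column stays floating point.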