arrow issues

[GLib] Add support for arrow::DictionaryBuilder

2

**Reporter**: [Kouhei Sutou](https://issues.apache.org/jira/browse/ARROW-4919) / @kou

**Note**: *This issue was originally created as [ARROW-4919](https://issues.apache.org/jira/browse/ARROW-4919). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further details.*

asfimport

Type: enhancement

Component: GLib

[C++][Python] Set up testing for backwards compatibility of the parquet reader

1

Given the recent parquet compat problems, we should have better testing for this. For easy testing of backwards compatibility, we could add some files (with different types) written with older...

asfimport

Component: C++

Component: Python

Type: test

Status: stale-warning

[C++][Parquet] Add `Result<shared_ptr>` versions of `parquet::arrow::FileReader::GetRecordBatchReader()`

1

### Describe the enhancement requested

The `unique_ptr` versions of `parquet::arrow::FileReader::GetRecordBatchReader()` already have `Result` versions; their `Status` versions were deprecated and then removed. We should do the same for the `shared_ptr` versions...

kou

Type: enhancement

Component: Parquet

Component: C++

[C++] Support for textual, JSON schema representation

18

Currently, Arrow has no textual representation for its schema that could serve the same purposes as JSON-Schema for JSON, the .proto files for Protobuf, etc. This issue is about adding...

asfimport

Type: enhancement

Component: C++

[C++][Parquet] Schema resolution: map encoding

1

Related: PARQUET-441 and PARQUET-442

**Reporter**: [Wes McKinney](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=wesm) / @wesm

**Note**: *This issue was originally created as [PARQUET-443](https://issues.apache.org/jira/browse/PARQUET-443). Please see the [migration documentation](https://issues.apache.org/jira/browse/PARQUET-2502) for further details.*

asfimport

Type: enhancement

Component: Parquet

Component: C++

Priority: Major

Status: stale-warning

[C++][Parquet] Develop external predicate pushdown API for column readers

1

This will happen significantly downstream of where we are at right now, but we should be planning ahead to facilitate scanning Parquet files with externally-defined predicates as a primary use...

asfimport

Type: enhancement

Component: Parquet

Component: C++

Priority: Major

Status: stale-warning

[C++][Parquet] Add ability to parse nested schemas from text specification like parquet-mr

1

Related to PARQUET-442

**Reporter**: [Wes McKinney](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=wesm) / @wesm

**Note**: *This issue was originally created as [PARQUET-510](https://issues.apache.org/jira/browse/PARQUET-510). Please see the [migration documentation](https://issues.apache.org/jira/browse/PARQUET-2502) for further details.*

asfimport

Type: enhancement

Component: Parquet

Component: C++

Priority: Minor

Status: stale-warning

[C++][Parquet] Determine a good default page size

1

Currently the default data page size in parquet-cpp is 1 MB, the same as in parquet-mr. We should check with the other Parquet implementations whether this is a good value and...

asfimport

Type: enhancement

Component: Parquet

Component: C++

Priority: Major

Status: stale-warning

[C++][Parquet] Remove redundant total_byte_size calculation

1

The `total_byte_size` of a row group is being redundantly computed. Use `total_bytes_written_` passed by the writer instead. https://github.com/apache/parquet-cpp/blob/master/src/parquet/file/metadata.cc#L471

**Reporter**: [Deepak Majeti](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=mdeepak) / @majetideepak

**Note**: *This issue was originally created as [PARQUET-730](https://issues.apache.org/jira/browse/PARQUET-730)....

asfimport

Type: bug

Component: Parquet

Component: C++

Priority: Major

Status: stale-warning

[Python] Implement conversion between integer coded as floating points with NaN to an Arrow integer type

10

For example: if pandas has cast integer data to float, this would enable the integer data to be recovered (so long as the values fall in the ~2^53 floating point...

asfimport

Type: enhancement

Component: Python

Status: stale-warning
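
The recovery described in the Python issue above can be sketched in plain Python. This is only an illustration of the idea, not pyarrow's implementation: the function name `floats_to_nullable_ints` and the constant `MAX_EXACT_INT` are hypothetical, `None` stands in for an Arrow null, and the 2^53 bound is assumed from the issue's "~2^53" remark.

```python
import math
from typing import List, Optional

# Assumption: integers up to 2**53 in magnitude are treated as safely
# recoverable from float64; larger values are rejected as ambiguous.
MAX_EXACT_INT = 2 ** 53


def floats_to_nullable_ints(values: List[float]) -> List[Optional[int]]:
    """Hypothetical helper: map NaN to None and integral floats to int."""
    result: List[Optional[int]] = []
    for v in values:
        if math.isnan(v):
            # NaN is how pandas marks a missing value in a float column.
            result.append(None)
        elif not v.is_integer() or abs(v) > MAX_EXACT_INT:
            # Infinite, fractional, or out-of-range values cannot be
            # recovered as integers without guessing.
            raise ValueError(f"cannot recover an integer from {v!r}")
        else:
            result.append(int(v))
    return result
```

For instance, `floats_to_nullable_ints([1.0, float("nan"), -3.0])` yields a list with an integer, a null placeholder, and another integer, while a fractional input such as `1.5` raises `ValueError` instead of being silently truncated.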
