Antoine Pitrou comments

823 comments by Antoine Pitrou

ARROW-17798: [C++][Parquet] Add DELTA_BINARY_PACKED encoder to Parquet writer

I'm glad this is in! Congrats!

ARROW-17481: [C++][Python] Major performance improvements to CSV reading from S3

I didn't get around to looking at this as I had promised. However, I also noticed a potential problem with the current generator usage in the CSV reader that needs to be investigated: https://github.com/apache/arrow/issues/14792

ARROW-17932: [C++] Implement streaming RecordBatchReader for JSON

By the way, can you also: 1) rebase on (or merge from) master; 2) add documentation for the new class in https://github.com/apache/arrow/blob/master/docs/source/cpp/json.rst and https://github.com/apache/arrow/blob/master/docs/source/cpp/api/formats.rst#line-separated-json ?

ARROW-17351: [C++][Compute] Implement a parser for Expressions

@NoahFournier Sorry. The project is lacking review bandwidth at the moment, so we have to prioritize work and this might unfortunately take some time.

ARROW-14656: [Python] Add sort helper function for Array, ChunkedArray and StructArray

> Are you saying we should get those done first and measure it before we think about merging this in case it turns out to be slower than the current...

For more context, the original Table/Batch sort was based on something similar to NestedValuesComparator, and I significantly improved its performance by switching to per-column sorting for cases with few keys....
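The contrast between a row-wise comparator and per-column sorting can be sketched in plain Python. This is only an illustration, assuming columns are plain Python lists without nulls. The function names `sort_rows_with_comparator` and `sort_rows_per_column` are hypothetical and do not mirror Arrow's C++ sorter classes.

```python
from typing import List, Sequence


def sort_rows_with_comparator(columns: Sequence[Sequence]) -> List[int]:
    """Row-wise approach: build one key tuple per row and compare whole rows."""
    n = len(columns[0]) if columns else 0
    return sorted(range(n), key=lambda i: tuple(col[i] for col in columns))


def sort_rows_per_column(columns: Sequence[Sequence]) -> List[int]:
    """Per-column approach: sort by the first key, then refine each run of ties by the next key."""
    n = len(columns[0]) if columns else 0
    indices = list(range(n))
    _refine(columns, 0, indices, 0, n)
    return indices


def _refine(columns: Sequence[Sequence], key: int, indices: List[int], start: int, end: int) -> None:
    # Nothing left to refine: no more keys, or a run of at most one row.
    if key >= len(columns) or end - start <= 1:
        return
    col = columns[key]
    # Sort only the [start, end) slice, looking at a single column (stable sort).
    indices[start:end] = sorted(indices[start:end], key=lambda i: col[i])
    # Walk the slice, find runs of equal values, and sort each run by the next column.
    run_start = start
    for pos in range(start + 1, end + 1):
        if pos == end or col[indices[pos]] != col[indices[run_start]]:
            _refine(columns, key + 1, indices, run_start, pos)
            run_start = pos
```

For example, with `cols = [[2, 1, 2, 1], ["b", "z", "a", "y"]]` both functions return `[3, 1, 2, 0]`. In a columnar C++ implementation, each per-column pass can be specialized on a single column's type and only touches runs of ties, which is one plausible reason this strategy pays off when there are few keys; pure-Python timings would not be representative.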

I disagree with introducing non-trivial C++ code that is redundant with existing code _and_ probably sub-performant.

To be clear, the current multi-column sorting facilities don't handle recursive nesting, only the top level of nesting. To handle recursive nesting, you need to flatten the columns. This can...
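The flattening step can be sketched in plain Python. Here a struct column is modelled as a dict of named child columns, a hypothetical representation chosen for illustration rather than Arrow's memory layout, and nulls are ignored. `flatten_column` and `sort_indices_nested` are illustrative names only. Flattening recursively yields the leaf columns in field order, and sorting by those leaves gives the lexicographic order of the nested values.

```python
from typing import Dict, List, Tuple, Union

# A column is either a list of scalar values or a dict of named child columns
# (a hypothetical stand-in for a struct column).
Column = Union[List, Dict[str, "Column"]]


def flatten_column(name: str, column: Column) -> List[Tuple[str, List]]:
    """Recursively expand struct-like columns into (dotted_name, leaf_values) pairs."""
    if isinstance(column, dict):
        leaves: List[Tuple[str, List]] = []
        for child_name, child in column.items():
            leaves.extend(flatten_column(f"{name}.{child_name}", child))
        return leaves
    return [(name, column)]


def sort_indices_nested(column: Column) -> List[int]:
    """Sort row indices of a (possibly nested) column by its flattened leaf columns."""
    leaves = [values for _, values in flatten_column("root", column)]
    n = len(leaves[0]) if leaves else 0
    return sorted(range(n), key=lambda i: tuple(values[i] for values in leaves))
```

For example, `nested = {"x": [1, 0, 1], "inner": {"y": [5, 9, 3], "z": ["a", "b", "c"]}}` flattens to the leaves `s.x`, `s.inner.y`, `s.inner.z` (when named `"s"`), and `sort_indices_nested(nested)` returns `[1, 2, 0]`.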

But I agree with quickly benchmarking the approach in this PR compared to the existing facility. It should be easy: 1) generate a RecordBatch (respectively Table); 2) convert it to...

ARROW-16212: [C++][Python] Register Multiple Kernels for a UDF

I have two issues here: 1. The API is ugly. 2. One cannot register different implementations for different input types. This might be an annoying limitation (and/or a performance issue...
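The limitation in point 2 can be pictured with a small, hypothetical registry in plain Python, where one function name maps to several implementations keyed by their input type signature. `UdfRegistry`, `register_kernel` and `call` are illustrative names only; this is neither the pyarrow UDF API nor the design proposed in the PR.

```python
from typing import Any, Callable, Dict, Tuple

Signature = Tuple[type, ...]


class UdfRegistry:
    """Hypothetical registry: one function name, several kernels keyed by input types."""

    def __init__(self) -> None:
        self._kernels: Dict[str, Dict[Signature, Callable[..., Any]]] = {}

    def register_kernel(self, name: str, in_types: Signature, impl: Callable[..., Any]) -> None:
        kernels = self._kernels.setdefault(name, {})
        if in_types in kernels:
            raise ValueError(f"kernel for {name}{in_types} already registered")
        kernels[in_types] = impl

    def call(self, name: str, *args: Any) -> Any:
        # Dispatch on the exact runtime types of the arguments.
        signature = tuple(type(arg) for arg in args)
        try:
            impl = self._kernels[name][signature]
        except KeyError:
            raise TypeError(f"no kernel for {name}{signature}") from None
        return impl(*args)
```

With `registry.register_kernel("double", (int,), lambda x: x * 2)` and `registry.register_kernel("double", (str,), lambda s: s + s)`, `registry.call("double", 21)` returns `42` and `registry.call("double", "ab")` returns `"abab"`. Each kernel only has to handle its own input types, whereas a single registered implementation would have to branch on types internally, which is one way the performance concern could arise.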