Antoine Pitrou
> I'd like to follow the testing here: https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestDataPageV1Checksums.java This actually seems more complicated to implement (you have to replicate details of the file format in the tests). Also, without...
Since the code is simple and the algorithm well-known, my vote is to own it and keep it in `util`.
Right, since the data types are supposed to match, this PR is only guarding against invalid data. If you want to make sure data is valid, you should call `Validate`...
It _does_ start with a type comparison; this is also mentioned above: https://github.com/apache/arrow/blob/d4190cc9ad15d30cb8b840f8a6df25c006d8009f/cpp/src/arrow/compare.cc#L164-L169 and you can see an example of type checking here: https://github.com/apache/arrow/blob/d4190cc9ad15d30cb8b840f8a6df25c006d8009f/cpp/src/arrow/compare.cc#L547-L554
Can you show a snippet that would show the issue?
Okay, so here is the problem: users shouldn't pass invalid data to Arrow APIs (except to `Validate` and `ValidateFull`, which are explicitly designed to handle such data). So it doesn't...
But again, during development you can call `Validate[Full]` in your own code. It doesn't really make sense to randomly add checks in Arrow functions, IMHO.
For the record, the equivalent Python functions don't allow passing `strides`.
@raulcd Would you like to rebase from master and try again?
By the way, what is the current disk footprint of our sccache?