Antoine Pitrou
> I'd like to follow the testing here: https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/test/java/org/apache/parquet/hadoop/TestDataPageV1Checksums.java This actually seems more complicated to implement (you have to replicate details of the file format in the tests). Also, without...
Since the code is simple and the algorithm well-known, my vote is to own it and keep it in `util`.
Right, since the data types are supposed to match, this PR is only guarding against invalid data. If you want to make sure data is valid, you should call `Validate`...
It _does_ start with a type comparison; this is also mentioned above: https://github.com/apache/arrow/blob/d4190cc9ad15d30cb8b840f8a6df25c006d8009f/cpp/src/arrow/compare.cc#L164-L169 and you can see an example of type checking here: https://github.com/apache/arrow/blob/d4190cc9ad15d30cb8b840f8a6df25c006d8009f/cpp/src/arrow/compare.cc#L547-L554
Can you show a snippet that would show the issue?
Okay, so here is the problem: users shouldn't pass invalid data to Arrow APIs (except to `Validate` and `ValidateFull`, which are explicitly designed to handle such data). So it doesn't...
But again, during development you can call `Validate[Full]` in your own code. It doesn't really make sense to randomly add checks in Arrow functions, IMHO.
For the record, the equivalent Python functions don't allow passing `strides`.
@raulcd Would you like to rebase from master and try again?
By the way, what is the current disk footprint of our sccache?