mwish
The patch is ready for review now, though the `parquet-testing` PR should be merged first. As https://github.com/apache/parquet-testing/pull/29 says, I added 3 files there; if more cases are required, please tell me....
@pitrou I've updated `parquet-testing` to the latest version, and the pull request can be reviewed now.
```
Found the following (possibly) invalid URLs:
  URL: https://arrow.apache.org/docs/r/articles/datasets.html
    From: man/open_dataset.Rd
    Status: 404
    Message: Not Found
  URL: https://arrow.apache.org/docs/r/articles/read_write.html
    From: man/read_json_arrow.Rd
    Status: 404
    Message: Not Found
```

I found CI for...
> Could you open a new GitHub issue for it? We don't need to fix it in this pull request.

Hi kou, I created an issue here: https://github.com/apache/arrow/issues/14884 @kou And...
@pitrou Mind taking a look?
Hi, as https://github.com/apache/arrow/pull/14351#discussion_r1046105723 says, I'd like to keep this patch minimal, so only DATA_PAGE_V1 pages are checksummed here. For DATA_PAGE_V2, I have left a `TODO` for now.
> Would you be willing to submit a v2 PR later? It would be a pity if we supported v1 pages better than v2 pages.

Sure, it seems that for v2 it's...
> It would probably be easy to find out, in any case.

@pitrou You're right, I made a mistake and misunderstood the format here. The `data` format in memory is...
@pitrou Most comments are resolved, so you can take a look. `DATA_PAGE_V2` will be added in the coming patch. I also have a question here: https://github.com/apache/parquet-format/pull/126#issuecomment-1348324323. Should I implement CRC...
After discussion on `parquet-format`, the checksum for DICT and DATA_PAGE_V2 will be implemented in the coming patches.