datafusion
datafusion copied to clipboard
Bump arrow2
Which issue does this PR close?
Closes #2709.
Rationale for this change
Current branch arrow2
is still using arrow2 version v0.10
and falls far behind the latest version v0.12
What changes are included in this PR?
Dependencies upgraded:
- arrow2:
v0.10
->v0.12
- parquet2:
v0.12
->v0.13
- arrow-format:
0.4
->0.6
- prost:
0.9
->0.10
- prost-types:
0.9
->0.10
- tonic: 0.6 ->
0.7
- topic-build:
0.6
->0.7
API change:
-
arrow2::error::ArrowError
->arrow2::error::Error
- arrow2 FileWriter now accepts
Vec<Vec<Encoding>>
instead ofVec<Encoding>
andtransverse
can be used to create encodings from schema with a customized mapping. - arrow2
RowGroupMetadata
no longer as acolumn(usize)
method to fetch column metadata at given index, instead, it provides acolumns()
method that returns a column slice. -
datafusion::dataframe::DataFrame::write_parquet
method now acceptsarrow::io::parquet::write::WriteOptions
instead ofparquet::write::WriteOptions
- some other minor changes.
Are there any user-facing changes?
Yes, please refer to previous section for full API change list.
But I think it's rather easy for users of arrow2
branch to adapt to these changes.
About tests
Some tests are failling in current arrow2 branch, so I'm not going to fix all failing tests in this PR, instead I'll make sure no more unit test fails. I'd be happy to fix all failing tests after this PR is merged so I don't have to fix them twice :)
@v0y4g3r I'd like to start understanding more about the efforts here. Is the plan to put this behind a cargo feature
so that we can switch between arrow
and arrow2
?
@v0y4g3r I'd like to start understanding more about the efforts here. Is the plan to put this behind a cargo
feature
so that we can switch betweenarrow
andarrow2
?
No, it's simply about upgrading dependencies.
But I'm also interested in something like defining a trait to hide the API differences between arrow and arrow2 and use cargo feature to provide an option. But I have no ideas on how to implement that now.
There are some license headers missing:
NOT APPROVED: ballista/rust/core/src/memory_stream.rs (./ballista/rust/core/src/memory_stream.rs): false NOT APPROVED: datafusion/src/physical_plan/sort.rs (./datafusion/src/physical_plan/sort.rs): false NOT APPROVED: datafusion/tests/sql_integration.rs (./datafusion/tests/sql_integration.rs): false NOT APPROVED: datafusion-physical-expr/src/test_util.rs (./datafusion-physical-expr/src/test_util.rs): false
Seems like current arrow2 branch doesn't have license header. But it's ok I added headers.
@andygrove Wondering what I still need to do in order to get this merged and I'd be happy to follow up.
@andygrove Seems like some workflows need a maintainer's approval to run.
@v0y4g3r looks like there are some minor build issues that need to fixed.
Marking as draft to note it is not ready for merging -- please change it back to ready for review when it is ready. Thanks!
@v0y4g3r looks like there are some minor build issues that need to fixed.
These tests are failing in current arrow2 branch, and is not introduced by this PR.
Anyway, I'm working on fixing these tests in another pull request, but there's something different between arrow and arrow2.
Marking this PR as a draft as I am cleaning up the review queue. Please mark it as ready for review when it is
This PR is more than 6 month old, so closing it down for now to clean up the PR list. Please reopen if this is a mistake and you plan to work on it more