Should we move this library back to Apache Arrow governance?
datafusion-python was donated to the Apache Arrow project in April 2021 and was added to the arrow-datafusion repository [1].
datafusion-python was removed from the repository in January 2022 [2] and added to a new repository in the datafusion-contrib organization.
I would like to propose bringing the Python bindings back under Apache governance. This will require going through the IP clearance process again, unfortunately.
I propose that we move the code to its own repository, perhaps apache/arrow-datafusion-python?
Let's use this issue and the mailing list to discuss this.
[1] https://github.com/apache/arrow-datafusion/pull/69
[2] https://github.com/apache/arrow-datafusion/pull/1518
I am curious about the rationale for bringing it back under Apache governance (I am not opposed, but I wonder what the benefits are). Is it related to finding more assistance to maintain the code? Or is it cumbersome to keep up with non-trivial changes in DataFusion?
One reason is that I would like to help maintain the package. The proposal is not to bring it back into arrow-datafusion but into its own repo, arrow-datafusion-python. I don't foresee the move having any impact on the DataFusion maintainers' workload.
> One reason is that I would like to help maintain the package.
I see -- as I recall your employment situation requires ASF governed projects, correct?
BTW the move makes sense to me
@andygrove the only concern I have is that if, in the future, you were unable to contribute as much to the Python bindings, the maintenance burden would fall to other Arrow contributors, who may have less motivation to maintain them, which could slow down progress.
> I see -- as I recall your employment situation requires ASF governed projects, correct?
It's not quite that simple. There is a process to go through before committing to an open source project, and that has already been done for ASF projects, so the path is much simpler. In other cases, I have had to add company copyrights to code being submitted when the governance of the project is less clear.
> @andygrove the only concern I have is that if, in the future, you were unable to contribute as much to the Python bindings, the maintenance burden would fall to other Arrow contributors, who may have less motivation to maintain them, which could slow down progress.
Yes, that is a good point. I suppose there is always the option to move it back out again, but I would prefer to see us work towards having more committers on the project.