[EPIC] Tasks for a new Top Level Apache Project
Is your feature request related to a problem or challenge?
Update: the new project was approved by the ASF board, and we are now in the process of setting it all up.
We are on track to have a new top level apache project for DataFusion https://github.com/apache/arrow-datafusion/discussions/6475 (see Proposal Document for more details)
Once this is approved by the ASF board, we will need to change some logistical things. This ticket tracks the work items.
Describe the solution you'd like
Rename github repositories
- [x] https://issues.apache.org/jira/browse/INFRA-25726
Note it is critical in my opinion that all existing links continue to work (as in the url https://github.com/apache/arrow-datafusion/issues/9691 will be redirected to the new repo). I believe this is the default behavior in GitHub when a repository is renamed, but we should double check.
- https://github.com/apache/arrow-datafusion/ → apache/datafusion
- https://github.com/apache/arrow-datafusion-python → apache/datafusion-python
- https://github.com/apache/arrow-ballista → apache/datafusion-ballista
- https://github.com/apache/arrow-ballista-python → apache/datafusion-ballista-python
- https://github.com/apache/arrow-datafusion-comet → apache/datafusion-comet
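GitHub redirects the old URLs, but links in documentation and release scripts may still be worth updating to the new names. Below is a minimal Python sketch, assuming links appear as plain `https://github.com/apache/<repo>` URLs. The `RENAMES` table mirrors the rename list above, and `rewrite_repo_links` is a hypothetical helper, not part of any existing DataFusion tooling.

```python
import re

# Repository renames from this epic (old name -> new name), all under the
# apache GitHub organization.
RENAMES = {
    "arrow-datafusion": "datafusion",
    "arrow-datafusion-python": "datafusion-python",
    "arrow-ballista": "datafusion-ballista",
    "arrow-ballista-python": "datafusion-ballista-python",
    "arrow-datafusion-comet": "datafusion-comet",
}

# Capture the whole repository path segment (greedy), so "arrow-datafusion"
# never matches just the prefix of "arrow-datafusion-python". Dots are
# excluded so a trailing "." or ".git" is left untouched.
_REPO_URL = re.compile(r"(https://github\.com/apache/)([\w-]+)")


def rewrite_repo_links(text: str) -> str:
    """Replace links to renamed repositories with their new names.

    Links to repositories that were not renamed are returned unchanged.
    """

    def _sub(m: re.Match) -> str:
        repo = m.group(2)
        return m.group(1) + RENAMES.get(repo, repo)

    return _REPO_URL.sub(_sub, text)


if __name__ == "__main__":
    print(rewrite_repo_links("https://github.com/apache/arrow-datafusion/issues/9691"))
```

Using an explicit table rather than a blanket `arrow-` prefix substitution keeps repositories that were not renamed, such as `apache/arrow-site`, intact.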
Update docs
- [x] https://github.com/apache/arrow-datafusion/pull/10130
- [x] Change hosting url from https://arrow.apache.org/datafusion/ to https://datafusion.apache.org (with a redirect from https://arrow.apache.org/datafusion/)
- [x] Change link under subprojects from https://arrow.apache.org/ to point at new docsite (or maybe we should just remove the link altogether)
- [x] https://github.com/apache/datafusion/issues/10131
Infrastructure
- [x] https://github.com/apache/datafusion/issues/10151
- [x] https://github.com/apache/datafusion/issues/10194
- [x] https://github.com/apache/arrow-datafusion/issues/10133
- [x] Create mailing lists ([email protected], [email protected], [email protected], [email protected])
- [x] (email trouble) https://issues.apache.org/jira/browse/INFRA-25727
- [x] (DNS trouble): https://issues.apache.org/jira/browse/INFRA-25731
- [x] Update ASF config https://github.com/apache/datafusion/blob/main/.asf.yaml
- [x] Update committers list: https://github.com/apache/datafusion/issues/10154
Process
- [x] https://github.com/apache/arrow-datafusion/issues/10134
- [x] Update communications section https://arrow.apache.org/datafusion/contributor-guide/communication.html
- [x] Create a publicly accessible list of committers / affiliations (ideally from the data in https://people.apache.org/phonebook.html?unix=datafusion, similarly to https://arrow.apache.org/committers/)
- [x] https://github.com/apache/datafusion/issues/10236
- [ ] https://github.com/apache/datafusion/issues/10281
Announcements
- [ ] https://github.com/apache/arrow-datafusion/issues/10135
We will also need to make some updates to the release scripts and associated documentation (link to reporter URL, mailing list name, and so on)
Note it is critical in my opinion that all existing links continue to work (as in the url https://github.com/apache/arrow-datafusion/issues/9691 will be redirected to the new repo). I believe this is the default behavior in GitHub when a repository is renamed, but we should double check.
Yes, I just verified this behavior is the default. I created a repo at https://github.com/phillipleblanc/datafusion-rename-test, created an issue at https://github.com/phillipleblanc/datafusion-rename-test/issues/1, and renamed the repo to datafusion-rename-success. The above links now redirect to the new repo name.
Thank you for double checking @phillipleblanc
It's official: DataFusion is now its own Top Level Project!
https://projects.apache.org/project.html?datafusion
I plan to file individual tickets for the tasks listed above later this morning
My plan / hope is that we keep as much of DataFusion's day-to-day operations the same as possible as we transition to our own top level project. Once we are running independently, we can discuss any potential changes.
I think mailing lists are the first important thing to set up, so following https://github.com/apache/arrow/blob/main/.asf.yaml, I submitted requests via https://selfserve.apache.org for the following mailing lists
In continuing with the current norms of the DataFusion community, I did not create a [email protected] mailing list, because according to the docs "The vast majority of communication occurs in the open on our github repository in the form of tickets, issues, discussions, and Pull Requests."
Thus I think having another list that people might accidentally send requests to would be confusing.
🎉🎉
I created https://github.com/apache/arrow-datafusion/pull/10130 to make some documentation updates to change the name from Apache Arrow DataFusion to Apache DataFusion
Regarding mailing lists, I think this may not be possible or may be hard to do, but I am curious whether there have been similar cases before. Is it possible to move previous DataFusion-related threads to the new mailing lists?
I filed an INFRA ticket to rename the repos.
https://issues.apache.org/jira/browse/INFRA-25726
Regarding mailing lists, I think this may not be possible or may be hard to do, but I am curious whether there have been similar cases before. Is it possible to move previous DataFusion-related threads to the new mailing lists?
I am not sure -- maybe we could look in the INFRA JIRA tickets or file a request
I created https://github.com/apache/arrow-datafusion/issues/10133 and https://github.com/apache/arrow-datafusion/issues/10134 for logistics
I also filed https://github.com/apache/arrow-datafusion/issues/10135 to make a blog post. I would be interested if anyone has opinions on the venue (should we post to the arrow blog or make the first post on a new datafusion specific blog)?
I got an email acknowledging the creation of the mailing lists, but the domain name does not appear to be working. I filed https://issues.apache.org/jira/browse/INFRA-25727 to track
@comphead pointed out to me that the committers list of the new repo is not updated yet. I will fix that now https://github.com/apache/datafusion/issues/10154
I also filed https://github.com/apache/datafusion/issues/10155, https://github.com/apache/datafusion/issues/10156, and https://github.com/apache/datafusion/issues/10157 to track the board reports (and build some institutional knowledge)
We are still having DNS trouble; filed https://issues.apache.org/jira/browse/INFRA-25731
Ok, the DNS issue has been resolved.
We have the website up (needs some links fixed) https://datafusion.apache.org/
The mailing lists are working as well. For example: https://lists.apache.org/[email protected]
Update .asf.yaml to point to the new mailing list: https://github.com/apache/datafusion/pull/10189
I'm also taking a look at https://github.com/apache/datafusion/issues/10151
https://datafusion.apache.org/ now has the website live 🚀 -- thanks @phillipleblanc
this link currently still works and doesn't redirect -- will it redirect to the new one at some point? https://arrow.apache.org/datafusion/
(I know this is very new and in flight, appreciate the work here!)
Should we rename the Slack and Discord channels?
this link currently still works and doesn't redirect -- will it redirect to the new one at some point? https://arrow.apache.org/datafusion/
@lostmygithubaccount (😆 ) Yes absolutely -- here is a PR to do that https://github.com/apache/arrow-site/pull/502
Should we rename the Slack and Discord channels?
Update: @andygrove has done so
Thanks to @kou we have completed https://github.com/apache/datafusion/issues/10194, and the old doc links now redirect to datafusion.apache.org
I tested a few links, such as https://arrow.apache.org/datafusion/library-user-guide/working-with-exprs.html and https://arrow.apache.org/datafusion/user-guide/cli/index.html
Update: @tisonkun has made a DOAP file 😄 -- #10233
I have created a proposed page with governance information: https://github.com/apache/datafusion/pull/10238
I filed a few more doc tweaks https://github.com/apache/datafusion/pull/10284 and https://github.com/apache/datafusion/pull/10285
I think all that is left for this epic is to write a blog post (https://github.com/apache/datafusion/issues/10135) and we can close it down
Actually, we also owe the ASF board a report each month for the first 3 months. I'll begin coordinating the first one shortly (tracked via https://github.com/apache/datafusion/issues/10281)
I have created a draft blog post on the arrow site for announcing the new top level project: https://github.com/apache/arrow-site/pull/512
DataFusion Top Level Project announcement is live: https://arrow.apache.org/blog/2024/05/07/datafusion-tlp/
Also, we got a suggestion to make an official ASF press release: https://github.com/apache/datafusion/issues/10403