
[EPIC] Tasks for a new Top Level Apache Project

Open alamb opened this issue 1 year ago • 27 comments

Is your feature request related to a problem or challenge?

Update: the new project was approved by the board and we are now in the process of setting it all up

We are on track to have a new top level apache project for DataFusion https://github.com/apache/arrow-datafusion/discussions/6475 (see Proposal Document for more details)

Once this is approved by the ASF board, we will have some logistical things to change. This ticket tracks the work items.

Describe the solution you'd like

Rename github repositories

  • [x] https://issues.apache.org/jira/browse/INFRA-25726

Note it is critical in my opinion that all existing links continue to work (as in the URL https://github.com/apache/arrow-datafusion/issues/9691 will be redirected to the new repo). I believe this is the default behavior in GitHub when a repository is renamed, but we should double check.

  • https://github.com/apache/arrow-datafusion/ → apache/datafusion
  • https://github.com/apache/arrow-datafusion-python → apache/datafusion-python
  • https://github.com/apache/arrow-ballista → apache/datafusion-ballista
  • https://github.com/apache/arrow-ballista-python → apache/datafusion-ballista-python
  • https://github.com/apache/arrow-datafusion-comet → apache/datafusion-comet
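
As a rough illustration of the renames listed above, here is a minimal Python sketch that rewrites an old GitHub URL to its new repository name, for example when updating hard-coded links in docs or scripts. The `RENAMES` table is copied from the list; the helper name `renamed_url` is hypothetical and not part of any DataFusion tooling.

```python
from urllib.parse import urlsplit, urlunsplit

# Old repository name -> new repository name, copied from the list above.
RENAMES = {
    "arrow-datafusion": "datafusion",
    "arrow-datafusion-python": "datafusion-python",
    "arrow-ballista": "datafusion-ballista",
    "arrow-ballista-python": "datafusion-ballista-python",
    "arrow-datafusion-comet": "datafusion-comet",
}


def renamed_url(url: str) -> str:
    """Return `url` with an old apache/<repo> name replaced by its new name.

    Hypothetical helper: URLs that do not point at one of the renamed
    repositories under github.com/apache are returned unchanged.
    """
    parts = urlsplit(url)
    # "/apache/arrow-datafusion/issues/9691" splits into
    # ["", "apache", "arrow-datafusion", "issues", "9691"].
    segments = parts.path.split("/")
    if parts.netloc == "github.com" and len(segments) >= 3 and segments[1] == "apache":
        segments[2] = RENAMES.get(segments[2], segments[2])
    return urlunsplit(parts._replace(path="/".join(segments)))
```

Matching on the exact repository segment (rather than a plain string replace) keeps `arrow-datafusion-python` from being partially rewritten by the shorter `arrow-datafusion` entry.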

Update docs

  • [x] https://github.com/apache/arrow-datafusion/pull/10130
  • [x] Change hosting url from https://arrow.apache.org/datafusion/ to https://datafusion.apache.org (with a redirect from https://arrow.apache.org/datafusion/)
  • [x] Change link under subprojects from https://arrow.apache.org/ to point at new docsite (or maybe we should just remove the link altogether)
  • [x] https://github.com/apache/datafusion/issues/10131

Infrastructure

  • [x] https://github.com/apache/datafusion/issues/10151
  • [x] https://github.com/apache/datafusion/issues/10194
  • [x] https://github.com/apache/arrow-datafusion/issues/10133
  • [x] Create mailing lists ([email protected], [email protected], [email protected], [email protected])
  • [x] (email trouble) https://issues.apache.org/jira/browse/INFRA-25727
  • [x] (DNS trouble): https://issues.apache.org/jira/browse/INFRA-25731
  • [x] Update ASF config https://github.com/apache/datafusion/blob/main/.asf.yaml
  • [x] Update committers list: https://github.com/apache/datafusion/issues/10154

Process

  • [x] https://github.com/apache/arrow-datafusion/issues/10134
  • [x] Update communications section https://arrow.apache.org/datafusion/contributor-guide/communication.html
  • [x] Create a publicly accessible list of committers / affiliations (ideally from the data in https://people.apache.org/phonebook.html?unix=datafusion, similarly to https://arrow.apache.org/committers/)
  • [x] https://github.com/apache/datafusion/issues/10236
  • [ ] https://github.com/apache/datafusion/issues/10281

Announcements

  • [ ] https://github.com/apache/arrow-datafusion/issues/10135

alamb avatar Mar 19 '24 11:03 alamb

We will also need to make some updates to the release scripts and associated documentation (link to reporter URL, mailing list name, and so on)

andygrove avatar Mar 19 '24 13:03 andygrove

Note it is critical in my opinion that all existing links continue to work (as in the URL https://github.com/apache/arrow-datafusion/issues/9691 will be redirected to the new repo). I believe this is the default behavior in GitHub when a repository is renamed, but we should double check.

Yes, I just verified this behavior is the default. I created a repo at https://github.com/phillipleblanc/datafusion-rename-test - created an issue: https://github.com/phillipleblanc/datafusion-rename-test/issues/1 and renamed the repo to datafusion-rename-success - and now the above links redirect to the new repo name.

phillipleblanc avatar Apr 04 '24 05:04 phillipleblanc

Thank you for double checking @phillipleblanc

alamb avatar Apr 04 '24 15:04 alamb

It's official -- DataFusion is now its own Top Level Project!

https://projects.apache.org/project.html?datafusion

I plan to file individual tickets for the tasks listed above later this morning

alamb avatar Apr 18 '24 11:04 alamb

My plan / hope is that we keep the day-to-day operations of DataFusion as unchanged as possible while we transition to our own top level project. Once we are running independently, we can discuss any potential changes.

I think mailing lists are the first important thing to set up, so following https://github.com/apache/arrow/blob/main/.asf.yaml, I submitted requests via https://selfserve.apache.org for the following emails

In continuing with the current norms of the DataFusion community, I did not create a [email protected] mailing list, because according to the docs "The vast majority of communication occurs in the open on our github repository in the form of tickets, issues, discussions, and Pull Requests."

Thus I think having another list that people might accidentally send a request to would be confusing.

alamb avatar Apr 18 '24 12:04 alamb

🎉🎉

metesynnada avatar Apr 18 '24 14:04 metesynnada

I created https://github.com/apache/arrow-datafusion/pull/10130 to make some documentation updates to change the name from Apache Arrow DataFusion to Apache DataFusion

andygrove avatar Apr 18 '24 15:04 andygrove

Regarding mailing lists: this may be impossible or hard to do, but I am curious whether there have been similar cases before. Is it possible to move previous threads related to DataFusion to the new mailing lists?

viirya avatar Apr 18 '24 16:04 viirya

I filed an INFRA ticket to rename the repos.

https://issues.apache.org/jira/browse/INFRA-25726

andygrove avatar Apr 18 '24 19:04 andygrove

Regarding mailing lists: this may be impossible or hard to do, but I am curious whether there have been similar cases before. Is it possible to move previous threads related to DataFusion to the new mailing lists?

I am not sure -- maybe we could look in the INFRA JIRA tickets or file a request

alamb avatar Apr 18 '24 20:04 alamb

I created https://github.com/apache/arrow-datafusion/issues/10133 and https://github.com/apache/arrow-datafusion/issues/10134 for logistics

I also filed https://github.com/apache/arrow-datafusion/issues/10135 to make a blog post -- I would be interested to hear if anyone has opinions on the venue (should we post to the Arrow blog or make the first post on a new DataFusion-specific blog?)

alamb avatar Apr 18 '24 20:04 alamb

I got an email acknowledging the creation of the email lists, but the domain name does not appear to be working. I filed https://issues.apache.org/jira/browse/INFRA-25727 to track

alamb avatar Apr 19 '24 10:04 alamb

@comphead pointed out to me that the committers list of the new repo is not updated yet. I will fix that now https://github.com/apache/datafusion/issues/10154

alamb avatar Apr 20 '24 19:04 alamb

I also filed https://github.com/apache/datafusion/issues/10155, https://github.com/apache/datafusion/issues/10156, and https://github.com/apache/datafusion/issues/10157 to track the board reports (and build some institutional knowledge)

alamb avatar Apr 20 '24 20:04 alamb

We are still having DNS trouble -- filed https://issues.apache.org/jira/browse/INFRA-25731

alamb avatar Apr 21 '24 10:04 alamb

Ok, the DNS issue has been resolved.

We have the website up (needs some links fixed) https://datafusion.apache.org/

The mailing lists are working as well. For example: https://lists.apache.org/[email protected]

alamb avatar Apr 22 '24 16:04 alamb

Update .asf.yaml to point to the new mailing list: https://github.com/apache/datafusion/pull/10189

I'm also taking a look at https://github.com/apache/datafusion/issues/10151

phillipleblanc avatar Apr 23 '24 02:04 phillipleblanc

https://datafusion.apache.org/ now has the website live 🚀 -- thanks @phillipleblanc

alamb avatar Apr 23 '24 13:04 alamb

this link currently still works and doesn't redirect -- will it redirect to the new one at some point? https://arrow.apache.org/datafusion/

(I know this is very new and in flight, appreciate the work here!)

lostmygithubaccount avatar Apr 23 '24 13:04 lostmygithubaccount

We should rename slack and discord channels?

comphead avatar Apr 23 '24 15:04 comphead

this link currently still works and doesn't redirect -- will it redirect to the new one at some point? https://arrow.apache.org/datafusion/

@lostmygithubaccount (😆 ) Yes absolutely -- here is a PR to do that https://github.com/apache/arrow-site/pull/502

alamb avatar Apr 23 '24 15:04 alamb

We should rename slack and discord channels?

Update here is that @andygrove did so

alamb avatar Apr 23 '24 21:04 alamb

Thanks to @kou we have completed https://github.com/apache/datafusion/issues/10194 and the old doc links now redirect to datafusion.apache.org

I tested a few links like https://arrow.apache.org/datafusion/library-user-guide/working-with-exprs.html and https://arrow.apache.org/datafusion/user-guide/cli/index.html
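
For anyone who wants to repeat this kind of spot check, here is a minimal Python sketch. It rests on two assumptions that are not confirmed in this thread: that each old page redirects to the same path under https://datafusion.apache.org/, and that the redirect is an HTTP-level one that `urllib` follows (an HTML meta refresh would not be detected). The function names are hypothetical.

```python
import urllib.request

OLD_PREFIX = "https://arrow.apache.org/datafusion/"
NEW_PREFIX = "https://datafusion.apache.org/"


def expected_new_url(old_url: str) -> str:
    """Map an old docs URL to its assumed new location (same path, new host)."""
    if not old_url.startswith(OLD_PREFIX):
        raise ValueError(f"not an old DataFusion docs URL: {old_url}")
    return NEW_PREFIX + old_url[len(OLD_PREFIX):]


def redirects_as_expected(old_url: str, timeout: float = 10.0) -> bool:
    """Fetch `old_url` (needs network access) and compare the final URL.

    urlopen follows HTTP redirects, so geturl() reports where we ended up.
    """
    with urllib.request.urlopen(old_url, timeout=timeout) as resp:
        return resp.geturl() == expected_new_url(old_url)
```

Calling `redirects_as_expected` on each of the links above would report whether it lands on the assumed new address.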

alamb avatar Apr 25 '24 00:04 alamb

Update: @tisonkun has made a DOAP file 😄 -- #10233

I have created a proposed page with governance information: https://github.com/apache/datafusion/pull/10238

alamb avatar Apr 25 '24 22:04 alamb

I filed a few more doc tweaks https://github.com/apache/datafusion/pull/10284 and https://github.com/apache/datafusion/pull/10285

I think all that is left for this epic is to write a blog post (https://github.com/apache/datafusion/issues/10135) and we can close it down

alamb avatar Apr 29 '24 13:04 alamb

Actually, we also owe the ASF board a report each month for the first 3 months. I'll begin coordinating the first one shortly (tracked via https://github.com/apache/datafusion/issues/10281)

alamb avatar Apr 29 '24 13:04 alamb

I have created a draft blog post on the arrow site for announcing the new top level project: https://github.com/apache/arrow-site/pull/512

alamb avatar May 01 '24 10:05 alamb

DataFusion Top Level Project announcement is live: https://arrow.apache.org/blog/2024/05/07/datafusion-tlp/

alamb avatar May 07 '24 10:05 alamb

Also, we got a suggestion to make an official ASF press release: https://github.com/apache/datafusion/issues/10403

alamb avatar May 07 '24 10:05 alamb