May 2024 ASF Board Report
Is your feature request related to a problem or challenge?
Per https://www.apache.org/foundation/board/reporting, a new project should submit monthly reports to the ASF board for its first three months.
Subsequently, per https://whimsy.apache.org/roster/committee/datafusion the DataFusion ASF board report schedule is
March, June, September, December
Describe the solution you'd like
I would like to draft a board report for the ASF board meeting, ideally with community help.
The meetings are typically in the second or third week of the month.
Describe alternatives you've considered
I plan to do this in the same style that worked well in Arrow (see an example from @andygrove here https://lists.apache.org/thread/7w4mgy98qomc6drvj2fo81gvhq6p0boc): make a Google Doc (or issue) that people can add relevant content to, which the chair (me for the time being) then submits to the board.
Additional context
No response
I would like to help on this.
Here is a draft board report that it would be great if people could help fill in: https://docs.google.com/document/d/1knyR2epIOY7WoXZO_DOtlcPNSenb3-V-osCHqPXqSms/edit
I took a stab at filling out what I could. I think some of the numbers related to the repo stats will need to be re-run closer to the board meeting. Here are the links and git commands I used to get the numbers. I chose 2024-04-16 because it's the date when DataFusion became a top-level project.
101 commits since 2024-04-16
git log --since="2024-04-16" --pretty=format:"%h" | wc -l
39 code contributors since 2024-04-16
git shortlog -sn --since="2024-04-16" | wc -l
126 PRs opened on GitHub since 2024-04-16 https://github.com/apache/datafusion/pulls?q=is%3Apr+created%3A%3E%3D2024-04-16
140 PRs closed on GitHub since 2024-04-16 https://github.com/apache/datafusion/pulls?q=is%3Apr+closed%3A%3E%3D2024-04-16
104 issues opened on GitHub since 2024-04-16 https://github.com/apache/datafusion/issues?q=is%3Aissue+created%3A%3E%3D2024-04-16
76 issues closed on GitHub since 2024-04-16 https://github.com/apache/datafusion/issues?q=is%3Aissue+closed%3A%3E%3D2024-04-16
Starting from the June board meeting, we can add the percentage increases (although it will be a little unfair since this is a partial month)
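The commands and links above can be wrapped in a small script so the numbers are easy to re-run closer to the board meeting. This is a minimal sketch, not the script used for the report: the function names (`count_commits`, `count_contributors`, `search_url`, `print_report`) are made up for illustration. It uses `git rev-list --count` rather than piping `git log --pretty=format:"%h"` into `wc -l`, because `format:` does not end the last line with a newline, so the `wc -l` total can come out one lower than the actual number of commits.

```shell
#!/bin/sh
# Sketch: helpers for gathering board-report activity numbers.
# All function names here are hypothetical, chosen for illustration only.

# Count commits reachable from HEAD since the given date (YYYY-MM-DD).
count_commits() {
    git rev-list --count --since="$1" HEAD
}

# Count distinct commit authors since the given date.
# HEAD is given explicitly because `git shortlog` reads from standard
# input when no revision is passed and stdin is not a terminal.
count_contributors() {
    git shortlog -sn --since="$1" HEAD | wc -l | tr -d ' '
}

# Build a GitHub search URL in the same form as the links above.
#   $1 = owner/repo        (e.g. apache/datafusion)
#   $2 = "pulls" or "issues"
#   $3 = "pr" or "issue"
#   $4 = "created" or "closed"
#   $5 = date (YYYY-MM-DD)
search_url() {
    printf 'https://github.com/%s/%s?q=is%%3A%s+%s%%3A%%3E%%3D%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

# Print the git-derived counts plus the four search links.
#   $1 = date, $2 = owner/repo
print_report() {
    echo "$(count_commits "$1") commits since $1"
    echo "$(count_contributors "$1") code contributors since $1"
    echo "PRs opened:    $(search_url "$2" pulls pr created "$1")"
    echo "PRs closed:    $(search_url "$2" pulls pr closed "$1")"
    echo "Issues opened: $(search_url "$2" issues issue created "$1")"
    echo "Issues closed: $(search_url "$2" issues issue closed "$1")"
}
```

From inside a checkout of the repository, `print_report 2024-04-16 apache/datafusion` prints the two git counts and the four links. The PR and issue totals still have to be read off the GitHub pages, since the sketch only builds the URLs.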
Thank you @phillipleblanc -- this is great 🙏
I think this needs to be submitted by May 8 (not May 15), so I'll do so this week.
Apparently it needed to be done today
Here is the text of the report that was submitted:
## Description:
The mission of Apache DataFusion is the creation and maintenance of software
related to an extensible query engine
## Project Status:
Current project status: New + Ongoing (high activity)
Issues for the board: None
## Membership Data:
Apache DataFusion was founded 2024-04-16 (20 days ago)
There are currently 29 committers and 9 PMC members in this project.
The Committer-to-PMC ratio is roughly 8:3.
Community changes, past quarter:
- No new PMC members (project graduated recently).
- Mustafa Akur was added as committer on 2024-04-20
- Brent Gardner was added as committer on 2024-04-20
- Oleks V. was added as committer on 2024-04-20
- Jay Zhan was added as committer on 2024-04-20
- Jeffrey Vo was added as committer on 2024-04-20
- Liu Jiayu was added as committer on 2024-04-20
- Metehan Yildirim was added as committer on 2024-04-20
- Wang Mingming was added as committer on 2024-04-20
- Marco Neumann was added as committer on 2024-04-20
- Zhong Yanghong was added as committer on 2024-04-20
- Mehmet Ozan Kabak was added as committer on 2024-04-20
- Paddy Horan was added as committer on 2024-04-20
- Rémi Dettai was added as committer on 2024-04-20
- Sun Chao was added as committer on 2024-04-20
- Daniel Harris was added as committer on 2024-04-20
- Raphael Taylor-Davies was added as committer on 2024-04-20
- Ruihang Xia was added as committer on 2024-04-20
- Xudong Wang was added as committer on 2024-04-20
- Yang Jiang was added as committer on 2024-04-20
- Yijie Shen was added as committer on 2024-04-20
## Project Activity:
The project is quite active with many PRs and issues opened and closed per
day. We have spent significant time on tasks related to becoming a new top
level project.
DataFusion became its own top level project after operating as a subproject of
Apache Arrow for several years.
We have been focused on the [tasks] required to operate as our own project,
largely logistical such as updating documentation, creating mailing lists, and
a [DOAP file].
[tasks]: https://github.com/apache/datafusion/issues/9691
[DOAP file]: https://projects.apache.org/project.html?datafusion
### DataFusion core
https://github.com/apache/datafusion
In addition to the work related to moving to a top-level project, the
community is focused on making logical planning faster, making function
packages (i.e. UDFs) modular and easier to mix/match, and “de-parsing” logical
plan expressions back to SQL.
We are preparing the first release as a new project, version 38.0.0.
For the DataFusion repo since 2024-04-16, as of 2024-05-07:
- 132 commits[1]
- 46 code contributors[2]
- 168 PRs opened on GitHub[3]
- 187 PRs closed on GitHub[4]
- 130 issues opened on GitHub[5]
- 94 issues closed on GitHub[6]
[1]: git log --since="2024-04-16" --pretty=format:"%h" | wc -l
[2]: git shortlog -sn --since="2024-04-16" | wc -l
[3]: https://s.apache.org/x5gkj
[4]: https://s.apache.org/rg9op
[5]: https://s.apache.org/sqlun
[6]: https://s.apache.org/l3clf
### Sub project: DataFusion Python
https://github.com/apache/datafusion-python
The DataFusion Python subproject is not currently actively maintained and
there has been no release yet to upgrade to DataFusion version 37 or to
prepare for the upcoming DataFusion 38 release.
### Sub project: DataFusion Comet
https://github.com/apache/datafusion-comet
The Comet subproject is very active and is receiving significant contributions
from new contributors. There is some initial documentation published at
https://datafusion.apache.org/comet/.
### Sub project: DataFusion Ballista
https://github.com/apache/datafusion-ballista
https://github.com/apache/datafusion-ballista-python
The Ballista subproject is not currently actively maintained.
### Recent Releases
* 37.1.0 was released on 2024-04-22
* 37.0.0 was released on 2024-04-05
## Community Health:
Overall, the community seems excited by becoming a new top level
project and contributions continue to arrive and activity on the
project continues. We have not made any significant change
in day to day operations, and don’t have any plans to do so
at the moment.
The PMC lists are now set up and we are actively discussing
growing committers and the PMC. We expect both of these groups
to grow in the near future.
In the last 6 months or so, it has been hard to discuss potential
committers within the Arrow PMC as many contributors focused
almost exclusively on DataFusion and did not also have substantial
contributions to Arrow (which was more common earlier in the
project's life).
We have also created a [Governance Page] to maintain project
transparency, largely based on the content from the Arrow project.
[Governance Page]: https://s.apache.org/98bwp