Oct 16, 2024: This week in DataFusion
Introduction
The goal of this ticket is a weekly summary of interesting things that happened in DataFusion over the last week. Note that this is not a complete list. Please feel free to comment on this ticket about things that I may have missed or that you think should get wider attention from the community.
Loosely inspired by https://this-week-in-rust.org/
Andrew's TLDR:
We are preparing for the 43.0.0 release and I am personally pretty excited about:
- https://github.com/apache/datafusion/issues/12821
- https://github.com/apache/datafusion/issues/8709
- https://github.com/apache/datafusion/issues/12740
Upcoming Releases
- https://github.com/apache/datafusion/issues/12813 (thanks @Xuanwo and @matthewmturner)
- https://github.com/apache/datafusion/issues/12470 (thanks @andygrove)
Project Happenings
- Integrate sqlparser into DataFusion governance: https://github.com/apache/datafusion-sqlparser-rs/issues/1294#issuecomment-2377918831
Highlights from last week(s):
(I am sorry if I missed you -- please add a note to this ticket with anything you would like to add)
- @dmitrybugakov started managing extension functions in https://github.com/datafusion-contrib/datafusion-functions-extra
- @eejbyfeldt is doing some great work on grouping sets such as https://github.com/apache/datafusion/pull/12704
- @tokoko, @Blizzara, @vbarua and @westonpace continue to mature the Substrait support, such as https://github.com/apache/datafusion/pull/12800
- Along with @devanbenz, @Rachelint and @jayzhan211, I implemented https://github.com/apache/datafusion/pull/12792 to help ClickBench queries
- @timsaucer made a beautiful macro https://github.com/apache/datafusion/pull/12846
- @Rachelint made a beautiful aggregation fuzzing project
- @jonahgao continues to make our SQL handling more beautiful and correct (https://github.com/apache/datafusion/pull/12808, https://github.com/apache/datafusion/pull/12844, etc.)
Performance
- https://github.com/apache/datafusion/issues/12821 (thanks to the epic work of @Rachelint, @goldmedal, @jayzhan211, @Dandandan @XiangpengHao and others, we are quite close)
- https://github.com/apache/datafusion/issues/12680 (kudos to @jayzhan211 and @Rachelint)
- @simonvandel and @tlmn https://github.com/apache/datafusion/pull/12890
Quality
- https://github.com/apache/datafusion/issues/12114 (already found several bugs -- thanks @Rachelint)
Extensibility
- Very close to finishing https://github.com/apache/datafusion/issues/8709 (thanks @jcsherin, @jatin510 and @hailelagi)
- @Omega359 started https://github.com/apache/datafusion/issues/12740 and we are making great progress thanks to @jonathanc-n, @juroberttyb and others
- @notfilippo and @findepi are working to better separate logical and physical types https://github.com/apache/datafusion/issues/12622
Features
Interesting discussions underway:
- https://github.com/apache/datafusion/issues/11442
- https://github.com/apache/datafusion/issues/12357
Community
- Weekly Call
- Slack/Discord: info links
Upcoming meetups:
- Oct 14 Seattle: https://lu.ma/tnwl866b @phillipleblanc @likekim
- Dec 18 Chicago: https://lu.ma/eq5myc5i @adriangb @timsaucer
Background:
I got some great feedback from @timsaucer, @findepi and @andygrove on the DataFusion weekly call that having a weekly summary like https://github.com/apache/datafusion/issues/12494 was helpful. I will therefore try to write one up each week.
@alamb I really like this, keeping one up each week would be great. Gives everybody a good direction to go in for the overall project. Thanks for writing this!
A discussion about a meetup in Amsterdam:
- https://github.com/apache/datafusion/discussions/12988
Something else I hope to highlight next week is how reviewing PRs helps you understand the code, helps the community, and drives the project forward.
Next week's issue: https://github.com/apache/datafusion/issues/13035