datafusion Blog post with DataFusion July

Is your feature request related to a problem or challenge?

We have had good luck writing up quarterly updates for DataFusion, most recently: https://datafusion.apache.org/blog/2024/07/24/datafusion-40.0.0/

See https://github.com/apache/datafusion/issues/9602

Describe the solution you'd like

Blog post

Describe alternatives you've considered

No response

Additional context

No response

Jul 24 '24 11:07 alamb

Here is my wishlist for things to write about in the next blog:

Charts showing the speedup from https://github.com/apache/datafusion/issues/10918 (from @XiangpengHao @Weijun-H @PsiACE and others)
Charts showing aggregation improvements (e.g. https://github.com/apache/datafusion/pull/11627 etc. with @jayzhan211 and @korowa)
Something about Substrait (thanks to @Blizzara @dharanad and others) -- is there any big milestone we can claim?
Maybe something about better MAP type support that @goldmedal and others have been working on

Also, of course, I would love to have more help writing a blog (maybe someone else could draft it 🤔 🎣 )

Jul 24 '24 11:07 alamb

@alamb Thank you for considering me, but I think there may be some confusion - I wasn't involved in the work on Substrait. However, I'd be happy to contribute to a blog post on MAP once I've completed adding support for Arrays in #11436

Jul 24 '24 11:07 dharanad

@alamb Thank you for considering me, but I think there may be some confusion

Yes I was probably confused -- sorry about that

Jul 24 '24 21:07 alamb

@alamb for Substrait - maybe the work @Lordworms has been doing to add the TPC-H tests would be good at least? From my side, I don't know if there's any precise milestone as such - but maybe something around supporting VirtualTables, more literals and types, and better interoperability with other Substrait producers. (I do hope to write a separate blog post from our perspective if/when I've proven the whole setup I'm working on works and is faster, but we're not there yet unfortunately.)

Jul 30 '24 18:07 Blizzara

Blog with https://github.com/apache/datafusion/pull/11627: performance of high cardinality aggs / partial skipping

Aug 05 '24 11:08 alamb

It would also be cool to discuss efforts for chunked emission (https://github.com/apache/datafusion/pull/11943) for (more) aggregate performance

Aug 22 '24 13:08 alamb

My plan for this is to finish enabling string view and then make that performance improvement the headline of this post

Oct 15 '24 15:10 alamb

I think @Omega359 is going to handle this one in

https://github.com/apache/datafusion-site/pull/57

Feb 22 '25 11:02 alamb

Blog post with DataFusion July - Sep 2024
