Implement native version of ColumnarToRow
What is the problem the feature request solves?
Spark's ColumnarToRowExec can be very slow in some cases. This plan shows that C2R took 8 minutes even though the underlying scan only took 20 seconds. This example is selecting decimal columns, so this is related to https://github.com/apache/datafusion-comet/issues/670 and https://github.com/apache/datafusion-comet/issues/679, but implementing a native version of ColumnarToRowExec may help with this.
Describe the potential solution
No response
Additional context
No response
I created a Google document to discuss the design.
https://docs.google.com/document/d/1zNuavf_WT3IcpeTVAEC8IjMGloi1MeAeQSg4F0eEivs/edit?usp=sharing
I created https://github.com/apache/datafusion-comet/issues/837 for the very first step in this process
Here is a like for like comparison between Spark and Comet for Scan+C2R for the optimized version of q72.
Spark
Spark C2R takes 1.1 minutes
Comet
Comet C2R takes 1.3 minutes
There is some spill happening, even though I am allocating 20g memory overhead:
@parthchandra I know that you are working on this so I tried assigning the issue to you, but your name does not show up for me for some reason. I wondered if pinging you here would change that.
edit: No, that did not help
Posting a reply in case it helps associate the issue somehow. Anyhow, confirming that I am indeed working on this. In that context, I am initially planning to only add support for fixed width primitive types. Then add decimals and varchar. Estimating the sizes of complex types is somewhat complicated, so I will defer that until we see actual gains from this effort.
I am closing this issue for now because I believe that we determined that this is no longer a priority. We can reopen the issue if this changes.