
Support more built-in functions provided by Spark and missing in DataFusion?

Open psvri opened this issue 1 year ago • 4 comments

Is your feature request related to a problem or challenge?

Comparing the functions offered by Spark and DataFusion, I see that log1p and sha1 are not supported. There may be many more such functions that are present in Spark but not in DataFusion.

Describe the solution you'd like

If possible, I would like them to be offered by DataFusion natively.

Describe alternatives you've considered

Of course, they can be implemented as UDFs if not supported out of the box.

Additional context

After seeing https://github.com/apache/arrow-datafusion-comet, I wanted to compare which functions we are missing from Spark, and found log1p and sha1 for the moment.

I wanted to use this issue as a whole to ask if it's okay to implement functions that are present in Spark but not in DataFusion.

I am willing to implement the missing functions.

psvri avatar Feb 16 '24 17:02 psvri

Hmm, for Spark-specific functions (i.e., ones that other engines mostly don't have), I'd rather have them in Comet instead of adding them to DataFusion. Otherwise, we will soon have many more functions, as people will keep adding functions from other engines (PostgreSQL, etc.).

For example, log1p(expr), which returns log(1 + expr), is not in PostgreSQL as far as I could find. Also, it seems it can simply be replaced with log.
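For reference, the replacement mentioned above can be sketched in plain Rust. This uses only the standard library, not DataFusion's or Spark's API, and the two function names are made up for illustration. The two forms are mathematically the same, though the naive one can lose precision when expr is very close to zero, which may be worth keeping in mind when deciding whether a dedicated function is needed.

```rust
/// Hypothetical helper: the naive replacement, computed as ln(1 + x).
/// For very small |x|, `1.0 + x` rounds to 1.0 and the result collapses to 0.
fn log1p_naive(x: f64) -> f64 {
    (1.0 + x).ln()
}

/// Hypothetical helper: the dedicated form, using the standard library's
/// `f64::ln_1p`, which stays accurate for x near zero.
fn log1p_accurate(x: f64) -> f64 {
    x.ln_1p()
}
```

For ordinary inputs such as `1.0`, both helpers agree to within floating-point rounding; they diverge only for inputs like `1e-20`.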

viirya avatar Feb 16 '24 18:02 viirya

Yes, I know it can be easily replaced. But I created this issue to get a consensus on what should be supported.

Going over the differences some more, here is a list of functions common to Postgres and Spark that are not present in DataFusion. Should we add them here or keep DataFusion minimal?

psvri avatar Feb 16 '24 18:02 psvri

My opinion is to have the necessary functions that are implemented by most engines. For engine-specific functions, maybe we can have a feature to guard them (or even a separate crate), if we are going to have them in DataFusion.

It would be nice to hear other opinions.
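A minimal sketch of what a feature guard could look like, assuming a Cargo feature named `spark-functions`. The feature name, the helper function, and the list of names are all hypothetical and not taken from DataFusion's actual code.

```rust
/// Hypothetical helper: names of the scalar functions this crate would register.
fn builtin_function_names() -> Vec<&'static str> {
    // Functions that most engines implement are always available.
    // `unused_mut` is allowed because `names` is only mutated when the
    // feature below is enabled.
    #[allow(unused_mut)]
    let mut names = vec!["log", "sqrt", "md5"];

    // Engine-specific additions are compiled in only when the
    // (hypothetical) `spark-functions` Cargo feature is enabled.
    #[cfg(feature = "spark-functions")]
    {
        names.push("log1p");
        names.push("sha1");
    }

    names
}
```

With a layout like this, users who want only the common functions would build without the feature, and the Spark-specific names would not be compiled in at all.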

viirya avatar Feb 16 '24 18:02 viirya

Sure, let's wait. Thanks for the info.

psvri avatar Feb 16 '24 18:02 psvri

An update here: one of the reasons for doing #9285 (pulling functions out of the core) is to make it easier for users of this crate to choose which function semantics they want. We are getting close.

alamb avatar Mar 24 '24 18:03 alamb