feat: support pyarrow UDFs for pyspark backend
Is your feature request related to a problem?
PySpark now supports Arrow-optimized Python UDFs, which keep row-at-a-time semantics but use Arrow for efficient data exchange, e.g.:

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf

spark = SparkSession.builder.getOrCreate()

@udf(returnType="int", useArrow=True)
def add_one(x: int) -> int:
    return x + 1

# Create a column using the Arrow-optimized UDF
df = pd.DataFrame({"a": [1, 2, 3]})
dfs = spark.createDataFrame(df)
dfs.withColumn("b", add_one("a")).show()
```
However, the equivalent expression written with ibis raises a `NotImplementedError`, because only pandas-based vectorized UDFs are currently supported for the PySpark backend:

```python
import ibis
from ibis import _

@ibis.udf.scalar.pyarrow
def add_one_pyarrow(x: int) -> int:
    return x + 1

@ibis.udf.scalar.pandas
def add_one_pandas(x: int) -> int:
    return x + 1

con = ibis.pyspark.connect(spark)
con.create_table("df", df, format="delta", overwrite=True)
table = con.table("df")
table.mutate(b=add_one_pandas(_.a)).execute()   # works
table.mutate(b=add_one_pyarrow(_.a)).execute()  # raises NotImplementedError
```
What is the motivation behind your request?
Pandas-based UDFs are not supported by the DuckDB backend, but Arrow-based ones are. For my use case, I would like to keep as much parity as possible between the two backends, so being able to use Arrow-based UDFs on a PySpark table would be very useful.
Describe the solution you'd like
I'd like a solution that allows an Arrow-based UDF to be used on a PySpark table.
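For illustration only, below is a minimal sketch of one conceivable way the pyarrow UDF above could be executed on Spark: wrap the pyarrow-based function in a Spark `pandas_udf` and convert between pandas Series and pyarrow arrays at the boundary. The helper name `wrap_pyarrow_udf` and the `"int"` return type are assumptions of mine, not existing ibis or PySpark APIs, and this is not a proposal for how the backend should actually implement it.

```python
# Illustrative sketch only: execute a pyarrow-array-based scalar function on
# Spark by wrapping it in a pandas_udf and converting pandas <-> pyarrow at
# the boundary. `wrap_pyarrow_udf` is a hypothetical helper, not a real API.
import pandas as pd
import pyarrow as pa
import pyarrow.compute as pc
from pyspark.sql.functions import pandas_udf

def wrap_pyarrow_udf(func, spark_return_type: str):
    @pandas_udf(spark_return_type)
    def wrapper(s: pd.Series) -> pd.Series:
        # Hand the batch to the user's function as a pyarrow Array...
        result = func(pa.Array.from_pandas(s))
        # ...and convert the pyarrow result back to a pandas Series for Spark.
        return pd.Series(result.to_pandas())
    return wrapper

# Hypothetical usage (the lambda plays the role of add_one_pyarrow):
# dfs.withColumn("b", wrap_pyarrow_udf(lambda a: pc.add(a, 1), "int")("a")).show()
```

Whether the backend would go through `pandas_udf`, Spark's `useArrow=True` UDFs, or something else entirely is of course an implementation detail for the maintainers; the request is only that `@ibis.udf.scalar.pyarrow` functions work on PySpark tables.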
What version of ibis are you running?
9.0.0.dev686
What backend(s) are you using, if any?
PySpark
Code of Conduct
- [X] I agree to follow this project's Code of Conduct