pyspark-style-guide

Conventions for PySpark dataframe typehints

Open: harrietrs opened this issue 2 years ago (1 comment)

When defining a function, it would be useful to follow a convention for PySpark DataFrame type hints, e.g.:

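from pyspark.sql import DataFrame
import pyspark.pandas as ps

def my_function(my_dataframe: DataFrame) -> ps.DataFrame:
    # Wrap the Spark SQL DataFrame as a pandas-on-Spark DataFrame (Spark 3.2+).
    # Note: toPandas() would instead collect a plain pandas.DataFrame to the driver.
    return my_dataframe.pandas_api()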

However, the annotations above don't clearly distinguish between the two types, since both read as a DataFrame. Perhaps an alias for pyspark.sql.DataFrame is needed, although I'm not sure how to make it distinct from ps.DataFrame (ps being the established alias for pyspark.pandas).
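One possible convention, sketched here as an assumption rather than an agreed style: alias pyspark.sql.DataFrame on import (the name `SparkDataFrame` below is hypothetical), and keep the established module prefixes `ps.DataFrame` (pandas-on-Spark) and `pd.DataFrame` (plain pandas), so every annotation names its type unambiguously. The imports sit under `typing.TYPE_CHECKING` with quoted annotations, so only a type checker needs them; the function names are illustrative too.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers (e.g. mypy, pyright), so this
    # sketch can be imported without Spark or pandas installed.
    import pandas as pd
    import pyspark.pandas as ps
    from pyspark.sql import DataFrame as SparkDataFrame  # hypothetical alias


def to_pandas_on_spark(sdf: "SparkDataFrame") -> "ps.DataFrame":
    """Hypothetical helper: wrap a Spark SQL DataFrame as pandas-on-Spark (Spark 3.2+)."""
    return sdf.pandas_api()


def to_local_pandas(sdf: "SparkDataFrame") -> "pd.DataFrame":
    """Hypothetical helper: collect a Spark SQL DataFrame to the driver as plain pandas."""
    return sdf.toPandas()
```

With this convention a reader can tell from the signature alone whether a function returns a distributed pandas-on-Spark frame (`ps.DataFrame`) or a local one (`pd.DataFrame`).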

harrietrs, Oct 25 '22