pyspark-style-guide
Conventions for PySpark dataframe typehints
When defining a function, it would be useful to follow a convention for PySpark DataFrame type hints, e.g.
from pyspark.sql import DataFrame
import pyspark.pandas as ps
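def my_function(my_dataframe: DataFrame) -> ps.DataFrame:
    # pandas_api() (PySpark 3.2+) returns a pandas-on-Spark DataFrame;
    # toPandas() would instead return a plain pandas.DataFrame.
    return my_dataframe.pandas_api()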
However, the above doesn't clearly distinguish between the two data types, since both classes are named DataFrame and only the import path tells them apart. Perhaps an alias for pyspark.sql.DataFrame is needed, although I'm not sure how to make it clearly different from ps.DataFrame (an established alias).
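One possible convention, offered only as a sketch: import the Spark class under an explicit alias and keep the established `ps` and `pd` module aliases for the other two types. The alias `SparkDataFrame` and the function names below are hypothetical, not an agreed standard. The imports sit under `TYPE_CHECKING` and the annotations are quoted so that they are only resolved by a type checker; ordinary top-level imports would work just as well.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Needed only by the type checker; plain top-level imports also work.
    import pandas as pd
    import pyspark.pandas as ps
    from pyspark.sql import DataFrame as SparkDataFrame  # hypothetical alias name


def to_pandas_on_spark(sdf: "SparkDataFrame") -> "ps.DataFrame":
    """Spark DataFrame -> pandas-on-Spark DataFrame (data stays distributed)."""
    return sdf.pandas_api()  # available in PySpark 3.2+


def to_local_pandas(sdf: "SparkDataFrame") -> "pd.DataFrame":
    """Spark DataFrame -> plain pandas DataFrame (collected to the driver)."""
    return sdf.toPandas()
```

With this naming, each signature shows at a glance which of the three DataFrame types is expected: `SparkDataFrame`, `ps.DataFrame`, or `pd.DataFrame`.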