Figure out how to deal with the PySpark 2 extensions
The DataFrame#transform extension is useful for PySpark 2 users but should not be applied for PySpark 3 users (because `transform` is built into the Spark 3 API).
When a user runs `from quinn.extensions import *`, we can either use the `spark.version` variable to programmatically skip over modules that shouldn't be imported for Spark 3, or we can design a separate import interface.
I'm still not sure which approach is better.
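As a sketch of the version-check approach: parse the major Spark version and gate the monkey-patching imports on it. The helper name and module layout below are illustrative, not quinn's actual code.

```python
# Hypothetical helper: extract the major component of a Spark version
# string such as "2.4.8" or "3.3.1".
def spark_major_version(version: str) -> int:
    """Return the major version number from a dotted version string."""
    return int(version.split(".")[0])

# In quinn/extensions/__init__.py one could then write something like:
#
#     import pyspark
#     if spark_major_version(pyspark.__version__) < 3:
#         # only monkey-patch DataFrame#transform on Spark 2
#         from .dataframe_ext import *
```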
I am going to switch the project to Python 3 and remove DataFrame#transform.
We can replace `DataFrame.transform = transform` with something like this:

```python
DataFrame.transform = getattr(DataFrame, "transform", transform)
```
and it should work in both the 2.x and 3.x versions. I can open a PR with this.
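To illustrate the semantics of that `getattr` guard without a PySpark dependency, here is a sketch using stand-in classes; the real code would use `pyspark.sql.DataFrame` and quinn's `transform` function.

```python
# quinn-style fallback implementation (simplified): apply f to the
# DataFrame and return the result.
def transform(self, f):
    return f(self)

class OldDataFrame:
    """Stands in for PySpark 2's DataFrame: no built-in transform."""
    pass

class NewDataFrame:
    """Stands in for PySpark 3's DataFrame: transform is built in."""
    def transform(self, f):
        return f(self)

# Patch only when the attribute is missing, leaving the native Spark 3
# method untouched:
OldDataFrame.transform = getattr(OldDataFrame, "transform", transform)
NewDataFrame.transform = getattr(NewDataFrame, "transform", transform)
```

Because `getattr` returns the existing attribute when it is present, the assignment is a no-op on the "Spark 3" class and only adds the fallback on the "Spark 2" class.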
P.S. I can do this for all extensions to avoid such problems or any unexpected behavior in the future.
@SemyonSinchenko - would there be any way for PySpark 2 users to be able to import this function, but for the function to error out if a user on PySpark 3 or greater tries to import it? I'd prefer for PySpark 3 users to leverage the built-in function. Sidenote: they updated this particular function in PySpark 3.3, so the 3.3 method signature is different from the 3.1 method signature 🙃
> would there be any way for PySpark 2 users to be able to import this function, but for the function to error out if a user on PySpark 3 or greater tries to import it?
That's exactly what my snippet of code will do. If there is already a `transform` attribute on `DataFrame`, it will leave it as is, but if there is no such attribute, it will add it. So the behavior will depend on the version.
@SemyonSinchenko - your suggested solution sounds ideal in that case. Can you please send a PR?
I'll do it.
Work was done in #81