
apply does not work properly with databricks-connect

Open kismsu opened this issue 4 years ago • 4 comments

Hi, when I run my code from PyCharm with databricks-connect against a Spark 2.4.5 cluster, I get

AttributeError: 'DataFrame' object has no attribute 'mapInPandas'

I believe this is because, in the apply function, the check

should_use_map_in_pandas = LooseVersion(pyspark.__version__) >= "3.0"

does not work as expected with databricks-connect: pyspark.__version__ returns the version of the databricks-connect package, not the Spark version of the cluster.
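One possible workaround, sketched here only as an illustration and not as what Koalas actually does, is to test for the method itself instead of comparing version strings. The helper name `has_map_in_pandas` and the two stand-in classes are hypothetical and exist only to make the example runnable without a cluster; in real code the argument would be a pyspark DataFrame.

```python
def has_map_in_pandas(sdf):
    """Return True if the given DataFrame-like object exposes a callable mapInPandas.

    Hypothetical helper: it inspects the object itself rather than trusting
    pyspark.__version__, which may report the client package version.
    """
    return callable(getattr(sdf, "mapInPandas", None))


# Stand-in classes for illustration only; these are NOT real pyspark classes.
class Spark2LikeFrame:
    """Mimics a DataFrame without mapInPandas (as on Spark 2.4)."""


class Spark3LikeFrame:
    """Mimics a DataFrame that has mapInPandas (as on Spark 3.0+)."""

    def mapInPandas(self, func, schema):
        return self


if __name__ == "__main__":
    print(has_map_in_pandas(Spark2LikeFrame()))
    print(has_map_in_pandas(Spark3LikeFrame()))
```

A check like this only answers whether the client-side object has the method; whether the call then succeeds against the remote cluster is a separate question.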

kismsu avatar Oct 06 '20 11:10 kismsu

I think we do not plan to support it in DBConnect in the near future.

cc @juliuszsompolski @youngbink

gatorsmile avatar Oct 06 '20 20:10 gatorsmile

I'm not sure I understand what you mean. Are you saying DataFrame.apply is not expected to work in DBConnect, and I should move my code to a notebook every time I use apply?

kismsu avatar Oct 07 '20 05:10 kismsu

@kismsu we don't yet officially support Koalas with DB Connect (for any version). It seems like this particular issue might be avoided with 7.1, but you could run into other issues until we officially support it.

youngbink avatar Oct 07 '20 19:10 youngbink

OK, I see. This is good to know. We're actually trying to move to 7.1 but have to sort out some infrastructure issues first. Thanks.

kismsu avatar Oct 07 '20 20:10 kismsu