Hyukjin Kwon
Oh, Koalas supports either:

```python
ks.DataFrame(spark_df)
ks.DataFrame(pandas_df)
ks.DataFrame(koalas_series)
```

or

```python
ks.DataFrame(...)  # same as pandas
```

Mixed arguments are currently not supported. You can do, for example, as below:...
cc @ueshin @xinrong-databricks FYI
Would you mind filing a JIRA in https://issues.apache.org/jira/projects/SPARK?
The previous implementation was not really correct, so this is disabled for now. To mimic the previous behaviour, you can manually localize it by converting UTC to your local timezone...
Hm, which Spark version do you use? It works fine in my local environment:

```python
>>> df.groupby("a").apply(toApply)
       a    b
a
1 0  1.0  3.0
```
Ah, okay. I think you might need to specify the return type in `toApply`:

```python
def toApply(df):
    if df['a'].iloc[0] > 1:  # Imagine a sanity check here
        return df[:0]  #...
```
Yeah, when you use the type hints, the index is lost. That is a current limitation.
Thanks for reporting this. Koalas has been migrated to Apache Spark. Would you mind reporting the issue to https://issues.apache.org/jira/projects/SPARK/issues please?
I think this issue is basically due to the lack of calendar support in PySpark (SPARK-24695). We might have to work around it with long type for now.
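To illustrate what a long-type workaround could look like, here is a minimal sketch in plain Python. It assumes the affected values are durations, which the comment above does not state, and the helper names `timedelta_to_long` and `long_to_timedelta` are hypothetical, not part of Koalas or PySpark. The idea is only to carry the value as an integer and convert it back at the edges.

```python
from datetime import timedelta

# One microsecond, used as the unit for the integer encoding (an assumption).
MICROSECOND = timedelta(microseconds=1)


def timedelta_to_long(delta: timedelta) -> int:
    """Encode a duration as a whole number of microseconds (hypothetical helper)."""
    # Floor division of two timedeltas yields an int.
    return delta // MICROSECOND


def long_to_timedelta(value: int) -> timedelta:
    """Decode a microsecond count back into a timedelta (hypothetical helper)."""
    return timedelta(microseconds=value)


original = timedelta(days=1, hours=2, microseconds=5)
encoded = timedelta_to_long(original)   # plain int, storable in a long column
decoded = long_to_timedelta(encoded)    # round-trips back to the original value
```

The unit (microseconds here) is a design choice. Whatever unit is picked has to be applied consistently on both sides of the conversion.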
It doesn't make much sense to support a buffer that lives on a single machine, because Koalas aims to scale out the dataset. I think we should explicitly not support it.