Hyukjin Kwon comments

Results: 207 comments of Hyukjin Kwon

DataFrame not working when using columns=[]

Oh, Koalas supports either:
```python
ks.DataFrame(spark_df)
ks.DataFrame(pandas_df)
ks.DataFrame(koalas_series)
```
or
```python
ks.DataFrame(...)  # same as pandas
```
Mixed arguments are currently not supported. You can do, for example, as below:...

Popping item from categorical series returns index instead of value

cc @ueshin @xinrong-databricks FYI

Popping item from categorical series returns index instead of value

Would you mind filing a JIRA in https://issues.apache.org/jira/projects/SPARK?

Timezone-aware datetimes are no longer supported

The previous implementation was not really correct, so this is disabled for now. To mimic the previous behaviour, you can manually localize it by converting UTC to your local timezone...
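As a rough illustration of that idea in plain Python (this is not the snippet elided from the comment, and it does not use any Koalas API): a naive timestamp is interpreted as UTC, shifted to a local zone, and made naive again. `LOCAL_TZ` and `utc_naive_to_local_naive` are hypothetical names, and the fixed UTC+9 offset is only an assumption for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical local zone for illustration: a fixed UTC+9 offset.
# A fixed offset ignores daylight saving time; a real zone may need more care.
LOCAL_TZ = timezone(timedelta(hours=9))


def utc_naive_to_local_naive(ts: datetime) -> datetime:
    """Treat a naive timestamp as UTC and return the naive local wall-clock time."""
    aware_utc = ts.replace(tzinfo=timezone.utc)
    return aware_utc.astimezone(LOCAL_TZ).replace(tzinfo=None)


utc_values = [datetime(2021, 1, 1, 0, 0), datetime(2021, 1, 1, 18, 30)]
local_values = [utc_naive_to_local_naive(v) for v in utc_values]
```

The same shift would presumably have to be applied per column before or after handing the data to Koalas.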

Cannot return empty dataframe in apply?

Hm, which Spark version do you use? It works fine in my local environment:
```python
>>> df.groupby("a").apply(toApply)
       a    b
a
1 0  1.0  3.0
```

Cannot return empty dataframe in apply?

Ah, okay. I think you might need to specify the return type in `toApply`:
```python
def toApply(df):
    if df['a'].iloc[0] > 1:  # Imagine a sanity check here
        return df[:0]  #...
```

Cannot return empty dataframe in apply?

Yeah, when you use type hints, the index is lost, which is a current limitation.

fillna does not work with decimals

Thanks for reporting this. Koalas has been migrated to Apache Spark. Would you mind reporting the issue to https://issues.apache.org/jira/projects/SPARK/issues please?

groupby.diff with datetime column

I think this issue is basically due to the lack of calendar support in PySpark (SPARK-24695). We might have to work around it with long type for now.
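A minimal sketch of the long-type idea in plain Python, assuming the workaround means converting each timestamp to integer epoch seconds before taking the per-group difference. It does not use Koalas or PySpark, and `to_epoch_seconds`, `grouped_diff_seconds`, and the sample rows are hypothetical.

```python
from datetime import datetime, timezone
from itertools import groupby
from operator import itemgetter


def to_epoch_seconds(ts: datetime) -> int:
    """Convert a naive timestamp (assumed to be UTC) to integer epoch seconds."""
    return int(ts.replace(tzinfo=timezone.utc).timestamp())


def grouped_diff_seconds(rows):
    """Return (key, diff) pairs, where diff is the gap in seconds to the
    previous row of the same group, or None for the first row of a group."""
    result = []
    # sorted() is stable, so the original order within each group is kept.
    ordered = sorted(rows, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        prev = None
        for _, ts in group:
            cur = to_epoch_seconds(ts)
            result.append((key, None if prev is None else cur - prev))
            prev = cur
    return result


rows = [
    ("x", datetime(2021, 1, 1, 0, 0, 0)),
    ("x", datetime(2021, 1, 1, 0, 0, 30)),
    ("y", datetime(2021, 1, 2, 0, 0, 0)),
    ("y", datetime(2021, 1, 2, 0, 1, 0)),
]
diffs = grouped_diff_seconds(rows)
```

The result is a plain integer number of seconds rather than a timedelta, which is the trade-off of going through a long column.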

Koalas.read_*() methods replace path(type:str)

It doesn't make much sense to support a buffer, which lives on a single machine, because Koalas aims to scale out the dataset. I think we should explicitly not support it.