koalas
fillna does not work with decimals
Description
The fillna function does not support the decimal type. If you have a column of DecimalType, whose values are converted to decimal.Decimal (and which koalas reports as object dtype), a straightforward call to fillna raises TypeError: Unsupported type Decimal
Example code:
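import pandas as pd
import pyspark.sql.functions as f
import databricks.koalas as ks  # importing koalas adds to_koalas() to Spark DataFrames
# assumes an active SparkSession is available as `spark`
df = pd.DataFrame({"test": [1.23, 2.34, None, 4.56, 5.67]})
sdf = spark.createDataFrame(df).select(f.col("test").cast("decimal(4,2)"))
kdf = sdf.to_koalas()
m = kdf["test"].mean()  # -> Decimal('3.450000')
kdf["test"].fillna(m)  # -> TypeError: Unsupported type Decimal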
Workaround
Currently you can work around this by filling with a float and then recasting the column to the decimal type. This is cumbersome, however, and the same code cannot be reused for different types, because you have to check explicitly whether you are dealing with a decimal type to avoid the error.
import decimal
kdf["test"] = kdf["test"].fillna(float(m)).astype(decimal.Decimal)  # No TypeError
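To show what the explicit check looks like when wrapped up for reuse, here is a minimal sketch. The helper names split_fill_value and fillna_any are hypothetical and not part of koalas. The only assumption is that the series object exposes fillna and astype, as in the workaround above.

```python
import decimal


def split_fill_value(value):
    """Return (safe_value, recast_needed) for a fill value.

    A decimal.Decimal is converted to float so that fillna accepts it;
    the flag tells the caller to cast the result back afterwards.
    """
    if isinstance(value, decimal.Decimal):
        return float(value), True
    return value, False


def fillna_any(series, value):
    """Hypothetical wrapper: fill nulls, recasting only for decimal values."""
    safe_value, recast_needed = split_fill_value(value)
    filled = series.fillna(safe_value)
    if recast_needed:
        # Same recast as in the one-line workaround above.
        filled = filled.astype(decimal.Decimal)
    return filled
```

With this in place the call site no longer needs a type check: kdf["test"] = fillna_any(kdf["test"], m). Note that going through float can lose precision for decimals that a double cannot represent exactly.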
Suggested solution
It seems there are hardcoded type checks in the _fillna implementation (here and here). Add the decimal.Decimal type to that check to prevent the error. I'm not qualified to assess whether this solution is appropriate, hence this issue ticket :)
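For illustration, the suggested change might look roughly like the stand-alone sketch below. This is not the koalas source: SUPPORTED_FILL_TYPES and check_fill_value are hypothetical names, and the tuple of accepted types is an assumption modelled on the error message quoted above.

```python
import decimal

# Hypothetical stand-in for a hardcoded type check; decimal.Decimal is the
# proposed addition. The other entries are assumed, not copied from koalas.
SUPPORTED_FILL_TYPES = (float, int, str, bool, decimal.Decimal)


def check_fill_value(value):
    """Raise TypeError unless `value` has one of the accepted fill types."""
    if not isinstance(value, SUPPORTED_FILL_TYPES):
        raise TypeError("Unsupported type %s" % type(value).__name__)
    return value
```

Relaxing the check only removes the early TypeError. Whether the code after the check handles a Decimal fill value correctly would still need to be verified.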
Thanks for reporting this. Koalas has been migrated to Apache Spark. Would you mind reporting the issue to https://issues.apache.org/jira/projects/SPARK/issues please?