Report the name of the empty column when its schema cannot be inferred
This is not a bug, but it would be nice to have the column name listed in the exception text when a column's schema cannot be inferred because all of its values are null. Sample code:
import databricks.koalas as ks
import pandas as pd

df = spark_session.createDataFrame([
    ('1', None,),
], 'a string, b string')
kdf = df.to_koalas()

def f(pdf: pd.DataFrame):
    return pdf

print(kdf.koalas.apply_batch(f))
There is no return type hint on f and all values in column b are null, so the schema cannot be inferred.
This code throws an exception that is a bit confusing for new users: ValueError: can not infer schema from empty or null dataset
It would be much more user-friendly to throw something like ValueError: can not infer schema from column 'b' because all row values are nulls
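To make the request concrete, here is a minimal sketch of the kind of check I have in mind. It is plain Python over a list of row tuples, not the actual Koalas or PySpark inference code, and the helper names find_all_null_columns and check_schema_inferable are hypothetical, made up for illustration only.

```python
from typing import Any, List, Sequence


def find_all_null_columns(columns: Sequence[str],
                          rows: Sequence[Sequence[Any]]) -> List[str]:
    """Return the names of columns whose value is None in every row.

    Hypothetical helper, not part of Koalas or PySpark.
    """
    return [
        name
        for idx, name in enumerate(columns)
        if all(row[idx] is None for row in rows)
    ]


def check_schema_inferable(columns: Sequence[str],
                           rows: Sequence[Sequence[Any]]) -> None:
    """Raise a ValueError that names the columns that block inference.

    Hypothetical helper, not part of Koalas or PySpark.
    """
    if not rows:
        # With no rows at all there is no single column to blame.
        raise ValueError("can not infer schema from empty dataset")
    null_columns = find_all_null_columns(columns, rows)
    if null_columns:
        names = ", ".join(repr(c) for c in null_columns)
        raise ValueError(
            "can not infer schema from column(s) %s "
            "because all row values are nulls" % names
        )


# Same data as the sample above: column b holds only nulls.
try:
    check_schema_inferable(["a", "b"], [("1", None)])
except ValueError as e:
    message = str(e)
    print(message)
```

The real fix would presumably need to hook into wherever the sampled batch is inspected before the schema is built, which this sketch does not attempt to locate.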
Hi @vkrot-exos, thanks for the suggestion! It sounds like a good idea. Would you mind submitting a PR to modify the error message? Thanks!