DataFrame not working when using columns=[]
Hey guys,
I usually work with pandas, and I've recently been exploring Koalas on Databricks to take advantage of big data solutions. I'd like to share an issue I found when trying to update a PySpark DataFrame using Koalas.
Here is what happened when I loaded data from a file into a PySpark DataFrame (df), followed by what happened when I used a sample dict instead for troubleshooting:
import databricks.koalas as ks
kdf = ks.DataFrame(data=df, columns=['a', 'b'])
display(kdf)
It throws an error:
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<command-3247900034274940> in <module>
---> 11 kdf = ks.DataFrame(df,columns=['a', 'b'])
13 display(kdf)
/databricks/python/lib/python3.7/site-packages/databricks/koalas/usage_logging/__init__.py in wrapper(*args, **kwargs)
178 start = time.perf_counter()
179 try:
--> 180 res = func(*args, **kwargs)
181 logger.log_success(
182 class_name, function_name, time.perf_counter() - start, signature
/databricks/python/lib/python3.7/site-packages/databricks/koalas/frame.py in __init__(self, data, index, columns, dtype, copy)
469 elif isinstance(data, spark.DataFrame):
470 assert index is None
--> 471 assert columns is None
472 assert dtype is None
473 assert not copy
I ran some tests, and it works when I use a dict instead of the PySpark df above:
import databricks.koalas as ks
df = {'a': [1, 2, 3, 4, 5, 6],
'b': [100, 200, 300, 400, 500, 600],
'c': ["one", "two", "three", "four", "five", "six"]}
kdf = ks.DataFrame(df, columns=['a','b'])
display(kdf)
I also tried pandas, and it worked fine with the same PySpark df. The code is below:
import pandas as pd
pdf = df.toPandas()
pdf = pd.DataFrame(pdf, columns=['a','b'])  # build from the converted pandas frame
display(pdf)
I tried the filter method as below and it worked, but it's not the same thing, right?
import databricks.koalas as ks
kdf = ks.DataFrame(df)
kdf_filtered = kdf.filter(items=['a', 'b'])
display(kdf_filtered)
Does anybody know if I'm doing something wrong? Or is this something in Koalas that still needs to be fixed?
Thanks!
Oh, Koalas supports either:
ks.DataFrame(spark_df)
ks.DataFrame(pandas_df)
ks.DataFrame(koalas_series)
or
ks.DataFrame(...) # same as pandas
Mixed arguments, such as passing a Spark DataFrame together with columns, are currently not supported.
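To make the assertion in the traceback easier to read, here is a minimal pure-Python sketch of the check it shows. It is not the Koalas source: SparkFrameStandIn and check_constructor_args are hypothetical names made up for illustration, only the Spark branch from the traceback is modeled, and the returned strings are just labels.

```python
class SparkFrameStandIn:
    """Hypothetical placeholder for a Spark DataFrame (not a real Spark class)."""


def check_constructor_args(data, index=None, columns=None, dtype=None, copy=False):
    """Mimic the argument checks visible in the traceback for Spark input."""
    if isinstance(data, SparkFrameStandIn):
        # The same four assertions the traceback shows at frame.py lines 470-473:
        # a Spark DataFrame must be passed on its own, with no extra arguments.
        assert index is None
        assert columns is None
        assert dtype is None
        assert not copy
        return "wrap existing Spark DataFrame"
    # Simplification: every other input is treated as a pandas-style call.
    return "build like pandas"
```

With this stand-in, check_constructor_args(SparkFrameStandIn(), columns=['a', 'b']) raises AssertionError at the columns check, which matches the line the traceback points to, while a dict with columns passes through.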
For example, you can do the following:
>>> kdf = ks.DataFrame(df)
>>> kdf.columns=["foo"]
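Note that assigning to kdf.columns renames the columns rather than selecting a subset. If the goal is to keep only 'a' and 'b' as in the question, one option is to select on the Spark side first and then wrap the result. The sketch below assumes df is a PySpark DataFrame; spark_to_koalas and its wrap parameter are hypothetical names introduced here, not part of Koalas.

```python
def spark_to_koalas(sdf, columns, wrap=None):
    """Select `columns` on the Spark side, then wrap the result in Koalas."""
    if wrap is None:
        # Imported lazily so the helper can be read and tested without Koalas.
        import databricks.koalas as ks
        wrap = ks.DataFrame
    # pyspark's DataFrame.select accepts column names as positional arguments.
    selected = sdf.select(*columns)
    # Pass the Spark DataFrame alone, with no index/columns/dtype arguments.
    return wrap(selected)
```

Usage would look like kdf = spark_to_koalas(df, ['a', 'b']). Selecting after wrapping, as in ks.DataFrame(df)[['a', 'b']], should give the same columns; the wrap parameter only exists so the helper can be exercised without a Spark session.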