Series.to_json(orient='records') does not return records-based JSON

Open klenium opened this issue 3 years ago • 3 comments

import databricks.koalas as ks

df = ks.DataFrame([['a', 'b'], ['c', 'd']], columns=['col 1', 'col 2'])

def add_json(row):
  # Serialize the whole row and store it in a new column.
  row['serialized_row_content'] = row.to_json()
  return row

df = df.apply(add_json, axis=1)

print(df)

  col 1 col 2     serialized_row_content
0     a     b  {"col 1":"a","col 2":"b"}
1     c     d  {"col 1":"c","col 2":"d"}

That works as expected. The documentation says:

orient: str, default ‘records’. It should be always ‘records’ for now.

So if I write row.to_json(orient='records') instead of row.to_json(), the output should be the same. But it's not:

  col 1 col 2 serialized_row_content
0     a     b              ["a","b"]
1     c     d              ["c","d"]

That looks like the 'values' format from pandas instead.

klenium, Dec 22 '21 10:12

Very interesting, I don't see the reason for this behavior in its source code. :)

klenium, Dec 22 '21 10:12

Adding row['type'] = str(type(row)) gives <class 'pandas.core.series.Series'>. That's unexpected: why is a pandas Series used there? And why doesn't it return records-based JSON?

klenium, Dec 22 '21 10:12
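
Since the type check above shows that the row is a plain pandas Series, the difference can be reproduced without Koalas. This is a minimal sketch that assumes only pandas is installed; the variable names are illustrative. For a Series, pandas documents 'index' as the default orient, while 'records' drops the labels and keeps only the values:

```python
import pandas as pd

# One row of the example DataFrame, as the pandas Series that apply() passes in.
row = pd.Series({"col 1": "a", "col 2": "b"})

# No orient given: a pandas Series falls back to orient='index',
# which keeps the labels as JSON keys.
default_json = row.to_json()
index_json = row.to_json(orient="index")

# orient='records' on a Series emits a bare list of values.
records_json = row.to_json(orient="records")

print(default_json)
print(records_json)
```

This matches the two tables in the report: the keyed object comes from the pandas default for a Series, not from a records orient.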

The same applies to pandas on Spark. If I follow the documentation and call to_json('records'), the output is None, so I get errors.

klenium, Apr 10 '22 16:04
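
One possible explanation for the None result, offered as a guess rather than a confirmed diagnosis: in plain pandas the first positional parameter of to_json is the output path (path_or_buf), not orient, so a positional string is treated as a file name and the call returns None. The sketch below shows this with pandas only. Whether pandas on Spark treats the first argument the same way is an assumption here, and the temporary directory is used purely for illustration.

```python
import os
import tempfile

import pandas as pd

row = pd.Series({"col 1": "a", "col 2": "b"})

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "records")
    # A positional string is taken as the output path, not as orient,
    # so the JSON is written to disk and the call returns None.
    result = row.to_json(target)
    wrote_file = os.path.exists(target)
    with open(target) as fh:
        written = fh.read()

# Passing orient by keyword returns the JSON string instead.
as_string = row.to_json(orient="index")

print(result)
print(as_string)
```

If this is what happens, passing the argument by keyword, as in to_json(orient='records'), would avoid the None return.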