
Accept Python dict from client "write_dataframe" and TableAdapter

Open nmaytan opened this issue 1 year ago • 0 comments

In reference to #733.

Both cases insist upon a dataframe-compatible shape, meaning every column in the dict must have the same length. For write_dataframe:

d = {'a': [1,2,3], 'b': [4,5,6,7]}
c.write_dataframe(d, key='n')
[...]
ArrowInvalid: Column 1 named b expected length 3 but got length 4

pyarrow.lib.Table.validate() is in the stack trace, and docs say that a pyarrow Table is:

A collection of top-level named, equal length Arrow arrays.

For TableAdapter:

d = {'a': [1,2,3,4,5], 'b': [4,5,6,7,8,9]}
tdf = DataFrameAdapter.from_pydict(d, npartitions=1)
[...]
ValueError: An error occurred while calling the from_dict method registered to the pandas backend.
Original Message: All arrays must be of the same length

This comes from dask.dataframe.from_dict.
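Since both paths reject ragged dicts deep inside pyarrow or dask, a caller may prefer to check the shape up front. The following is a minimal sketch of such a check in plain Python; `column_lengths` and `check_equal_lengths` are hypothetical helpers for illustration only and are not part of Tiled, pyarrow, or dask.

```python
def column_lengths(data):
    """Return a mapping of column name to column length."""
    return {name: len(values) for name, values in data.items()}


def check_equal_lengths(data):
    """Raise ValueError if the columns of `data` differ in length.

    Hypothetical helper: it mirrors the equal-length requirement described
    above, but it is not an API of Tiled, pyarrow, or dask.
    """
    lengths = column_lengths(data)
    if len(set(lengths.values())) > 1:
        raise ValueError(f"Columns must all be the same length, got {lengths}")
    return data


# A dataframe-compatible dict passes through unchanged.
good = check_equal_lengths({'a': [1, 2, 3], 'b': [4, 5, 6]})

# The ragged dict from the write_dataframe example is rejected early.
try:
    check_equal_lengths({'a': [1, 2, 3], 'b': [4, 5, 6, 7]})
except ValueError as err:
    message = str(err)
```

Running a check like this before calling write_dataframe would surface the problem with the column names and lengths in one message, rather than in a library stack trace.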

Lastly, Dan pointed out that this dict support lets us slightly simplify generated_minimal.py, and I've confirmed that a simple dict replicates the example identically.

client['C'].read()
Out[4]:
      x    y    z
0   1.0  2.0  3.0
1   1.0  2.0  3.0
2   1.0  2.0  3.0
3   1.0  2.0  3.0
4   1.0  2.0  3.0
..  ...  ...  ...
95  1.0  2.0  3.0
96  1.0  2.0  3.0
97  1.0  2.0  3.0
98  1.0  2.0  3.0
99  1.0  2.0  3.0

[100 rows x 3 columns]
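For illustration, a dict with the shape shown in the output above could be built as follows. This is a sketch under assumptions: the actual contents of generated_minimal.py are not reproduced here, the values are inferred from the visible rows of the printed table (the elided middle rows are assumed to match), and the commented-out write call needs a running Tiled client.

```python
# Assumed shape: 100 rows, columns x, y, z holding 1.0, 2.0, 3.0.
n_rows = 100
data = {
    "x": [1.0] * n_rows,
    "y": [2.0] * n_rows,
    "z": [3.0] * n_rows,
}

# All columns share one length, so the dict is dataframe-compatible.
lengths = {len(column) for column in data.values()}

# With a connected client (not available in this sketch), the dict could be
# written directly, as in the write_dataframe example earlier:
# client.write_dataframe(data, key="C")
```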

Checklist

  • [x] Add a Changelog entry
  • [x] Add the ticket number which this PR closes to the comment section

nmaytan, Jul 17 '24 23:07