
feat(python): allow schema definition from 2-Lists and not only 2-Tuples

Open rancomp opened this issue 1 year ago • 0 comments

Closes https://github.com/pola-rs/polars/issues/12178.

As pointed out in the issue, processing a DataFrame with a schema of type List[List[str, pl.dtype]] (a list of 2-lists rather than 2-tuples) failed silently.

This PR:

  • processes lists in the same way as tuples in _unpack_schema
  • adds a unit test for _unpack_schema
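
To illustrate the intended behavior, here is a minimal sketch of treating 2-lists and 2-tuples alike. The helper `split_schema` is hypothetical and is not the actual `_unpack_schema` implementation; it only shows the shape of the idea using plain Python types as stand-ins for dtypes.

```python
def split_schema(schema):
    """Hypothetical helper (not polars' _unpack_schema).

    Accepts entries that are either a bare column name or a
    (name, dtype) pair given as a 2-tuple or a 2-list.
    """
    names = []
    dtypes = {}
    for entry in schema:
        # Lists and tuples of length 2 are both treated as (name, dtype).
        if isinstance(entry, (list, tuple)) and len(entry) == 2:
            name, dtype = entry
            names.append(name)
            dtypes[name] = dtype
        else:
            # Anything else (e.g. a plain string) is a name without a dtype.
            names.append(entry)
    return names, dtypes


names, dtypes = split_schema([["foo", int], ("bar", str), "baz"])
```

Here `names` holds all three column names, while `dtypes` only has entries for `foo` and `bar`.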

Remarks:

  • Because strings are also sequences, we have to check that the Sequence is not a str instance (see 2nd commit). I wanted to coerce lists and tuples into a dictionary, but dict([("foo", int), "ab"]) results in {"foo": int, "a": "b"}, which is a weird edge case.
  • I'm not sure I understand the intended behavior of lookup_names. Should it affect only column_dtypes, or column_names as well?
    In: _unpack_schema(schema=[("foo", int), ("bar", str)], lookup_names=[None, "barbar"])
    Out: (['foo', 'bar'], {'foo': Int64, 'barbar': Utf8})
  • If more tests are required, please let me know.
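
The string edge case from the first remark can be reproduced with the standard library alone. The predicate `is_name_dtype_pair` below is a hypothetical name used for illustration, not a function from the PR.

```python
from collections.abc import Sequence

# A 2-character string is itself a valid "pair" for dict(),
# so it is silently split into a key and a value.
coerced = dict([("foo", int), "ab"])


def is_name_dtype_pair(entry):
    """Hypothetical check: a 2-element sequence that is not a string."""
    return (
        isinstance(entry, Sequence)
        and not isinstance(entry, (str, bytes))
        and len(entry) == 2
    )
```

Without the explicit `str` exclusion, a two-character column name such as `"ab"` would pass a generic `Sequence` check and be misread as a (name, dtype) pair.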

rancomp, Nov 02 '23 20:11