polars
`schema` argument of `from_dicts` doesn't work/do enough
Polars version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Issue description
I expected the `schema` argument to `from_dicts` to:
- override schema inference when `infer_schema_length=0`, as an escape hatch for problems like #5649. It does not appear to do this, or is buggy.
- overwrite the schema when `infer_schema_length>=1`, which the docs claim ("partially overwrites the inferred schema"); this also does not appear to be happening.
Reproducible example
Data isn't loaded for some reason; works if infer_schema_length=1:
import polars as pr
schema={"a": pr.Utf8}
data=[{"a": "aa"}]
pr.from_dicts(data, schema=schema, infer_schema_length=0)
Out [131]:
shape: (1, 1)
┌──────┐
│ a    │
│ ---  │
│ str  │
╞══════╡
│ null │
└──────┘
Should be list[str] and raise an error (the value is a plain string, not a list), but list[bool] is inferred and null is loaded:
schema={"a": pr.List(pr.Utf8)}
data=[{"a": "aa"}]
pr.from_dicts(data, schema=schema, infer_schema_length=0)
Out [132]:
shape: (1, 1)
┌────────────┐
│ a          │
│ ---        │
│ list[bool] │
╞════════════╡
│ null       │
└────────────┘
Expected struct fields missing from the inferred rows to be taken from the provided schema (as nullable fields); instead it panics:
schema={"a": pr.Struct({"b": pr.Utf8, "c": pr.Utf8})}
data=[{"a": {"b": "bb"}}, {"a": {"b": "bb", "c": "cc"}}]
df = pr.from_dicts(data, infer_schema_length=1, schema=schema); df
thread '<unnamed>' panicked at 'index out of bounds: the len is 1 but the index is 1', /home/runner/work/polars/polars/polars/polars-core/src/series/any_value.rs:136:46
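To make the expectation for the struct case concrete, here is a minimal pure-Python sketch of the normalization I would expect to happen before loading: every struct value gets all fields named in the explicit schema, with missing ones set to null. The helper `fill_struct_fields` is hypothetical and is not part of Polars; it only illustrates the desired result.

```python
from typing import Any, Dict, List


def fill_struct_fields(
    rows: List[Dict[str, Any]], column: str, field_names: List[str]
) -> List[Dict[str, Any]]:
    """Hypothetical helper: give every struct in `column` all `field_names`,
    using None for fields a row does not provide."""
    filled = []
    for row in rows:
        value = row.get(column) or {}
        filled.append({column: {name: value.get(name) for name in field_names}})
    return filled


data = [{"a": {"b": "bb"}}, {"a": {"b": "bb", "c": "cc"}}]
normalized = fill_struct_fields(data, "a", ["b", "c"])
# The first row now carries "c" explicitly as None instead of omitting it.
print(normalized)
```

With the explicit schema applied this way, the first row would load as `{"b": "bb", "c": null}` rather than triggering an out-of-bounds index.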
Expected behavior
See above: I would expect three cases to work:
- infer_schema_length>=1 and no schema -- only inference
- infer_schema_length=0 and schema -- no inference, only the explicitly provided schema
- infer_schema_length>=1 and schema -- inference, but then the explicit schema overrides it
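The three cases above can be sketched as a small pure-Python model of the semantics I would expect. This is not how Polars implements it: `infer_schema` and `resolve_schema` are hypothetical names, and dtypes are represented by plain strings for illustration.

```python
from typing import Any, Dict, List, Optional


def infer_schema(rows: List[Dict[str, Any]], infer_schema_length: int) -> Dict[str, str]:
    """Toy inference: use the Python type name of the first non-null value
    seen for each key within the first `infer_schema_length` rows."""
    inferred: Dict[str, str] = {}
    for row in rows[:infer_schema_length]:
        for key, value in row.items():
            if key not in inferred and value is not None:
                inferred[key] = type(value).__name__
    return inferred


def resolve_schema(
    rows: List[Dict[str, Any]],
    schema: Optional[Dict[str, str]] = None,
    infer_schema_length: int = 1,
) -> Dict[str, str]:
    """Expected semantics: infer from the first N rows (nothing when N is 0),
    then let explicitly provided entries override the inferred ones."""
    inferred = infer_schema(rows, infer_schema_length)
    if schema is None:
        return inferred
    return {**inferred, **schema}


rows = [{"a": "aa", "b": 1}]
only_inferred = resolve_schema(rows, None, 1)           # case 1
only_explicit = resolve_schema(rows, {"a": "Utf8"}, 0)  # case 2
overridden = resolve_schema(rows, {"a": "Utf8"}, 1)     # case 3
```

In this model the explicit schema always wins for the columns it names, and inference only fills in the columns it leaves out.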
Installed versions
---Version info---
Polars: 0.15.1
Index type: UInt32
Platform: Linux-4.19.0-22-cloud-amd64-x86_64-with-glibc2.10
Python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10)
[GCC 10.3.0]
---Optional dependencies---
pyarrow: 7.0.1
pandas: 1.5.1
numpy: 1.23.4
fsspec: 2022.10.0
connectorx: <not installed>
xlsx2csv: <not installed>
matplotlib: 3.6.2