
`schema` argument of `from_dicts` doesn't work/do enough

Open indigoviolet opened this issue 2 years ago • 0 comments

Polars version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of Polars.

Issue description

I expected the schema argument to from_dicts to:

  1. override schema inference when infer_schema_length=0, as an escape hatch for problems like #5649. It does not appear to do this, or it is buggy.

  2. overwrite the inferred schema when infer_schema_length >= 1, as the docs claim ("partially overwrites the inferred schema"). This also does not appear to happen.

Reproducible example

With infer_schema_length=0 the data isn't loaded (the value comes back as null); it works if infer_schema_length=1:



import polars as pr
schema={"a": pr.Utf8}
data=[{"a": "aa"}]
pr.from_dicts(data, schema=schema, infer_schema_length=0)

Out [131]:
shape: (1, 1)
┌──────┐
│ a    │
│ ---  │
│ str  │
╞══════╡
│ null │
└──────┘


The column should be list[str] and loading the string value should raise an error; instead, list[bool] is inferred and null is loaded:

schema={"a": pr.List(pr.Utf8)}
data=[{"a": "aa"}]
pr.from_dicts(data, schema=schema, infer_schema_length=0)

Out [132]:
shape: (1, 1)
┌────────────┐
│ a          │
│ ---        │
│ list[bool] │
╞════════════╡
│ null       │
└────────────┘

I expected nullable struct fields to be taken from the provided schema; instead it panics:

schema={"a": pr.Struct({"b": pr.Utf8, "c": pr.Utf8})}
data=[{"a": {"b": "bb"}}, {"a": {"b": "bb", "c": "cc"}}]
df = pr.from_dicts(data, infer_schema_length=1, schema=schema); df



thread '<unnamed>' panicked at 'index out of bounds: the len is 1 but the index is 1', /home/runner/work/polars/polars/polars/polars-core/src/series/any_value.rs:136:46
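
For clarity, the row shape expected from the struct example can be sketched in plain Python. This is only an illustration of the intended result, not how Polars builds structs; the helper name `pad_struct_rows` and its parameters are hypothetical and not part of any Polars API.

```python
from typing import Any, Dict, List


def pad_struct_rows(
    rows: List[Dict[str, Any]], column: str, fields: List[str]
) -> List[Dict[str, Any]]:
    """Hypothetical helper: give every struct value in `column` all of `fields`.

    Fields that a row does not provide are filled with None, which is the
    value a nullable struct field would be expected to hold.
    """
    padded = []
    for row in rows:
        struct_value = row.get(column) or {}
        # Build the struct in the order declared by the explicit schema.
        full_struct = {field: struct_value.get(field) for field in fields}
        padded.append({**row, column: full_struct})
    return padded


data = [{"a": {"b": "bb"}}, {"a": {"b": "bb", "c": "cc"}}]
padded = pad_struct_rows(data, column="a", fields=["b", "c"])
print(padded)
```

Under this reading, the first row would carry a null "c" instead of triggering an out-of-bounds panic.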

Expected behavior

See above. I would expect three cases to work:

  1. infer_schema_length >= 1 and no schema -- only inference

  2. infer_schema_length = 0 and schema -- no inference and only explicitly provided schema

  3. infer_schema_length >=1 and schema -- inference, but then the explicit schema overrides it
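
The three cases above amount to a simple precedence rule, sketched here in plain Python. This is a toy model of the expected semantics only, not Polars' implementation: `infer_schema` and `resolve_schema` are hypothetical names, and dtypes are represented as plain strings for illustration.

```python
from typing import Any, Dict, List, Optional


def infer_schema(rows: List[Dict[str, Any]], infer_schema_length: int) -> Dict[str, str]:
    """Toy inference: use the Python type name of the first non-None value
    seen for each key within the first `infer_schema_length` rows."""
    inferred: Dict[str, str] = {}
    for row in rows[:infer_schema_length]:
        for key, value in row.items():
            if key not in inferred and value is not None:
                inferred[key] = type(value).__name__
    return inferred


def resolve_schema(
    rows: List[Dict[str, Any]],
    schema: Optional[Dict[str, str]] = None,
    infer_schema_length: int = 1,
) -> Dict[str, str]:
    """Expected precedence: inferred entries first, explicit entries win."""
    inferred = infer_schema(rows, infer_schema_length)  # empty when length is 0
    return {**inferred, **(schema or {})}


rows = [{"a": "aa", "b": 1}]

# Case 1: inference only.
case1 = resolve_schema(rows, infer_schema_length=1)
# Case 2: no inference, only the explicit schema.
case2 = resolve_schema(rows, schema={"a": "Utf8"}, infer_schema_length=0)
# Case 3: inference, then the explicit schema overrides matching columns.
case3 = resolve_schema(rows, schema={"b": "Float64"}, infer_schema_length=1)
print(case1, case2, case3)
```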

Installed versions

---Version info---
Polars: 0.15.1
Index type: UInt32
Platform: Linux-4.19.0-22-cloud-amd64-x86_64-with-glibc2.10
Python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10) 
[GCC 10.3.0]
---Optional dependencies---
pyarrow: 7.0.1
pandas: 1.5.1
numpy: 1.23.4
fsspec: 2022.10.0
connectorx: <not installed>
xlsx2csv: <not installed>
matplotlib: 3.6.2


indigoviolet · Nov 29 '22 20:11