Incorrect dtype of object of `polars into-df` when numeric and null values are mixed.

Open • ayax79 opened this issue 1 year ago • 1 comment

Describe the bug

Discovered by @maxim-uvarov

Converting a table with null and numeric values to a dataframe results in the dataframe column being of dtype object. Furthermore, when attempting to apply a schema, the column is still of dtype object.

Ideally, Value::Nothing values should be converted to Polars NaN values when the rest of the table is numeric, though this could be problematic when inferring the schema.

Minimally, an error should be returned when a column cannot be created as the type provided by the schema.

How to reproduce

[[a b]; [6 2] [1 1] [1 4] [2 null]] | polars into-df --schema {a: i64, b: i64} | polars schema

Expected behavior

  1. An error should occur if the column cannot be converted to the specified type in the schema
  2. Convert the null values to NaN if possible.
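To make the two expectations concrete, here is a minimal sketch in plain Python (not the plugin's actual Rust code). The names `DTYPES`, `infer_dtype`, and `build_column` are hypothetical and exist only for this illustration. It assumes nulls are kept as a null marker (`None`) rather than NaN, since NaN only exists for floating-point columns.

```python
# Hypothetical mapping from schema dtype names to Python types (illustration only).
DTYPES = {"i64": int, "f64": float, "str": str}


def infer_dtype(values):
    """Return the single dtype name shared by all non-null values."""
    names = set()
    for v in values:
        if v is None:
            continue  # nulls do not influence the inferred dtype
        for name, py_type in DTYPES.items():
            if type(v) is py_type:
                names.add(name)
                break
        else:
            raise TypeError(f"unsupported value {v!r}")
    if len(names) != 1:
        # Covers both mixed columns and columns that contain only nulls.
        raise TypeError(f"cannot infer a single dtype from {sorted(names)}")
    return names.pop()


def build_column(name, values, dtype=None):
    """Return (dtype, values); raise instead of silently falling back to 'object'."""
    if dtype is None:
        dtype = infer_dtype(values)
    expected = DTYPES[dtype]
    for v in values:
        if v is not None and type(v) is not expected:
            raise TypeError(
                f"column {name!r}: cannot convert {v!r} to {dtype}"
            )
    return dtype, list(values)


# Column b from the reproduction: numeric values plus one null.
print(build_column("b", [2, 1, 4, None], "i64"))
```

With this behavior, column `b` from the reproduction would keep the requested `i64` type with one null entry, while a schema that does not match the data (for example `i64` applied to string values) would raise an error instead of producing an object column.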

Screenshots

Screenshot 2024-05-01 at 15 32 36

Configuration

key value
version 0.93.0
major 0
minor 93
patch 0
branch
commit_hash
build_os macos-aarch64
build_target aarch64-apple-darwin
rust_version rustc 1.77.2 (25ef9e3d8 2024-04-09)
rust_channel stable-aarch64-apple-darwin
cargo_version cargo 1.77.2 (e52e36006 2024-03-26)
build_time 2024-05-01 09:22:02 -07:00
build_rust_channel release
allocator mimalloc
features default, sqlite, system-clipboard, trash, which
installed_plugins plist, polars

Additional context

Added to the Polars roadmap backlog

ayax79 • May 01 '24 22:05

This might be the root cause for https://github.com/nushell/nushell/issues/13185, where [[a]; [null]] or [[a]; [""] [1]] crash polars into-df | polars to-parquet.

Manually specifying a schema (polars into-df -s {a: str}) still results in an object schema:

❯ [[a]; ["a"] [null]] | polars into-df -s {a: str} | polars schema
 a   object

I found no way to specify a "string or nothing" type for the column, which I hoped would resolve the problem. Guessing from the syntax of optional function arguments, I tried polars into-df -s {a?: str}, but that adds a column named a? instead.

devurandom • Jun 20 '24 12:06