`polars into-df` produces a column of dtype object when numeric and null values are mixed.
Describe the bug
Discovered by @maxim-uvarov
Converting a table with null and numeric values to a dataframe results in the dataframe column being of dtype object. Furthermore, when a schema is applied, the column is still of dtype object.
Ideally, `Value::Nothing` values should be converted to NaN polars values when the rest of the column is numeric, although this could be problematic when inferring the schema.
Minimally, an error should be returned when a column cannot be created as the type provided by the schema.
How to reproduce
[[a b]; [6 2] [1 1] [1 4] [2 null]] | polars into-df --schema {a: i64, b: i64} | polars schema
Expected behavior
- An error should occur if the column cannot be converted to the specified type in the schema
- Convert the null values to NaN if possible.
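The two expectations above can be sketched in plain Rust. This is only an illustration and not the plugin's actual code: `Cell`, `DType`, `Column`, and `build_column` are hypothetical names, and the sketch assumes that a missing value is stored as `None` (a null slot) rather than NaN, since `i64` has no NaN representation.

```rust
// Hypothetical stand-in for nushell's `Value`; only the variants needed here.
#[derive(Debug, Clone, PartialEq)]
enum Cell {
    Int(i64),
    Str(String),
    Nothing,
}

// Hypothetical subset of the dtypes a schema could request.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DType {
    I64,
    Str,
}

// A typed column in which missing values are `None` instead of forcing
// the whole column to fall back to an object type.
#[derive(Debug, PartialEq)]
enum Column {
    I64(Vec<Option<i64>>),
    Str(Vec<Option<String>>),
}

/// Build a column of the requested dtype. `Nothing` becomes a null slot;
/// any other value that does not fit the dtype yields an error.
fn build_column(name: &str, cells: &[Cell], dtype: DType) -> Result<Column, String> {
    match dtype {
        DType::I64 => cells
            .iter()
            .enumerate()
            .map(|(row, cell)| match cell {
                Cell::Int(v) => Ok(Some(*v)),
                Cell::Nothing => Ok(None),
                other => Err(format!("column '{name}', row {row}: {other:?} is not i64")),
            })
            .collect::<Result<Vec<_>, _>>()
            .map(Column::I64),
        DType::Str => cells
            .iter()
            .enumerate()
            .map(|(row, cell)| match cell {
                Cell::Str(s) => Ok(Some(s.clone())),
                Cell::Nothing => Ok(None),
                other => Err(format!("column '{name}', row {row}: {other:?} is not str")),
            })
            .collect::<Result<Vec<_>, _>>()
            .map(Column::Str),
    }
}
```

Under these assumptions, column `b` from the reproduction (`2, 1, 4, null`) would come out as an `i64` column with one null slot, while a string in a column declared `i64` would produce an error instead of a silent fallback to object.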
Configuration
| key | value |
|---|---|
| version | 0.93.0 |
| major | 0 |
| minor | 93 |
| patch | 0 |
| branch | |
| commit_hash | |
| build_os | macos-aarch64 |
| build_target | aarch64-apple-darwin |
| rust_version | rustc 1.77.2 (25ef9e3d8 2024-04-09) |
| rust_channel | stable-aarch64-apple-darwin |
| cargo_version | cargo 1.77.2 (e52e36006 2024-03-26) |
| build_time | 2024-05-01 09:22:02 -07:00 |
| build_rust_channel | release |
| allocator | mimalloc |
| features | default, sqlite, system-clipboard, trash, which |
| installed_plugins | plist, polars |
Additional context
Added to the Polars roadmap backlog
This might be the root cause of https://github.com/nushell/nushell/issues/13185, where `[[a]; [null]]` or `[[a]; [""] [1]]` crash `polars into-df | polars to-parquet`.
Manually specifying a schema (`polars into-df -s {a: str}`) still results in an object schema:
❯ [[a]; ["a"] [null]] | polars into-df -s {a: str} | polars schema
a object
I had hoped that specifying a "string or nothing" type for the column would resolve the problem, but I found no way to do so. Guessing from the syntax for optional function arguments, I tried `polars into-df -s {a?: str}`, but that adds an `a?` column instead.