polars
polars.read_json fails when reading empty list from json response
Polars version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of Polars.
Issue description
I noticed this was addressed in a recent fix for read_ndjson but seems to have slipped through for read_json.
polars.read_json fails when the response is empty (an empty JSON list). It seems the parser doesn't know what to do in this case and returns the following error: BindingsError: "ArrowError(NotYetImplemented("read an Array from a non-Array data type"))"
This may be a bit of an edge case, but it is an issue when dealing with JSON responses from HTTP requests that return 200 but give an empty body. I currently work around it by passing the response through json.loads (well, orjson, given its improvements over the stdlib json) to the DataFrame constructor, as shown in the reproducible example below. I feel read_json should be able to handle this.
I'm getting deeper and deeper into Rust out of personal interest, so I will try to submit a pull request to fix this myself if I get the time, but I realise this may be trivial to handle for someone else.
Reproducible example
import json
import polars as pl

empty_list = b"[]"
pl.read_json(empty_list)  # raises a BindingsError
pl.DataFrame(json.loads(empty_list))  # returns an empty DataFrame
Expected behavior
I would expect read_json to simply return an empty DataFrame here.
Installed versions
---Version info---
Polars: 0.16.11
Index type: UInt32
Platform: Windows-10-10.0.22621-SP0
Python: 3.10.10 (tags/v3.10.10:aad5f6a, Feb 7 2023, 17:20:36) [MSC v.1929 64 bit (AMD64)]
---Optional dependencies---
pyarrow: <not installed>
pandas: <not installed>
numpy: <not installed>
fsspec: <not installed>
connectorx: <not installed>
xlsx2csv: <not installed>
deltalake: <not installed>
matplotlib: <not installed>
In case it's relevant, I just ran into a similar sort of issue: #6745
I found .json_extract() which seems to handle some cases where .read_json fails.
It returns null in this particular case, as opposed to an empty dataframe.
import polars as pl
empty_list = b"[]"
pl.DataFrame({"json": [empty_list]}).with_columns(
pl.col("json").cast(pl.Utf8).str.json_extract())
shape: (1, 1)
┌──────┐
│ json │
│ --- │
│ null │
╞══════╡
│ null │
└──────┘
I'm not sure if it's related, but there is also a problem with empty lists in Rust code:
//!
//! ```cargo
//! [dependencies]
//! polars = { version = "0.36.2", features = ["json"] }
//! ```
use polars::prelude::*;

fn main() {
    // An empty JSON list as input
    let f = std::io::Cursor::new("[]");
    let df = JsonReader::new(f).finish();
    println!("{:?}", df);
}
output:
thread 'main' panicked at /home/mk/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-io-0.36.2/src/json/infer.rs:19:10:
called `Option::unwrap()` on a `None` value
stack backtrace:
0: rust_begin_unwind
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:645:5
1: core::panicking::panic_fmt
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:72:14
2: core::panicking::panic
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:127:5
3: core::option::Option<T>::unwrap
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/option.rs:931:21
4: polars_io::json::infer::json_values_to_supertype
at /home/mk/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-io-0.36.2/src/json/infer.rs:10:5
5: <polars_io::json::JsonReader<R> as polars_io::SerReader<R>>::finish
at /home/mk/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-io-0.36.2/src/json/mod.rs:252:25
6: p_test_d8b0191af191a30a7a7c462b::main
at ./p-test.rs:12:14
7: core::ops::function::FnOnce::call_once
at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
I had the same problem in Rust. Checking whether the JSON is empty before calling JsonReader::new(f) helped to work around this error; the same check might make sense in Python too.
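For anyone wanting the same guard in Python, here is a minimal sketch of that check. The helper names is_empty_json_payload and read_json_or_empty are made up for illustration and are not part of Polars. It assumes a blank body and an empty top-level list should both map to an empty DataFrame, and that pl.read_json accepts a file-like object.

```python
import io
from typing import Union


def is_empty_json_payload(body: Union[bytes, str]) -> bool:
    """Return True for a blank body or a JSON document that is just `[]`.

    Hypothetical helper: it only inspects the text, it does not parse it.
    """
    text = body.decode("utf-8") if isinstance(body, bytes) else body
    stripped = text.strip()
    if not stripped:
        # HTTP 200 with no body at all
        return True
    # "[]" possibly with whitespace between the brackets, e.g. "[ ]"
    return (
        stripped[0] == "["
        and stripped[-1] == "]"
        and not stripped[1:-1].strip()
    )


def read_json_or_empty(body: Union[bytes, str]):
    """Hypothetical wrapper: skip pl.read_json for empty payloads."""
    import polars as pl  # imported here so the guard above has no dependency

    if is_empty_json_payload(body):
        return pl.DataFrame()
    raw = body if isinstance(body, bytes) else body.encode("utf-8")
    return pl.read_json(io.BytesIO(raw))
```

With this, read_json_or_empty(b"[]") would hand back pl.DataFrame() instead of reaching the parser, while non-empty payloads still go through pl.read_json unchanged.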