polars.read_json fails when reading empty list from json response

Open liammcknight95 opened this issue 1 year ago • 3 comments

Polars version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of Polars.

Issue description

Noticed this was addressed in a recent fix for read_ndjson but seems to have slipped through on read_json.

polars.read_json fails when the response is empty. The parser does not seem to know what to do in this case and raises the following error: BindingsError:"ArrowError(NotYetImplemented("read an Array from a non-Array data type"))"

This may be a bit of an edge case, but it is an issue when dealing with JSON responses from HTTP requests that return 200 but give an empty body. I currently circumvent it by passing the response through json.loads (well, orjson, given its improvements over the stdlib json) to a DataFrame constructor, as shown below. I feel read_json should be able to handle this.

Getting deeper and deeper into Rust out of personal interest, so will try and submit a pull request to fix myself if I get the time - but realise this may be a trivial handle for someone else

Reproducible example

import json

import polars as pl

empty_list = b"[]"

pl.read_json(empty_list)  # raises a BindingsError

pl.DataFrame(json.loads(empty_list))  # returns an empty DataFrame

Expected behavior

I would expect this to simply return an empty DataFrame.

Installed versions

---Version info---
Polars: 0.16.11
Index type: UInt32
Platform: Windows-10-10.0.22621-SP0
Python: 3.10.10 (tags/v3.10.10:aad5f6a, Feb  7 2023, 17:20:36) [MSC v.1929 64 bit (AMD64)]
---Optional dependencies---
pyarrow: <not installed>
pandas: <not installed>
numpy: <not installed>
fsspec: <not installed>
connectorx: <not installed>
xlsx2csv: <not installed>
deltalake: <not installed>
matplotlib: <not installed>

liammcknight95 avatar Mar 05 '23 21:03 liammcknight95

In case it's relevant, I just ran into a similar sort of issue: #6745

I found .json_extract() which seems to handle some cases where .read_json fails.

It returns null in this particular case, as opposed to an empty dataframe.

import polars as pl

empty_list = b"[]"

pl.DataFrame({"json": [empty_list]}).with_columns(
    pl.col("json").cast(pl.Utf8).str.json_extract()
)

shape: (1, 1)
┌──────┐
│ json │
│ ---  │
│ null │
╞══════╡
│ null │
└──────┘

cmdlineluser avatar Mar 05 '23 22:03 cmdlineluser

I'm not sure if it's related, but there is also a problem with empty lists in Rust code:

//!
//! ```cargo
//! [dependencies]
//! polars = { version = "0.36.2", features = ["json"] }
//! ```

use polars::prelude::*;

fn main() {
    let f = std::io::Cursor::new("[]");
    let df = JsonReader::new(f).finish();
    println!("{:?}", df);
}

output:

thread 'main' panicked at /home/mk/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-io-0.36.2/src/json/infer.rs:19:10:
called `Option::unwrap()` on a `None` value
stack backtrace:
   0: rust_begin_unwind
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:72:14
   2: core::panicking::panic
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:127:5
   3: core::option::Option<T>::unwrap
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/option.rs:931:21
   4: polars_io::json::infer::json_values_to_supertype
             at /home/mk/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-io-0.36.2/src/json/infer.rs:10:5
   5: <polars_io::json::JsonReader<R> as polars_io::SerReader<R>>::finish
             at /home/mk/.cargo/registry/src/index.crates.io-6f17d22bba15001f/polars-io-0.36.2/src/json/mod.rs:252:25
   6: p_test_d8b0191af191a30a7a7c462b::main
             at ./p-test.rs:12:14
   7: core::ops::function::FnOnce::call_once
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/ops/function.rs:250:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

migel avatar Jan 05 '24 23:01 migel

I had the same problem in Rust. Checking whether the JSON is empty before calling JsonReader::new(f) worked around the error; the same check might make sense in Python too.

Delt4Nin3 avatar Feb 22 '24 17:02 Delt4Nin3
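
For anyone landing here, the "check before reading" workaround suggested above could look roughly like this in Python. This is only a sketch, not a fix in Polars itself: the helper names is_effectively_empty and read_json_or_empty are made up for illustration, and it assumes that an empty DataFrame is the result you want for a blank body or an empty top-level list.

```python
import io


def is_effectively_empty(body: bytes) -> bool:
    """Return True for a blank body or an empty top-level JSON array.

    Hypothetical helper (not part of Polars). It inspects the raw bytes
    only, so a large payload is not parsed twice.
    """
    stripped = body.strip()
    if not stripped:
        return True  # e.g. an HTTP 200 with no body at all
    # An empty array is "[" + optional whitespace + "]".
    return (
        stripped[:1] == b"["
        and stripped[-1:] == b"]"
        and not stripped[1:-1].strip()
    )


def read_json_or_empty(body: bytes):
    """Hypothetical wrapper: skip pl.read_json when there is nothing to read."""
    import polars as pl  # imported lazily so the check above has no dependencies

    if is_effectively_empty(body):
        return pl.DataFrame()  # assumption: an empty frame is the desired result
    return pl.read_json(io.BytesIO(body))
```

With this in place, read_json_or_empty(response.content) never hands the empty payload to the parser. Note that the returned frame has no columns, so downstream code that expects a fixed schema still needs to handle that case.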