arrow2
ExternalFormat("min value of a page is required")
Not sure if this is an arrow2 or parquet2 problem (or both), but when I write from a PSV with comma-separated fields like this:
let arr = arr.as_mut_any().downcast_mut::<MutableListArray<i32, MutablePrimitiveArray::<u8>>>().unwrap();
let val = match r {
    "" => None,
    v => {
        let vals = v
            .split(",")
            .map(|v| Some(v.parse::<u8>().unwrap()));
        Some(vals)
    }
};
arr.try_push(val).expect("pushed");
I get this panic when calling arrow2::io::parquet::write::FileWriter<File>.end():
ExternalFormat("min value of a page is required")
I don't get the panic if I instead try:
let arr = values[i].as_mut_any().downcast_mut::<MutableListArray<i32, MutablePrimitiveArray::<u8>>>().unwrap();
let val = r
    .split(",")
    .map(|v| match v {
        "" => None,
        v => Some(v.parse::<u8>().unwrap())
    });
arr.try_push(Some(val)).expect("pushed");
so I suspect it's a problem with the if null_count as usize == spec.num_values check in parquet2.
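To make the difference between the two snippets concrete, here is a minimal sketch that uses only std types. The names first_version and second_version are hypothetical, and Option<Vec<Option<u8>>> is just a stand-in for one nullable list row, not arrow2's actual representation.

```rust
/// Models the first snippet: an empty field becomes a null list (None);
/// otherwise every comma-separated item is parsed as a non-null u8.
fn first_version(field: &str) -> Option<Vec<Option<u8>>> {
    match field {
        "" => None,
        v => Some(
            v.split(',')
                .map(|item| Some(item.parse::<u8>().unwrap()))
                .collect(),
        ),
    }
}

/// Models the second snippet: the row is always a non-null list, and an
/// empty item inside it becomes a null element.
fn second_version(field: &str) -> Option<Vec<Option<u8>>> {
    Some(
        field
            .split(',')
            .map(|item| match item {
                "" => None,
                v => Some(v.parse::<u8>().unwrap()),
            })
            .collect(),
    )
}
```

For an empty field, first_version("") returns None (a null row that contributes no child values), while second_version("") returns Some(vec![None]) (one row holding a single null element), because "".split(',') yields one empty string. If every field in the input is empty, the first version therefore produces no child values at all.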
Edit: These files also can't be read :(
ExternalFormat("Invalid Parquet file. Corrupt footer")
Do you have the list that you tried to write? I am trying to reproduce this but have not been able to.
E.g. adding
#[test]
fn list_utf8_nullable() -> Result<()> {
    let data = vec![
        Some(vec![Some("a".to_string())]),
        None,
        Some(vec![None, Some("b".to_string())]),
        Some(vec![]),
        Some(vec![Some("c".to_string())]),
        None,
    ];
    let mut array =
        MutableListArray::<i32, _>::new_with_field(MutableUtf8Array::<i32>::new(), "item", true);
    array.try_extend(data).unwrap();
    list_array_generic(true, array.into())
}

#[test]
fn list_int_nullable() -> Result<()> {
    let data = vec![
        Some(vec![Some(1)]),
        None,
        Some(vec![None, Some(2)]),
        Some(vec![]),
        Some(vec![Some(3)]),
        None,
    ];
    let mut array = MutableListArray::<i32, _>::new_with_field(
        MutablePrimitiveArray::<i32>::new(),
        "item",
        true,
    );
    array.try_extend(data).unwrap();
    list_array_generic(true, array.into())
}
to https://github.com/jorgecarleitao/arrow2/blob/main/tests/it/io/parquet/mod.rs#L1396 still passes.
Was this using the latest main or the latest release (0.12.0)?
I'm also seeing this problem. I believe it only occurs if every entry in the ListArray is empty. In this case, the values array in the ListArray is empty and thus has no min_value.
This occurs with both the latest main and version 0.12.0.
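A rough illustration of why an all-empty list column has no minimum, using only std types. The helper names child_values, min_value and required_min are hypothetical, and required_min only mimics a writer that insists on a minimum; it is not parquet2's actual logic.

```rust
/// Flattens list rows into the single child "values" array that a list
/// column stores. Empty lists contribute nothing.
fn child_values(lists: &[Vec<Option<i32>>]) -> Vec<Option<i32>> {
    lists.iter().flatten().copied().collect()
}

/// Minimum over the non-null values; None when there is nothing to compare.
fn min_value(values: &[Option<i32>]) -> Option<i32> {
    values.iter().flatten().copied().min()
}

/// Assumption: models a consumer that treats a missing minimum as an error.
/// The message simply mirrors the one reported in this issue.
fn required_min(values: &[Option<i32>]) -> Result<i32, String> {
    min_value(values).ok_or_else(|| "min value of a page is required".to_string())
}
```

With rows such as vec![vec![], vec![], vec![]], child_values returns an empty vector, min_value returns None, and required_min returns an error. As soon as one row holds a non-null value, a minimum exists and the check passes.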
In my case I had an array filled with empty strings, vec!["","",""]. After explicitly setting the values to None (of the Option type) instead, it worked.
This is being fixed in https://github.com/jorgecarleitao/parquet2/pull/193
Hi @jorgecarleitao, looks like you landed this fix in parquet2 but haven't published an updated version of parquet2 that includes it (or an arrow2 that depends on said version, thus also including a fix). Gently nudging as a new release with this fix would be most greatly appreciated!
It's in progress: #1304