ExternalFormat("min value of a page is required")

Open thesmartwon opened this issue 2 years ago • 4 comments

I'm not sure whether this is an arrow2 or parquet2 problem (or both), but when I write from a PSV with comma-separated fields like this:

let arr = arr.as_mut_any().downcast_mut::<MutableListArray<i32, MutablePrimitiveArray::<u8>>>().unwrap();
let val = match r {
	"" => None,
	v => {
		let vals = v
			.split(",")
			.map(|v| Some(v.parse::<u8>().unwrap()));
		Some(vals)
	}
};
arr.try_push(val).expect("pushed");

I get this panic when calling arrow2::io::parquet::write::FileWriter<File>.end():

ExternalFormat("min value of a page is required")

I don't get the panic if I instead try:

let arr = values[i].as_mut_any().downcast_mut::<MutableListArray<i32, MutablePrimitiveArray::<u8>>>().unwrap();
let val = r
	.split(",")
	.map(|v| match v {
		"" => None,
		v => Some(v.parse::<u8>().unwrap())
	});
arr.try_push(Some(val)).expect("pushed");

so I suspect it's a problem with the if null_count as usize == spec.num_values check in parquet2.
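To make the difference between the two snippets concrete, here is a minimal standard-library sketch of the shapes they produce. It does not use arrow2; the function names `parse_null_list` and `parse_null_items` are hypothetical, and `Option<Vec<Option<u8>>>` stands in for one entry of the list array. The point it illustrates is that an empty field becomes a null list in the first approach, but a list holding a single null element in the second.

```rust
/// First approach from the report: an empty field becomes a null list,
/// so nothing is appended to the child values.
/// (Hypothetical helper, not part of arrow2.)
fn parse_null_list(field: &str) -> Option<Vec<Option<u8>>> {
    match field {
        "" => None,
        v => Some(
            v.split(',')
                .map(|x| Some(x.parse::<u8>().unwrap()))
                .collect(),
        ),
    }
}

/// Second approach: the list is always present, and empty items become
/// null elements. Note that "".split(',') yields one empty item, so an
/// empty field turns into a list containing a single null.
/// (Hypothetical helper, not part of arrow2.)
fn parse_null_items(field: &str) -> Option<Vec<Option<u8>>> {
    Some(
        field
            .split(',')
            .map(|x| match x {
                "" => None,
                v => Some(v.parse::<u8>().unwrap()),
            })
            .collect(),
    )
}
```

With these helpers, `parse_null_list("")` is `None` while `parse_null_items("")` is `Some(vec![None])`, so a file made up of empty fields ends up with no child values at all in the first case and only null child values in the second.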

Edit: These files also can't be read :(

ExternalFormat("Invalid Parquet file. Corrupt footer")

thesmartwon avatar Jul 07 '22 23:07 thesmartwon

Do you have the list that you tried to write? I am trying to reproduce this but have not been able to.

E.g. adding

#[test]
fn list_utf8_nullable() -> Result<()> {
    let data = vec![
        Some(vec![Some("a".to_string())]),
        None,
        Some(vec![None, Some("b".to_string())]),
        Some(vec![]),
        Some(vec![Some("c".to_string())]),
        None,
    ];
    let mut array =
        MutableListArray::<i32, _>::new_with_field(MutableUtf8Array::<i32>::new(), "item", true);
    array.try_extend(data).unwrap();
    list_array_generic(true, array.into())
}

#[test]
fn list_int_nullable() -> Result<()> {
    let data = vec![
        Some(vec![Some(1)]),
        None,
        Some(vec![None, Some(2)]),
        Some(vec![]),
        Some(vec![Some(3)]),
        None,
    ];
    let mut array = MutableListArray::<i32, _>::new_with_field(
        MutablePrimitiveArray::<i32>::new(),
        "item",
        true,
    );
    array.try_extend(data).unwrap();
    list_array_generic(true, array.into())
}

to https://github.com/jorgecarleitao/arrow2/blob/main/tests/it/io/parquet/mod.rs#L1396 still passes.

Was this using the latest main or the latest release (0.12.0)?

jorgecarleitao avatar Jul 09 '22 17:07 jorgecarleitao

I'm also seeing this problem. I believe it only occurs if every entry in the ListArray is empty. In this case, the values array in the ListArray is empty and thus has no min_value.
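The condition described here can be sketched with the standard library alone. This is not parquet2's statistics code; `child_values` and `min_value` are hypothetical helpers, assuming a list column modelled as `Option<Vec<Option<u8>>>` entries. It only shows why a column whose lists are all empty or null has no minimum to report.

```rust
/// Flatten a list column into its child values, skipping null lists.
/// (Hypothetical helper, not part of arrow2 or parquet2.)
fn child_values(lists: &[Option<Vec<Option<u8>>>]) -> Vec<Option<u8>> {
    lists.iter().flatten().flatten().copied().collect()
}

/// Minimum over the non-null child values; `None` when there are none.
/// (Hypothetical helper, not part of arrow2 or parquet2.)
fn min_value(values: &[Option<u8>]) -> Option<u8> {
    values.iter().flatten().copied().min()
}
```

For a column such as `[Some(vec![]), None, Some(vec![])]`, `child_values` returns an empty vector and `min_value` returns `None`, which matches the situation where a writer that requires a minimum has nothing to use.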

This occurs with both the latest main and version 0.12.0.

tjwilson90 avatar Jul 13 '22 18:07 tjwilson90

In my case I had an array filled with empty strings, vec!["","",""]. After explicitly setting those values to None (using an Option type) instead, it worked.

letitcrash avatar Aug 22 '22 12:08 letitcrash

This is being fixed in https://github.com/jorgecarleitao/parquet2/pull/193

jorgecarleitao avatar Aug 26 '22 03:08 jorgecarleitao

Hi @jorgecarleitao, looks like you landed this fix in parquet2 but haven't published an updated version of parquet2 that includes it (or an arrow2 that depends on said version, thus also including a fix). Gently nudging as a new release with this fix would be most greatly appreciated!

nlhepler avatar Nov 29 '22 00:11 nlhepler

It's in progress: #1304

sundy-li avatar Nov 29 '22 07:11 sundy-li