parquet2
Updating parquet-tools
I'm trying to update parquet-tools with the changes introduced by the Delayed dictionary (#160) PR.
I'm using the read::decompress function to extract the page, and then I'm using this function to decode the buffer:
```rust
pub fn read<T: NativeType>(
    buf: &[u8],
    num_values: usize,
    _is_sorted: bool,
) -> Result<PrimitivePageDict<T>> {
    // Each value occupies size_of::<T>() bytes in the dict page.
    let size_of = std::mem::size_of::<T>();
    let typed_size = num_values.wrapping_mul(size_of);
    // Take the first num_values * size_of bytes, failing if the buffer is too short.
    let values = buf.get(..typed_size).ok_or_else(|| {
        Error::OutOfSpec(
            "The number of values declared in the dict page does not match the length of the page"
                .to_string(),
        )
    })?;
    // Decode each fixed-width chunk into a T.
    let values = values.chunks_exact(size_of).map(decode::<T>).collect();
    Ok(PrimitivePageDict::new(values))
}
```
This is the same function that was used to decode the page before the PR.
However, the values read from a sample file are wrong and do not match the values stored in the file.
Am I missing something during the page decompression stage?
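One way to narrow this down is to check the decoding step in isolation. The sketch below mirrors the posted function for the concrete case of `i32` values, assuming the buffer holds PLAIN-encoded values (fixed-width, little-endian, back to back), which is how Parquet stores INT32 values in a dictionary page. `read_plain_i32` and `DecodeError` are hypothetical names for illustration, not parquet2's API, and `checked_mul` is swapped in for `wrapping_mul` so an overflowing length becomes an error instead of a shorter slice. If a round trip like this passes, the problem is more likely in which bytes are handed to the decoder than in the decoding itself.

```rust
use std::convert::TryInto;

// Hypothetical error type for this sketch (not parquet2's `Error`).
#[derive(Debug)]
enum DecodeError {
    OutOfSpec(String),
}

// Hypothetical helper: decodes `num_values` PLAIN-encoded i32 values,
// each stored as 4 little-endian bytes, from the start of `buf`.
fn read_plain_i32(buf: &[u8], num_values: usize) -> Result<Vec<i32>, DecodeError> {
    let size_of = std::mem::size_of::<i32>();
    // checked_mul reports overflow instead of silently wrapping around.
    let typed_size = num_values
        .checked_mul(size_of)
        .ok_or_else(|| DecodeError::OutOfSpec("num_values * size_of overflows".to_string()))?;
    // Reject buffers shorter than the declared number of values.
    let values = buf.get(..typed_size).ok_or_else(|| {
        DecodeError::OutOfSpec(
            "The number of values declared in the dict page does not match the length of the page"
                .to_string(),
        )
    })?;
    // Each 4-byte chunk becomes one i32; chunks_exact guarantees the length.
    Ok(values
        .chunks_exact(size_of)
        .map(|chunk| i32::from_le_bytes(chunk.try_into().unwrap()))
        .collect())
}

fn main() {
    // Encode known values the way a PLAIN dictionary page stores them...
    let expected = vec![1i32, -2, 300_000];
    let buf: Vec<u8> = expected.iter().flat_map(|v| v.to_le_bytes()).collect();
    // ...and check that decoding gives the same values back.
    let decoded = read_plain_i32(&buf, expected.len()).unwrap();
    assert_eq!(decoded, expected);
    // A buffer shorter than num_values * 4 bytes is rejected.
    assert!(read_plain_i32(&buf[..5], expected.len()).is_err());
    println!("{:?}", decoded);
}
```

Running the same kind of check with the bytes produced by the page-extraction step (and a value known to be in the file) would show whether those bytes are the decompressed dictionary values the decoder expects.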
Hey @elferherrera!
Do you have a draft PR I could look at? I think that the function is correct.
Hello! Any updates on this?