Updating parquet-tools

Open elferherrera opened this issue 2 years ago • 2 comments

I'm trying to update parquet-tools to work with the changes introduced by the Delayed dictionary PR (#160).

I'm using read::decompress to extract the page, and then this function to decode the buffer:

pub fn read<T: NativeType>(
    buf: &[u8],
    num_values: usize,
    _is_sorted: bool,
) -> Result<PrimitivePageDict<T>> {
    let size_of = std::mem::size_of::<T>();

    let typed_size = num_values.wrapping_mul(size_of);

    let values = buf.get(..typed_size).ok_or_else(|| {
        Error::OutOfSpec(
            "The number of values declared in the dict page does not match the length of the page"
                .to_string(),
        )
    })?;

    let values = values.chunks_exact(size_of).map(decode::<T>).collect();

    Ok(PrimitivePageDict::new(values))
}

This is the same function that was previously used to decode the page.

However, the values read from a sample file are wrong and do not match the values saved in the file.
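
For reference, here is a minimal, dependency-free sketch of what this decoding step does for a single type. It assumes the buffer is already decompressed and holds PLAIN-encoded, little-endian i32 values, which is how the Parquet format stores dictionary pages. The name `decode_plain_i32` is hypothetical and not part of parquet2. The sketch uses `checked_mul` where the function above uses `wrapping_mul`, so an overflowing length is reported as an error.

```rust
/// Decode `num_values` little-endian i32 values from a dictionary page buffer.
/// Assumption: `buf` is already decompressed and PLAIN-encoded.
/// `decode_plain_i32` is a hypothetical helper, not a parquet2 API.
fn decode_plain_i32(buf: &[u8], num_values: usize) -> Result<Vec<i32>, String> {
    const SIZE: usize = std::mem::size_of::<i32>();

    // Number of bytes the declared values should occupy.
    let typed_size = num_values
        .checked_mul(SIZE)
        .ok_or_else(|| "num_values overflows usize".to_string())?;

    // Fail if the page is shorter than the declared number of values.
    let values = buf.get(..typed_size).ok_or_else(|| {
        format!(
            "page has {} bytes, expected at least {}",
            buf.len(),
            typed_size
        )
    })?;

    // Each 4-byte chunk is one little-endian i32.
    Ok(values
        .chunks_exact(SIZE)
        .map(|c| i32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}
```

Feeding this a buffer that is still compressed, or one that holds dictionary indices instead of dictionary values, would not necessarily fail: it could simply return numbers that do not match the file.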

Am I missing something during the page decompression stage?

elferherrera, Aug 14 '22 20:08

Hey @elferherrera!

Do you have a draft PR I could look at? I think that the function is correct.

jorgecarleitao, Aug 15 '22 04:08

Hello! %) Any updates on this?

little-arhat, Apr 07 '23 20:04