polars-plugins-tutorial

Show an example of generic numeric expressions

Open paddymul opened this issue 2 months ago • 5 comments

This could be more of a polars-rs issue, but is there a generic way to get any integer type Series?

let left: &Int64Chunked = inputs[0].i64()?;

is shown in most of the examples. How do I get a ChunkedArray that implements the PolarsIntegerType trait? Do I have to write a big match statement?

Apologies if I'm not using the proper Rust terminology. I'm learning the language, this tutorial made it seem within reach... Great work

paddymul, Oct 01 '25 14:10

Do I have to write a big match statement?

yup. or make a macro

I'm learning the language, this tutorial made it seem within reach... Great work

thanks!

MarcoGorelli, Oct 01 '25 14:10

I was able to get this far successfully:

fn hash_i64_chunked(cb: &Int64Chunked) -> u64 {
    let mut hasher = XxHash64::with_seed(SEED);
    for val in cb.iter() {
        match val {
            Some(val) => {hasher.write(&val.to_le_bytes())}
            _ => {hasher.write(b" ")}
        }
    }
    hasher.finish()
}

fn hash_u64_chunked(cb: &UInt64Chunked) -> u64 {
    let mut hasher = XxHash64::with_seed(SEED);
    for val in cb.iter() {
        match val {
            Some(val) => {hasher.write(&val.to_le_bytes())}
            _ => {hasher.write(b" ")}
        }
    }
    hasher.finish()
}

#[polars_expr(output_type=UInt64)]
fn hash_series(inputs: &[Series]) -> PolarsResult<Series> {
    let chunks = &inputs[0];

    if let Ok(ichunks) = chunks.i64() {
        let hash = hash_i64_chunked(ichunks);
        return Ok(Series::new("hash".into(), vec![hash]));
    }
    if let Ok(ichunks) = chunks.u64() {
        let hash = hash_u64_chunked(ichunks);
        return Ok(Series::new("hash".into(), vec![hash]));
    }
    Err(PolarsError::ComputeError("couldn't compute for type".into()))
}

I'm having a lot of trouble writing a hash_generic_chunked function.

So far I am this close:

fn hash_generic_chunked<T>(cb: &ChunkedArray<T>) -> u64
where
    T: PolarsNumericType,
{
    let mut hasher = XxHash64::with_seed(SEED);
    for val in cb.iter() {
        match val {
            Some(val) => {hasher.write(&val.to_le_bytes())}
            _ => {hasher.write(b" ")}
        }
    }
    hasher.finish()
}

This fails, though, with the following error message:

error[E0308]: mismatched types
   --> src/expressions.rs:69:40
    |
69  |             Some(val) => {hasher.write(&val.to_le_bytes())}
    |                                  ----- ^^^^^^^^^^^^^^^^^^ expected `&[u8]`, found `&<... as NativeType>::Bytes`
    |                                  |
    |                                  arguments to this method are incorrect
    |
    = note: expected reference `&[u8]`
               found reference `&<<T as polars::prelude::PolarsNumericType>::Native as NativeType>::Bytes`
    = help: consider constraining the associated type `<<T as polars::prelude::PolarsNumericType>::Native as NativeType>::Bytes` to `[u8]`
    = note: for more information, visit https://doc.rust-lang.org/book/ch19-03-advanced-traits.html
note: method defined here
   --> /Users/paddy/.rustup/toolchains/nightly-2025-05-21-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/hash/mod.rs:358:8
    |
358 |     fn write(&mut self, bytes: &[u8]);
    |        ^^^^^

It looks like some of the types around PolarsNumericType subtly changed between 0.49 and 0.51.

This is looking like some very complex type gymnastics; I wonder if I'm better off writing concrete implementations for all of the base types.

I could see how to_le_bytes is a rarely used method with specific restrictions that make it a poor fit for generics. However, many ChunkedArray computations are generic over at least Int, UInt, and Float, and a tutorial example on writing these could be helpful.
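One possible way around the E0308 error above, offered as a sketch under assumptions rather than a confirmed fix: the compiler is complaining that to_le_bytes() returns an associated Bytes type instead of &[u8]. If that associated type carries an AsRef<[u8]> bound (which I believe is the case for NativeType::Bytes in polars-arrow, but please check against your version), then calling .as_ref() on the result should produce a byte slice. The example below reproduces the pattern with the standard library only. The LeBytes trait, the hash_generic function, and the use of DefaultHasher in place of XxHash64 are all hypothetical stand-ins, not polars APIs.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical stand-in for a numeric trait whose byte representation
/// is an associated type (for example [u8; 4] for i32, [u8; 8] for i64).
trait LeBytes: Copy {
    type Bytes: AsRef<[u8]>;
    fn le_bytes(self) -> Self::Bytes;
}

// Implement the trait for a few primitive types with a small macro.
macro_rules! impl_le_bytes {
    ($($t:ty),*) => {$(
        impl LeBytes for $t {
            type Bytes = [u8; std::mem::size_of::<$t>()];
            fn le_bytes(self) -> Self::Bytes {
                self.to_le_bytes()
            }
        }
    )*};
}
impl_le_bytes!(i32, i64, u32, u64, f32, f64);

/// Hash any slice of optional values whose type implements `LeBytes`.
/// Uses std's DefaultHasher (an assumption; the thread uses XxHash64).
fn hash_generic<T: LeBytes>(values: &[Option<T>]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for val in values {
        match val {
            // `.as_ref()` turns the associated `Bytes` type into `&[u8]`.
            Some(v) => hasher.write(v.le_bytes().as_ref()),
            None => hasher.write(b" "),
        }
    }
    hasher.finish()
}
```

Note that in this sketch an i64 and a u64 with the same bit pattern hash to the same value, since only the raw bytes are fed to the hasher. Mixing in a per-type constant, as the macro version later in this thread does, avoids that.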

paddymul, Oct 01 '25 18:10

Well, I wrote my first Rust macro and got the tests to pass.

macro_rules! hash_func {
    ($a:ident, $b:ty, $type_num:expr) => {
        fn $a(cb: $b) -> u64 {
            let mut hasher = XxHash64::with_seed(SEED);
            hasher.write(&hardcode_bytes($type_num));
            let mut count: u64 = 0;
            for val in cb.iter() {
                count += 1;
                match val {
                    Some(val) => {hasher.write(&val.to_le_bytes())}
                    _ => {hasher.write(NAN_SEPERATOR);}
                }
                hasher.write(&count.to_le_bytes());
            }
            hasher.finish()
        }
    };
}


hash_func!(hash_i64_chunked, &Int64Chunked, 1);
hash_func!(hash_i32_chunked, &Int32Chunked, 2);

This expands to:


// non macro implementation for reference
    fn hash_i64_chunked(cb: &Int64Chunked) -> u64 {
        let mut hasher = XxHash64::with_seed(SEED);
        hasher.write(&hardcode_bytes(1));
        let mut count: u64 = 0;
        for val in cb.iter() {
            count += 1;
            match val {
                Some(val) => { hasher.write(&val.to_le_bytes()) }
                _ => { hasher.write(NAN_SEPERATOR); }
            }
            hasher.write(&count.to_le_bytes());
        }
        hasher.finish()
    }

Macro expansion was checked with cargo rustc --profile=check -- -Zunpretty=expanded

Just leaving this here for reference in case other people are trying to figure out a solution.
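For readers wondering how the macro-generated functions connect back to the "big match statement" mentioned earlier in the thread, here is a rough, std-only sketch. The Column enum is a hypothetical stand-in for a dynamically typed Series, DefaultHasher replaces XxHash64, and the tag values are arbitrary. None of these names come from polars.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical stand-in for a dynamically typed column (not a polars type).
enum Column {
    I32(Vec<Option<i32>>),
    I64(Vec<Option<i64>>),
    U64(Vec<Option<u64>>),
}

// Generate one concrete hash function per type. `$tag` is mixed into the
// hash so identical bit patterns in different types hash differently.
macro_rules! hash_func {
    ($name:ident, $t:ty, $tag:expr) => {
        fn $name(values: &[Option<$t>]) -> u64 {
            let mut hasher = DefaultHasher::new();
            hasher.write_u8($tag);
            for val in values {
                match val {
                    Some(v) => hasher.write(&v.to_le_bytes()),
                    None => hasher.write(b" "),
                }
            }
            hasher.finish()
        }
    };
}

hash_func!(hash_i32, i32, 1);
hash_func!(hash_i64, i64, 2);
hash_func!(hash_u64, u64, 3);

/// The "big match statement": one arm per supported type.
fn hash_column(col: &Column) -> u64 {
    match col {
        Column::I32(v) => hash_i32(v),
        Column::I64(v) => hash_i64(v),
        Column::U64(v) => hash_u64(v),
    }
}
```

Supporting another type then means one more hash_func! line and one more match arm, which keeps the per-type boilerplate small without any generic trait bounds.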

paddymul, Oct 02 '25 15:10

As far as I'm concerned, you can close this issue and use it as a reference. I will probably write a blog post about my experience writing a polars plugin, and it will include this part about macros. Would a section like this be appropriate for the tutorial? I might be able to submit a PR if you're interested. The tutorial section would probably just generalize the sum function.
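As a rough idea of what generalizing a sum could look like: operations that only need arithmetic tend to be easier to write generically than the to_le_bytes case, because the required behaviour can be expressed with standard operator traits. The sketch below uses only the standard library. sum_non_null is a hypothetical name, and a real plugin would work on ChunkedArray<T> with T: PolarsNumericType rather than on slices of Option<T>.

```rust
use std::ops::Add;

/// Sum the non-null values of any numeric slice.
/// `T::default()` is used as the zero value (0 for integers, 0.0 for floats).
fn sum_non_null<T>(values: &[Option<T>]) -> T
where
    T: Copy + Default + Add<Output = T>,
{
    values
        .iter()
        .flatten() // skips None, yields &T for each Some
        .fold(T::default(), |acc, &v| acc + v)
}
```

The same function then serves signed integers, unsigned integers, and floats, for example sum_non_null(&[Some(1i64), None, Some(4)]) or sum_non_null(&[Some(1.5f64), Some(2.0)]).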

paddymul, Oct 02 '25 15:10

yeah a section on macros may be useful

MarcoGorelli, Oct 02 '25 15:10