Geoffrey Claude

Results: 10 comments by Geoffrey Claude

General comment on the benchmark but... am I reading them wrong, or is the `null_percent` input logic inverted?

```rust
fn do_benches(
    c: &mut Criterion,
    array_length: usize,
    in_list_length: usize,
    null_percent: f64,
    ...
```

Another general comment, on the implementation this time: hashing seems overkill and probably overly expensive for small simple type lists. @adriangb have you considered sorting the `InList` and doing a...

> > Another general comment, on the implementation this time: hashing seems overkill and probably overly expensive for small simple type lists. > > @adriangb have you considered sorting the...

> @Dandandan is already on the list https://github.com/alamb/datafusion-benchmarking/blob/4fb120785fa66ecbf40a45d8a5d0d5f4be17266a/scripts/scrape_comments.py#L41
>
> I can add @geoffreyclaude if he would like

@alamb yes please :) Especially for when/if I follow through with the...

@alamb I opened https://github.com/apache/datafusion/pull/19265 to close this issue. I kept it relatively concise, in line with the other docs. Let me know if you think it needs to be more...

> What about `slice::contains`? Seems like it should be somewhere between the const-sized approach and binary search in terms of threshold window.

It loses all the time against the branchless...

@Dandandan See https://github.com/geoffreyclaude/datafusion/pull/14 for an in-depth micro benchmark and analysis of the different search algorithms. TL;DR: It's always branchless up to the SIMD limit, then hashset.
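For readers skimming the thread, here is a minimal sketch of the "branchless for small lists, then hashset" shape described in the TL;DR. It is not the code from the linked PR: the `InListFilter` type, the `contains_branchless` helper, the `i64` element type, and the `SMALL_LIST_MAX` cutoff of 16 are all hypothetical names and values chosen for illustration. The real crossover point would have to come from the benchmark.

```rust
use std::collections::HashSet;

/// Hypothetical cutoff; the actual crossover must be measured per type and target.
const SMALL_LIST_MAX: usize = 16;

/// Linear membership test with no early exit: every element is compared and the
/// results are OR-ed together, so the loop body has no data-dependent branch.
/// Whether the compiler vectorizes this is not guaranteed.
fn contains_branchless(haystack: &[i64], needle: i64) -> bool {
    haystack.iter().fold(false, |found, &v| found | (v == needle))
}

/// Illustrative container that picks a lookup strategy from the list length.
enum InListFilter {
    Small(Vec<i64>),
    Large(HashSet<i64>),
}

impl InListFilter {
    fn new(values: &[i64]) -> Self {
        if values.len() <= SMALL_LIST_MAX {
            InListFilter::Small(values.to_vec())
        } else {
            InListFilter::Large(values.iter().copied().collect())
        }
    }

    fn contains(&self, needle: i64) -> bool {
        match self {
            InListFilter::Small(list) => contains_branchless(list, needle),
            InListFilter::Large(set) => set.contains(&needle),
        }
    }
}
```

With these assumptions, `InListFilter::new(&[1, 2, 3]).contains(2)` takes the branchless path, while a list longer than `SMALL_LIST_MAX` is hashed once up front and probed per lookup.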

I've opened https://github.com/apache/datafusion/pull/19376 as a preliminary PR to extend the benchmarks.

`./auto-fix.sh` would definitely be super useful! I'd wire it up to be a `git commit` hook.