
Benchmark and optimize CAST from String to Integer

Open • andygrove opened this issue 1 year ago • 1 comment

What is the problem the feature request solves?

https://github.com/apache/datafusion-comet/pull/307 fixes a correctness issue with casting from string to integer, but there is a question about performance in https://github.com/apache/datafusion-comet/pull/307#discussion_r1580451770.

This issue is for benchmarking the native CAST operation versus Spark as well as looking at optimizing the code.

Another area that could be optimized would be to avoid converting a string to a Vec<char> in do_cast_string_to_int:

let chars: Vec<char> = str.chars().collect();

We should be able to iterate over the underlying chars directly, but the current code reads from both the start and the end of the string, so the change isn't trivial.
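As a rough illustration of the idea (not the actual do_cast_string_to_int code), here is a minimal sketch that avoids the Vec<char> allocation. It relies on the fact that str::char_indices() returns a double-ended iterator, so the trailing end can be scanned with rev(). The function name parse_i32_no_alloc is hypothetical, and the sketch assumes a simplified grammar: optional surrounding whitespace as defined by char::is_whitespace, an optional sign, then ASCII digits. It ignores Spark's eval modes, decimal-point handling, and Spark's exact whitespace rules.

```rust
/// Hypothetical helper for illustration only; not Comet's implementation.
/// Parses an optionally signed decimal integer, ignoring leading and
/// trailing whitespace, without collecting the input into a Vec<char>.
fn parse_i32_no_alloc(s: &str) -> Option<i32> {
    // Scan from the end: byte offset just past the last non-whitespace char.
    let end = s
        .char_indices()
        .rev()
        .find(|(_, c)| !c.is_whitespace())
        .map(|(i, c)| i + c.len_utf8())?;
    // Scan from the start: byte offset of the first non-whitespace char.
    let start = s
        .char_indices()
        .find(|(_, c)| !c.is_whitespace())
        .map(|(i, _)| i)?;

    let mut chars = s[start..end].chars().peekable();

    // Consume an optional leading sign.
    let negative = match chars.peek().copied() {
        Some('-') => {
            chars.next();
            true
        }
        Some('+') => {
            chars.next();
            false
        }
        _ => false,
    };

    // Accumulate as a negative value so that i32::MIN is representable.
    let mut result: i32 = 0;
    let mut seen_digit = false;
    for c in chars {
        // to_digit(10) only accepts ASCII '0'..='9'; anything else is invalid.
        let digit = c.to_digit(10)? as i32;
        result = result.checked_mul(10)?.checked_sub(digit)?;
        seen_digit = true;
    }
    if !seen_digit {
        return None;
    }
    if negative {
        Some(result)
    } else {
        // Fails for inputs such as "2147483648" that overflow i32.
        result.checked_neg()
    }
}
```

For example, parse_i32_no_alloc("  42 ") yields Some(42), while "12a" and "2147483648" both yield None. Whether this is actually faster than the Vec<char> version is exactly what the benchmark would need to show.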

Describe the potential solution

No response

Additional context

No response

andygrove, Apr 26 '24 15:04

I plan on working on this once https://github.com/apache/datafusion-comet/pull/307 is merged.

I will write a criterion microbenchmark and compare the current approach with a macro approach, and look into other optimizations.

andygrove, Apr 26 '24 17:04