Antoine Pitrou
`InputType` and `OutputType` instances are only created at kernel registration, if I'm not mistaken?
> But those Input/OutputType instances are stored in the KernelSignature, so those references need to stay valid.

My question was more along the lines of: what is the point of...
> @pitrou this isn’t a micro-optimization imho — when 1% of your binary size is concerned with inline shared_ptr code in this narrow part of the library that speaks to...
> The use of shared pointers also causes extra overhead in kernel dispatching and input type checking, but we would need to write some benchmarks to quantify that better I...
I took the PR and rebased it internally to then compare the sizes:

* master:
```
$ size build-release/release/libarrow.so
    text    data     bss      dec     hex filename
23471557  317416 2568225 26357198 1922dce...
```
If there's no sizable performance improvement then I'm against merging this. Robustness should be a guiding principle and replacing shared pointers with raw pointers goes against it.
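To make the robustness argument concrete, here is a minimal sketch of the two ownership styles being compared. `TypeMatcher`, `OwningSignature`, `BorrowingSignature` and `MakeOwningSignature` are hypothetical stand-ins invented for illustration; they are not Arrow's actual `InputType`, `OutputType` or `KernelSignature` classes.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for whatever object a signature refers to.
struct TypeMatcher {
  std::string name;
};

// Shared ownership: the matcher lives as long as any signature holds it,
// no matter what the code that created it does afterwards.
class OwningSignature {
 public:
  explicit OwningSignature(std::shared_ptr<TypeMatcher> matcher)
      : matcher_(std::move(matcher)) {
    assert(matcher_ != nullptr);
  }
  const std::string& matcher_name() const { return matcher_->name; }
  long matcher_use_count() const { return matcher_.use_count(); }

 private:
  std::shared_ptr<TypeMatcher> matcher_;
};

// Raw pointer: nothing is owned here. The caller has to guarantee that the
// matcher outlives the signature, and the compiler cannot check that.
class BorrowingSignature {
 public:
  explicit BorrowingSignature(const TypeMatcher* matcher) : matcher_(matcher) {
    assert(matcher_ != nullptr);
  }
  const std::string& matcher_name() const { return matcher_->name; }

 private:
  const TypeMatcher* matcher_;
};

// The local handle is destroyed on return, yet the returned signature stays
// valid because it shares ownership of the matcher.
inline OwningSignature MakeOwningSignature(const std::string& name) {
  auto matcher = std::make_shared<TypeMatcher>(TypeMatcher{name});
  return OwningSignature(matcher);
}
```

Writing the same factory for `BorrowingSignature` with a function-local `TypeMatcher` would return a dangling pointer. That is the kind of mistake the shared-pointer version rules out by construction, at the cost of reference counting and extra inlined code.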
> * Given a random selection of non-equal values

I'd add that non-random but likely selections should also show a nice hash distribution, including:

* ranges of consecutive `[0, N]`...
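One way to check the consecutive-range case is to hash the keys `0 .. n-1` and look at how full the fullest bucket gets. The sketch below is only an illustration: `IdentityHash`, `MultiplicativeHash`, `BucketCounts` and `MaxBucketLoad` are hypothetical helpers, and the multiplicative mixer is a generic textbook technique, not the hash used in `util/hashing.h`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical baseline: the key is used directly as its own hash.
inline std::uint64_t IdentityHash(std::uint64_t key) { return key; }

// Hypothetical mixer: multiply by an odd 64-bit constant (about 2^64 divided
// by the golden ratio). Unsigned overflow wraps around, which is intended.
inline std::uint64_t MultiplicativeHash(std::uint64_t key) {
  return key * 0x9E3779B97F4A7C15ULL;
}

// Hash the consecutive keys 0 .. n-1 and count how many land in each of
// 2^log2_buckets buckets, where the bucket is chosen by the top hash bits.
template <typename HashFn>
std::vector<std::size_t> BucketCounts(HashFn hash, std::uint64_t n,
                                      unsigned log2_buckets) {
  assert(log2_buckets >= 1 && log2_buckets <= 20);
  std::vector<std::size_t> counts(std::size_t{1} << log2_buckets, 0);
  for (std::uint64_t key = 0; key < n; ++key) {
    ++counts[hash(key) >> (64 - log2_buckets)];
  }
  return counts;
}

// Load of the fullest bucket; a perfectly even spread gives n / num_buckets.
inline std::size_t MaxBucketLoad(const std::vector<std::size_t>& counts) {
  return counts.empty() ? 0 : *std::max_element(counts.begin(), counts.end());
}
```

With 4096 consecutive keys and 256 buckets, `MaxBucketLoad(BucketCounts(IdentityHash, 4096, 8))` puts every key in bucket 0, while the multiplicative mixer keeps the fullest bucket within a small multiple of the mean load of 16. The same harness could be pointed at other likely inputs, which is the kind of coverage the comment asks for.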
Why 32-bit rather than 64-bit?
There's https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/hashing_benchmark.cc at least
> `util/hashing.h` is a more-or-less direct adapter between Arrow and xxhash.

That's how it ended up working in practice for non-tiny strings, but the basic objective is to have some...