SimSIMD
golang: separate c functions and cache simsimd_metric_punned
The changes move the inline C functions from simsimd.go to a new file, simsimd.c. This separation enhances code organization and readability. It also allows for better management of the C code, which can now be modified independently from the Go code.
More importantly, we now cache the results from simsimd_metric_punned instead of determining the capabilities for each call. This improves the benchmark from 1940ns/op to 1320ns/op on my system.
Note: this is still 4x slower than the native Go implementation, but that's better than 6x 🤣
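For anyone following along, here is a minimal sketch of the caching idea in plain C. It is not the code from this PR: `metric_fn_t`, `resolve_dot`, `dot_serial`, and `cached_dot` are hypothetical names, and `resolve_dot` only stands in for the expensive lookup that `simsimd_metric_punned` performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel signature, not SimSIMD's actual typedef. */
typedef float (*metric_fn_t)(const float *a, const float *b, size_t n);

/* Counts how often the expensive lookup runs, for illustration only. */
static int resolve_calls = 0;

/* Plain scalar dot product used as the only available kernel here. */
static float dot_serial(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

/* Stand-in for capability detection plus kernel selection. */
static metric_fn_t resolve_dot(void) {
    ++resolve_calls;
    return dot_serial;
}

/* Resolve on the first call, then reuse the cached pointer afterwards.
 * The first call is not synchronized; this sketch assumes it happens
 * on a single thread. */
float cached_dot(const float *a, const float *b, size_t n) {
    static metric_fn_t cached = NULL;
    if (cached == NULL) {
        cached = resolve_dot();
        assert(cached != NULL);
    }
    return cached(a, b, n);
}
```

Calling `cached_dot` repeatedly should run `resolve_dot` only once, so every later call pays for a single pointer check and an indirect call rather than a fresh capability lookup.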
Hi, @corani! You are right to evaluate the dynamic dispatch just once. I think we should generalize it and implement in an identical way to how I implement it in StringZilla. That is more laborious, but can be reused across different languages.
@pplanel has recently pushed Rust bindings, but they are slower than native Rust code, because he doesn't cache the pointer in any way. In case any of you guys want to implement it, I'm happy to provide guidance, but won't be able to work on it actively in the coming weeks 🤗
Hey @ashvardanian, I'm interested to know more about this benchmark and how the pointer caching can be done.
The Rust binding benchmarks compare cosine and sqeuclidean against their respective implementations in SimSIMD.
And I'm seeing these results:
Cosine
SqEuclidean
This is interesting, @pplanel. I must have misread the timings in the console.
The common approach is to have a static structure with pointers, that is populated when the shared library is loaded. Then, all the function calls go through that lookup table. The StringZilla snippet is a pretty good example, I believe.
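As a rough illustration of that pattern (a sketch under assumptions, not the StringZilla or SimSIMD code), the table can be a static struct that a load-time constructor fills in. The names `dispatch_table_t`, `dispatch_table_init`, `metric_dot`, and `metric_l2sq` are hypothetical, and `__attribute__((constructor))` is a GCC/Clang extension, so other compilers would need a different hook.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel signature, for illustration only. */
typedef float (*metric_fn_t)(const float *a, const float *b, size_t n);

/* Hypothetical lookup table with one slot per metric. */
typedef struct {
    metric_fn_t dot;
    metric_fn_t l2sq;
} dispatch_table_t;

static dispatch_table_t dispatch_table;

/* Scalar fallback: dot product. */
static float dot_serial(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

/* Scalar fallback: squared Euclidean distance. */
static float l2sq_serial(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

/* Runs before main(), or when the shared library is loaded.
 * A real library would check CPU features here and install SIMD
 * kernels; this sketch always installs the scalar fallbacks. */
__attribute__((constructor)) static void dispatch_table_init(void) {
    dispatch_table.dot = dot_serial;
    dispatch_table.l2sq = l2sq_serial;
    assert(dispatch_table.dot != NULL && dispatch_table.l2sq != NULL);
}

/* Public entry points: every call goes through the table. */
float metric_dot(const float *a, const float *b, size_t n) {
    return dispatch_table.dot(a, b, n);
}

float metric_l2sq(const float *a, const float *b, size_t n) {
    return dispatch_table.l2sq(a, b, n);
}
```

Because the table is populated once at load time, bindings in Go, Rust, or any other language can call `metric_dot` and `metric_l2sq` directly, without each binding having to cache a pointer itself.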
I'm unable to update the PR to resolve the conflict:
! [remote rejected] corani/perf -> corani/perf (refusing to allow a Personal Access Token to create or update workflow `.github/workflows/prerelease.yml` without `workflow` scope)
> Hi, @corani! You are right to evaluate the dynamic dispatch just once. I think we should generalize it and implement in an identical way to how I implement it in StringZilla. That is more laborious, but can be reused across different languages.
That'll have to be done by someone with actual experience writing C code 😉