usearch icon indicating copy to clipboard operation
usearch copied to clipboard

Feature: Follow Rust safety/soundness conventions, add `_unchecked` methods

Open tnibler opened this issue 9 months ago • 5 comments

Describe what you are looking for

Hi, I just had to look up for the fourth time probably why adding to an Index caused a segfault. Turns out it's because you have to preallocate space using reserve.

This is very surprising behavior and unsound, since safe Rust should never cause undefined behavior such as out of bounds accesses.

In #412, it was mentioned that checking for sufficient capacity is not zero-cost, since it's an atomic read that could cause contention. That's fair, but also probably a rare occurrence in most usage patterns. So what do you think about basically duplicating the API into unsafe *_unchecked methods that don't perform any validation and regular safe functions that perform checks e.g., for sufficient capacity in this case? The standard library and many other lower-level libraries also use this pattern.

This library is really great, but as it is right now any project that uses it is unsound and can violate memory safety in code that is not marked unsafe, which is (almost) the whole point of Rust.

Can you contribute to the implementation?

  • [x] I can contribute

Is your feature request specific to a certain interface?

Other bindings

Contact Details

No response

Is there an existing issue for this?

  • [x] I have searched the existing issues

Code of Conduct

  • [x] I agree to follow this project's Code of Conduct

tnibler avatar Mar 20 '25 13:03 tnibler

Thank you, @tnibler! Would you like to propose the changes in a form of a PR draft? I can schedule/align them with the v3 release 🤗

ashvardanian avatar Jul 08 '25 13:07 ashvardanian

Sure, I'll look into what it would take :)

On July 8, 2025 3:15:50 PM GMT+02:00, Ash Vardanian @.***> wrote:

ashvardanian left a comment (unum-cloud/usearch#574)

Thank you, @tnibler! Would you like to propose the changes in a form of a PR draft? I can schedule/align them with the v3 release 🤗

-- Reply to this email directly or view it on GitHub: https://github.com/unum-cloud/usearch/issues/574#issuecomment-3048927730 You are receiving this because you were mentioned.

Message ID: @.***>

tnibler avatar Jul 11 '25 14:07 tnibler

Sorry forgot to write anything here and then got sidetracked. As mentioned in #642, the Index type isn't currently associated with any element type, so it's not possible to have a memory safe interface as it stands.

What you could do is link the Quantization variant and its respective type, which would only require adding an associated constant to the VectorType trait. Playground link/Gist

Would that be acceptable? It would add a compare and a branch to the safe functions, and the unsafe ones can stay as they are.

All in all:

  • Every Index method generic over T would get a safe version with this check and an _unchecked variant
  • Safe add checks capacity
  • {save,load,view}_from_buffer get safe versions with bounds checks

..I think that's it?

tnibler avatar Oct 27 '25 11:10 tnibler

Thank you @tnibler! Valuable suggestions for the upcoming v3 interface!

If you have a second to look into my other Rust-exposed libs and share your opinion or patches - those would be greatly appreciated too 🤗

ashvardanian avatar Oct 27 '25 12:10 ashvardanian

Oh I just saw that there's capacity checks in the actual library now, so no need to do that again on the Rust side. Calling add or update before ever reserving anything is the only way to make those crash now, because contexts_ starts out empty but is indexed by the thread id:

https://github.com/unum-cloud/USearch/blob/541e882da5a99ad27e495edee32f2003ea5a2ebb/include/usearch/index.hpp#L2790C1-L2796C1

Even though the rust Index::new reserves the max number of threads, because of this early return:

https://github.com/unum-cloud/USearch/blob/541e882da5a99ad27e495edee32f2003ea5a2ebb/include/usearch/index.hpp#L2502C1-L2508C10

nothing is actually allocated. Considering the comment here, maybe that's not intentional?

https://github.com/unum-cloud/USearch/blob/541e882da5a99ad27e495edee32f2003ea5a2ebb/rust/lib.cpp#L213C1-L216C37

I think something like this is reasonable?: https://github.com/tnibler/USearch/commit/b765a868042a4d4f1cf7a6f57946899f3c5fcc33 , along with maybe a check if contexts_.size() is zero in add and co

tnibler avatar Oct 27 '25 21:10 tnibler