Rust bindings
Rust may become one of the core officially maintained interfaces, but we still need an implementation proposal. A potential candidate for this job can find guidelines and recommendations for new bindings in our docs here. If you have questions or recommendations for that interface, ask anyone on Discord or one of the maintainers here.
This is doable for the most part. We can automatically generate raw bindings from ukv's C/C++ code base (or the SDK if it exists) either via the bindgen or cbindgen crates. From there, we can work towards implementing an additional safety layer.
The ambition is not just to make bindings, which are pretty trivial, but to make them efficient and familiar to the end user. In the case of Python, our logic was:
- Everyone knows how to work with dict, so our binary collections should have the same interfaces as dict[int, str].
- Pandas is the most popular package for tabular data, so our tabular interface mimics that.
- NetworkX is the most popular package for graphs, so our graph interface mimics that.
Let's start not from automated bindings but by defining what the interface should look like. The 1st point is obvious. Every language has a standard library with associative containers. What about the second and the third? Which packages for Graphs and Docs/Tables do people love the most? Maybe some form of ORM?
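To make the first point concrete for Rust, here is a minimal sketch of a binary collection whose method names follow std::collections::HashMap. The type BlobCollection is hypothetical and is backed by an in-memory BTreeMap purely as a stand-in; a real binding would forward these calls to the C layer instead.

```rust
use std::collections::BTreeMap;

/// Hypothetical collection type; the in-memory map only stands in for
/// the native store so that the sketch runs on its own.
struct BlobCollection {
    entries: BTreeMap<i64, Vec<u8>>,
}

impl BlobCollection {
    fn new() -> Self {
        BlobCollection { entries: BTreeMap::new() }
    }

    /// Same contract as `HashMap::insert`: returns the previous value, if any.
    fn insert(&mut self, key: i64, value: &[u8]) -> Option<Vec<u8>> {
        self.entries.insert(key, value.to_vec())
    }

    /// Same contract as `HashMap::get`, but hands out a byte slice.
    fn get(&self, key: &i64) -> Option<&[u8]> {
        self.entries.get(key).map(Vec::as_slice)
    }

    /// Same contract as `HashMap::remove`.
    fn remove(&mut self, key: &i64) -> Option<Vec<u8>> {
        self.entries.remove(key)
    }

    fn contains_key(&self, key: &i64) -> bool {
        self.entries.contains_key(key)
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}
```

Keeping the names and return types aligned with HashMap means existing Rust users would not need to learn a new vocabulary for the basic operations.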
@chungquantin
Well, there is always polars, which is widely used in the Rust ecosystem for working with tabular data. Not really sure whether there is a Rust-based alternative for NetworkX, but something I found is gchemol-graph.
But I sort of don't get the point in doing it this way though. Why would we possibly sacrifice the performance benefits of ukv and not port the library directly into something like ukv-sys (Rust library with raw bindings) and then use it for creating ukv (the actual Rust library with Rust-friendly bindings using ukv-sys under-the-hood) which would provide a Rust-y interface to native ukv?
- For Pandas, Rust developers usually use Polars instead. We will see if we can convert between these two packages. Building a whole tabular interface from scratch would duplicate that work.
For C++ binding, this crate provides a better suite of tools for using C++ inside Rust: cxx
The difference between cxx and cbindgen / bindgen is that the latter are closer to the raw C interface but can produce many unexpected bugs, while the former allows doing FFI without writing unsafe code.
Polars sounds cool. They support Apache Arrow representations, so we must be able to pass the data there without any copies. The only tricky part is remembering that every such table would still be backed by ukv_arena_t and must be borrowed together. There should be a way to describe that logic in the bindings, but I am not sure how exactly. I will try to learn some Rust as we go :)
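One possible way to describe that borrowing rule in Rust is to tie every view to the arena with a lifetime. The sketch below is only an illustration: Arena is a hypothetical stand-in for ukv_arena_t (here just a Vec<u8>), and TableView is a made-up name for whatever the table wrapper ends up being called.

```rust
use std::marker::PhantomData;
use std::slice;

/// Hypothetical stand-in for `ukv_arena_t`: owns the memory of the last result.
struct Arena {
    buffer: Vec<u8>,
}

/// A view into memory owned by the arena. The raw pointer mimics what a C
/// call would return; `PhantomData` tells the compiler the view borrows the arena.
struct TableView<'arena> {
    ptr: *const u8,
    len: usize,
    _arena: PhantomData<&'arena Arena>,
}

impl Arena {
    fn new() -> Self {
        Arena { buffer: Vec::new() }
    }

    /// Takes `&mut self` because the arena is reused: every new read
    /// overwrites the previous result, so older views must be gone by then.
    fn read_table(&mut self, payload: &[u8]) -> TableView<'_> {
        self.buffer.clear();
        self.buffer.extend_from_slice(payload);
        TableView {
            ptr: self.buffer.as_ptr(),
            len: self.buffer.len(),
            _arena: PhantomData,
        }
    }
}

impl TableView<'_> {
    fn len(&self) -> usize {
        self.len
    }

    fn as_slice(&self) -> &[u8] {
        // SAFETY: `ptr` and `len` describe the arena buffer, and the lifetime
        // on `TableView` keeps that buffer alive and unmodified.
        unsafe { slice::from_raw_parts(self.ptr, self.len) }
    }
}
```

With this shape, using an old view after a second read_table call, or after the arena is dropped, is rejected at compile time instead of becoming a dangling pointer.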
As for the Graph interface, there is a package previously called RetworkX, now RustworkX, which also aims to mimic NetworkX interface, reimplementing the algorithms in Rust for in-memory graphs. Their Rust interface looks very similar to Python:
let mut g = Graph::new();
let a = g.add_node((0., 0.));
let b = g.add_node((2., 0.));
let c = g.add_node((1., 1.));
let d = g.add_node((0., 2.));
let e = g.add_node((3., 3.));
let f = g.add_node((4., 2.));
g.extend_with_edges(&[
(a, b, 2),
(a, d, 4),
(b, c, 1),
(b, f, 7),
(c, e, 5),
(e, f, 1),
(d, e, 1),
]);
Performance-wise, it didn't perform that well in our benchmarks, but it can be a good reference point for API discussions.
The API ukv_database_init_t() has a field database of type ukv_database_t. This is an opaque void type and, as I understand it, a pointer to the underlying storage. So how should it be initialized?
Exactly the same way as in the C guide:
ukv_database_t db { NULL };
ukv_error_t error { NULL };
ukv_database_init_t init {
.db = &db,
.config = "{}",
.error = &error,
};
ukv_database_init(&init);
You set it to NULL initially, and then a pointer is written into it by the ukv_database_init function call. Classic ANSI C.
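The same out-pointer pattern could look roughly like this from Rust. To keep the sketch self-contained, MockInit, mock_database_init and mock_database_free are hypothetical stand-ins for the bindgen-generated ukv_database_init_t and the real C functions; the actual field layout would come from the UKV headers.

```rust
use std::ffi::{c_char, c_void, CStr};
use std::ptr;

// Hypothetical aliases; the real ones would be generated from the C headers.
type DatabaseHandle = *mut c_void;
type ErrorMessage = *const c_char;

/// Hypothetical mirror of the init struct shown in the C snippet.
#[repr(C)]
struct MockInit {
    db: *mut DatabaseHandle,
    config: *const c_char,
    error: *mut ErrorMessage,
}

/// Mock of the C entry point: writes a non-null handle through `db`.
extern "C" fn mock_database_init(init: *mut MockInit) {
    // SAFETY: in this sketch the caller always passes valid pointers.
    unsafe {
        let init = &mut *init;
        *init.db = Box::into_raw(Box::new(0u64)) as DatabaseHandle;
        *init.error = ptr::null();
    }
}

/// Mock of the matching release call.
extern "C" fn mock_database_free(db: DatabaseHandle) {
    if !db.is_null() {
        // SAFETY: `db` was produced by `Box::into_raw` in `mock_database_init`.
        unsafe { drop(Box::from_raw(db as *mut u64)) };
    }
}

/// Start with null pointers, let the callee fill them in, then check the error.
fn open_database() -> Result<DatabaseHandle, String> {
    let mut db: DatabaseHandle = ptr::null_mut();
    let mut error: ErrorMessage = ptr::null();
    let config = b"{}\0"; // NUL-terminated, as C expects
    let mut init = MockInit {
        db: &mut db,
        config: config.as_ptr() as *const c_char,
        error: &mut error,
    };
    mock_database_init(&mut init);
    if !error.is_null() {
        // SAFETY: a non-null error is assumed to be a valid C string.
        let message = unsafe { CStr::from_ptr(error) };
        return Err(message.to_string_lossy().into_owned());
    }
    Ok(db)
}
```

A safe wrapper would normally store the returned handle in a struct and call the release function from Drop, so that users never touch the raw pointer.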
The API ukv_database_init_t() has a field database of type ukv_database_t. This is an opaque void type and, as I understand it, a pointer to the underlying storage. So how should it be initialized?
We can use Rust's std::ffi::c_void, which is the equivalent of C void*. If for any reason we would want to convert a variable into a void*, we will do it the following way:
let foo = &mut "abc" as *mut _ as *mut std::ffi::c_void;
Basically, it converts the mutable reference to a raw mutable pointer with an inferred type (dereferencing that pointer will require an unsafe block) and then converts it to a mutable void*. Making it a *const std::ffi::c_void is also possible.
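A slightly fuller sketch of that round trip, assuming the value behind the pointer is a plain u32. The helper names to_void and from_void are made up for illustration.

```rust
use std::ffi::c_void;

/// Erase the type: `&mut u32` -> `*mut u32` -> `*mut c_void`. This part is safe.
fn to_void(value: &mut u32) -> *mut c_void {
    value as *mut u32 as *mut c_void
}

/// Recover the typed reference.
///
/// # Safety
/// `ptr` must come from `to_void`, and the original `u32` must still be alive
/// and not accessed through any other reference while the result is in use.
unsafe fn from_void<'a>(ptr: *mut c_void) -> &'a mut u32 {
    unsafe { &mut *(ptr as *mut u32) }
}

/// The read-only variant only needs a shared reference.
fn to_const_void(value: &u32) -> *const c_void {
    value as *const u32 as *const c_void
}
```

Creating the pointer is safe; only turning it back into a reference needs unsafe, which is where the wrapper has to guarantee the pointee is still valid.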
Added language bindings and a simple database open / close in this PR: https://github.com/unum-cloud/ukv/pull/243. I currently have problems generating bindings from header files that include other files. Do you have any idea about this issue? @michaelgrigoryan25
@chungquantin ~~I did not really experience any issues during the build process, however,~~ I have made some improvements in https://github.com/chungquantin/ukv/pull/1. ~~Let me know if I'm missing something here, because everything ran perfectly fine for me. There were some warnings about the code style because of the generated bindings, but I have made sure to address those too in the PR.~~
Edit:
Never mind. I think you are referring to including other header files from the include directory in the wrapper.h file like this, right?
#include "../include/ukv/db.h"
#include "../include/ukv/ukv.h" // this line creates an error
and you most probably get this error:
error: failed to run custom build command for `ukv v0.0.1 (/home/_/Public/ukv/rust)`
Caused by:
process didn't exit successfully: `/home/_/Public/ukv/rust/target/debug/build/ukv-a7df0344378206ed/build-script-build` (exit status: 1)
--- stdout
cargo:rerun-if-changed=wrapper.h
cargo:rustc-link-search=../include/ukv
--- stderr
./../include/ukv/ukv.h:47:10: fatal error: 'ukv/db.h' file not found
Error: ClangDiagnostic("./../include/ukv/ukv.h:47:10: fatal error: 'ukv/db.h' file not found\n")
Edit 2:
Should be fixed in https://github.com/chungquantin/ukv/pull/1/commits/84b293a3f4ddd361a1cc64811888ac999256db77. Let me know when you give it a try. Clang needed some arguments to accomplish this:
let bindings = bindgen::Builder::default()
.header("./wrapper.h")
.clang_args(&["-I../include", "-I../include/ukv"])
.detect_include_paths(true)
.parse_callbacks(Box::new(bindgen::CargoCallbacks))
.generate()?;
Great to see your progress, guys! I have created a branch for this line of work - 238-rust-bindings. @chungquantin let's change your pull request to merge into this new branch. I am also curious if rust/src/ukv/bindings.rs should be included into .gitignore?
I have removed the generated bindings in https://github.com/chungquantin/ukv/pull/1 and it will not be tracked in git anymore, @ashvardanian. The PR is still pending though. There should be no need to add anything else to .gitignore after it is successfully merged.
Great, @michaelgrigoryan25 ! Let's wait for him to merge your updates, and then I will merge his into 238-rust-bindings.
@ashvardanian Synced the update from @michaelgrigoryan25 and changed the target branch of the PR to 238-rust-bindings.
We might have some issues here with the actual functionality of the SDK. As I was writing some unit tests and rewriting and improving some parts of the crate in my fork over at https://github.com/unum-cloud/ukv/commit/072e5fc160b1e8650f9367fd6cdce33c67b4f5c6, I started getting linkage errors when running cargo test. Here is the full linker output for reference:
ukv/rust/src/lib.rs:35: undefined reference to `ukv_database_init'
collect2: error: ld returned 1 exit status
= note: some `extern` functions couldn't be found; some native libraries may need to be installed or have their path specified
= note: use the `-l` flag to specify native libraries to link
= note: use the `cargo:rustc-link-lib` directive to specify the native libraries to link with Cargo (see https://doc.rust-lang.org/cargo/reference/build-scripts.html#cargorustc-link-libkindname)
error: could not compile `ukv` due to previous error
cargo check and cargo build will not output any errors, however; these errors will be present at the time when the end user adds the crate and actually starts using it. I believe we will need a .a or a .o file to successfully perform the process, ~~and to do this, we might need cc as a build dependency if there is no direct way of building the library from the parent directory~~. @ashvardanian, @chungquantin, do you have any other ideas on how we can approach this issue?
The actual code of the unit test is pretty straightforward. I thought something like this might happen, so I have kept it simple stupid. You can find it here.
As it is a native library it must be compiled and linked. So the CMake build should be called before building Rust extension. Is it a good practice to use relative import paths in Rust? Especially for parent directories? It is generally considered a code-smell in most languages.
PS: We can schedule a short sync-up call in the next few days to synchronize work, if that would help. Ping me on Discord if it sounds useful.
As it is a native library it must be compiled and linked. So the CMake build should be called before building Rust extension.
Thanks for your input, @ashvardanian! You are right. In that case, we can either use the cmake crate or just execute the build command specified in the documentation via std::process::Command, @chungquantin.
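A minimal sketch of the std::process::Command route, using only the standard library. The function names are hypothetical, and the flags are a subset of the ones from the documented cmake invocation quoted later in this thread; which engines to enable is an assumption here.

```rust
use std::io;
use std::path::Path;
use std::process::Command;

/// Build the `cmake` configure step (not executed yet).
fn cmake_configure(source_dir: &Path, build_dir: &Path) -> Command {
    let mut cmd = Command::new("cmake");
    cmd.current_dir(source_dir)
        .arg("-DUKV_BUILD_ENGINE_UMEM=1")
        .arg("-DUKV_BUILD_TESTS=0")
        .arg("-DUKV_BUILD_BENCHMARKS=0")
        .arg("-B")
        .arg(build_dir);
    cmd
}

/// Build the `make` step for an already configured build directory.
fn make_build(build_dir: &Path, jobs: usize) -> Command {
    let mut cmd = Command::new("make");
    cmd.arg(format!("-j{jobs}")).arg("-C").arg(build_dir);
    cmd
}

/// Run a command and turn a non-zero exit status into an error,
/// so that a failed native build also fails the Cargo build.
fn run(mut cmd: Command) -> io::Result<()> {
    let status = cmd.status()?;
    if status.success() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::Other,
            format!("{cmd:?} failed with {status}"),
        ))
    }
}
```

In a build script, the two steps would be chained as run(cmake_configure(..))? followed by run(make_build(..))?. The cmake crate wraps much of this, at the cost of an extra build dependency.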
By the way, after cmake-ing and make-ing, what is the location and the filename of the ukv library in build_release? I followed the documentation at https://unum.cloud/ukv/install.html and tried looking for them in build_release/lib and build_release/build/lib, but only found libbenchmark.a libbenchmark_main.a libfmt.a libleveldb.a librocksdb.a libsimdjson.a libukv_embedded_leveldb.a libukv_embedded_rocksdb.a libyyjson.a.
Is it a good practice to use relative import paths in Rust? Especially for parent directories? It is generally considered a code-smell in most languages.
I would not say that it is absolutely discouraged, but generally speaking, folks usually prefer to use an environment variable or an absolute path instead of a relative one. Doing it this way also works. Current changes to build.rs in my fork are temporary. I have already prepared a function which would get the parent directory without manually specifying it as ../.
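For illustration, one way to avoid a hard-coded ../ is to derive the path from CARGO_MANIFEST_DIR, which Cargo sets for build scripts. The helper names below are hypothetical, and the sketch assumes the crate lives in a rust directory directly under the repository root, next to include.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// Parent of the crate directory, i.e. the repository root under the
/// assumed layout `<repo>/rust`. Returns `None` if there is no parent.
fn repo_root(manifest_dir: &Path) -> Option<PathBuf> {
    manifest_dir.parent().map(Path::to_path_buf)
}

/// Absolute path of the shared `include` directory, resolved from the
/// `CARGO_MANIFEST_DIR` variable that Cargo provides to build scripts.
fn include_dir_from_env() -> Option<PathBuf> {
    let manifest_dir = env::var_os("CARGO_MANIFEST_DIR")?;
    let root = repo_root(Path::new(&manifest_dir))?;
    Some(root.join("include"))
}
```

The resulting absolute path can then be passed to clang as an -I argument instead of "-I../include".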
@michaelgrigoryan25 you have found exactly what you need. UKV has a number of compilation options (check the beginning of CMakeLists.txt), depending on which it will compile one or more libraries. Depending on the one you pick - you will get a different backend:
- libukv_embedded_umem
- libukv_embedded_leveldb
- libukv_embedded_rocksdb
- libukv_flight_client
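On the Rust side, picking a backend could then come down to which library the build script asks Cargo to link. The sketch below only formats the directives; the Backend enum and link_directives are hypothetical names, and the library directory is passed in rather than assumed.

```rust
use std::path::Path;

/// One variant per library listed above.
#[derive(Clone, Copy)]
enum Backend {
    Umem,
    LevelDb,
    RocksDb,
    FlightClient,
}

impl Backend {
    /// Library name without the `lib` prefix and file extension,
    /// which is the form the linker expects.
    fn lib_name(self) -> &'static str {
        match self {
            Backend::Umem => "ukv_embedded_umem",
            Backend::LevelDb => "ukv_embedded_leveldb",
            Backend::RocksDb => "ukv_embedded_rocksdb",
            Backend::FlightClient => "ukv_flight_client",
        }
    }
}

/// Lines a build script would print so that Cargo links the chosen backend.
fn link_directives(lib_dir: &Path, backend: Backend) -> Vec<String> {
    vec![
        format!("cargo:rustc-link-search=native={}", lib_dir.display()),
        format!("cargo:rustc-link-lib=static={}", backend.lib_name()),
    ]
}
```

A build script would print each returned line with println!. A real build would most likely need further directives for each backend's own dependencies and for the C++ standard library; those are left out here because they depend on the build configuration.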
For some reason I am getting some errors related to OpenSSL when cmake-ing:
[ 94%] Built target rocksdb
CMake Error at CMakeLists.txt:810 (add_dependencies):
The dependency target "OpenSSL::Crypto" of target "arrow_dependencies" does
not exist.
CMake Error at CMakeLists.txt:810 (add_dependencies):
The dependency target "OpenSSL::SSL" of target "arrow_dependencies" does
not exist.
[ 94%] Built target ukv_embedded_leveldb
[ 94%] Building CXX object CMakeFiles/ukv_embedded_rocksdb.dir/src/engine_rocksdb.cpp.o
[ 94%] Building CXX object CMakeFiles/ukv_embedded_rocksdb.dir/src/modality_docs.cpp.o
[ 96%] Building CXX object CMakeFiles/ukv_embedded_rocksdb.dir/src/modality_vectors.cpp.o
[ 96%] Building CXX object CMakeFiles/ukv_embedded_rocksdb.dir/src/modality_paths.cpp.o
[ 96%] Building CXX object CMakeFiles/ukv_embedded_rocksdb.dir/src/modality_graph.cpp.o
CMake Error at cmake_modules/ThirdpartyToolchain.cmake:3853 (set_target_properties):
The link interface of target "gRPC::grpc" contains:
OpenSSL::SSL
but the target was not found. Possible reasons include:
* There is a typo in the target name.
* A find_package call is missing for an IMPORTED target.
* An ALIAS target is missing.
Call Stack (most recent call first):
cmake_modules/ThirdpartyToolchain.cmake:171 (build_grpc)
cmake_modules/ThirdpartyToolchain.cmake:278 (build_dependency)
cmake_modules/ThirdpartyToolchain.cmake:3915 (resolve_dependency)
CMakeLists.txt:496 (include)
CMake Error at cmake_modules/BuildUtils.cmake:283 (target_link_libraries):
Target "arrow_objlib" links to:
OpenSSL::Crypto
but the target was not found. Possible reasons include:
* There is a typo in the target name.
* A find_package call is missing for an IMPORTED target.
* An ALIAS target is missing.
Call Stack (most recent call first):
src/arrow/CMakeLists.txt:580 (add_arrow_lib)
CMake Error at cmake_modules/BuildUtils.cmake:441 (target_link_libraries):
Target "arrow_static" links to:
OpenSSL::Crypto
but the target was not found. Possible reasons include:
* There is a typo in the target name.
* A find_package call is missing for an IMPORTED target.
* An ALIAS target is missing.
Call Stack (most recent call first):
src/arrow/CMakeLists.txt:580 (add_arrow_lib)
CMake Error at cmake_modules/BuildUtils.cmake:283 (target_link_libraries):
Target "arrow_dataset_objlib" links to:
OpenSSL::Crypto
but the target was not found. Possible reasons include:
* There is a typo in the target name.
* A find_package call is missing for an IMPORTED target.
* An ALIAS target is missing.
Call Stack (most recent call first):
src/arrow/dataset/CMakeLists.txt:62 (add_arrow_lib)
CMake Error at cmake_modules/BuildUtils.cmake:441 (target_link_libraries):
Target "arrow_dataset_static" links to:
OpenSSL::Crypto
but the target was not found. Possible reasons include:
* There is a typo in the target name.
* A find_package call is missing for an IMPORTED target.
* An ALIAS target is missing.
Call Stack (most recent call first):
src/arrow/dataset/CMakeLists.txt:62 (add_arrow_lib)
CMake Error at cmake_modules/BuildUtils.cmake:283 (target_link_libraries):
Target "parquet_objlib" links to:
OpenSSL::Crypto
but the target was not found. Possible reasons include:
* There is a typo in the target name.
* A find_package call is missing for an IMPORTED target.
* An ALIAS target is missing.
Call Stack (most recent call first):
src/parquet/CMakeLists.txt:245 (add_arrow_lib)
CMake Error at cmake_modules/BuildUtils.cmake:441 (target_link_libraries):
Target "parquet_static" links to:
OpenSSL::Crypto
but the target was not found. Possible reasons include:
* There is a typo in the target name.
* A find_package call is missing for an IMPORTED target.
* An ALIAS target is missing.
Call Stack (most recent call first):
src/parquet/CMakeLists.txt:245 (add_arrow_lib)
-- Generating done
CMake Generate step failed. Build files cannot be regenerated correctly.
make[2]: *** [CMakeFiles/Arrow-external.dir/build.make:92: _deps/arrow-stamp/Arrow-external-configure] Error 1
make[1]: *** [CMakeFiles/Makefile2:280: CMakeFiles/Arrow-external.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 96%] Linking CXX static library build/lib/libukv_embedded_rocksdb.a
[ 96%] Built target ukv_embedded_rocksdb
make: *** [Makefile:136: all] Error 2
One thing I can say for sure is that all the required dependencies are installed. Here is the output from running uname -a if that is relevant:
Linux archlinux 5.15.90-1-lts #1 SMP Tue, 24 Jan 2023 12:46:03 +0000 x86_64 GNU/Linux
@ishkhan42
@michaelgrigoryan25 You need to have openssl libs installed on your system.
Thanks for your input, @ishkhan42. OpenSSL was fully installed on my system before cmake-ing. For reference, here is the OpenSSL package in Arch package registry https://archlinux.org/packages/core/x86_64/openssl/.
➜ ~ sudo find / -name libcrypto.so
/usr/lib/libcrypto.so
/usr/lib/openssl-1.1/libcrypto.so
/usr/lib32/libcrypto.so
➜ ~ sudo find / -name libssl.so
/usr/lib/libssl.so
/usr/lib/openssl-1.1/libssl.so
/usr/lib32/libssl.so
I faced a similar issue when building on CentOS; it was resolved by building the latest (3.0.7) OpenSSL libs from source. Below are the build instructions I used. Note that the last 2 lines are required. You can try that; maybe it will be enough in your case.
cd /usr/src && wget https://www.openssl.org/source/openssl-3.0.7.tar.gz
tar -zxf openssl-3.0.7.tar.gz && rm openssl-3.0.7.tar.gz && cd /usr/src/openssl-3.0.7
./config && make -j16 && make install
ln -s /usr/local/lib64/libssl.so.3 /usr/lib64/libssl.so.3
ln -s /usr/local/lib64/libcrypto.so.3 /usr/lib64/libcrypto.so.3
Thanks, @ishkhan42! That worked like a charm (though I had to do it from the live CD). I think having a specific version of OpenSSL should definitely be addressed in the documentation. Now we have another issue, though. The compilation fails due to an error in the code. Here is the full stack trace:
cmake \
-DUKV_BUILD_ENGINE_UMEM=1 \
-DUKV_BUILD_ENGINE_LEVELDB=1 \
-DUKV_BUILD_ENGINE_ROCKSDB=1 \
-DUKV_BUILD_TESTS=0 \
-DUKV_BUILD_BENCHMARKS=0 \
-DUKV_BUILD_API_FLIGHT_CLIENT=1 \
-DUKV_BUILD_API_FLIGHT_SERVER=1 \
-B ./build_release && \
make -j8 -C ./build_release
........
........
........
In file included from /_/_/_/ukv/src/engine_umem.cpp:26:
/_/_/_/ukv/build_release/_deps/ucset-src/include/ucset/consistent_avl.hpp: In constructor ‘unum::ucset::avl_tree_gt<entry_at, comparator_at, node_allocator_at>::avl_tree_gt(unum::ucset::avl_tree_gt<entry_at, comparator_at, node_allocator_at>&&)’:
/_/_/_/ukv/build_release/_deps/ucset-src/include/ucset/consistent_avl.hpp:583:22: error: ‘exchange’ is not a member of ‘std’
583 | : root_(std::exchange(other.root_, nullptr)), size_(std::exchange(other.size_, 0)) {}
| ^~~~~~~~
compilation terminated due to -Wfatal-errors.
make[2]: *** [CMakeFiles/ukv_embedded_umem.dir/build.make:76: CMakeFiles/ukv_embedded_umem.dir/src/engine_umem.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:310: CMakeFiles/ukv_embedded_umem.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[100%] Linking CXX static library build/lib/libukv_flight_client.a
[100%] Built target ukv_flight_client
[100%] Linking CXX executable build/bin/ukv_flight_server_rocksdb
[100%] Linking CXX executable build/bin/ukv_flight_server_leveldb
[100%] Built target ukv_flight_server_leveldb
[100%] Built target ukv_flight_server_rocksdb
make: *** [Makefile:136: all] Error 2
If there is a commit from the main branch which fixes this, maybe we will be able to merge the changes from there.
@michaelgrigoryan25, which compiler are you using? These commits would fix your current issue:
- ucset: https://github.com/unum-cloud/ucset/commit/a6bfa275b2ff34cc72b939e16d985daa0c414de0
- ukv: https://github.com/unum-cloud/ukv/commit/54059a42cc9ea1be6ea3d4f30e155eac41a82698
Both are on the main branches.
Thanks, I will try. I used gcc (GCC) 12.2.1 20230111 to compile this project specifically.
Ok, I synced everything, removed the build cache completely via rm -rf $(cat .gitignore), and tried building again. Now, there is a new error. Here is the new stack trace:
[100%] Built target ukv_flight_client
[100%] Built target ukv_flight_server_leveldb
[100%] Built target ukv_flight_server_rocksdb
In file included from /_/_/_/ukv/src/engine_umem.cpp:28:
/_/_/_/ukv/build_release/_deps/ucset-src/include/ucset/consistent_set.hpp: In instantiation of ‘unum::ucset::status_t unum::ucset::consistent_set_gt<element_at, comparator_at, allocator_at>::upsert(element_t&&) [with element_at = pair_t; comparator_at = pair_compare_t; allocator_at = std::allocator<unsigned char>; element_t = pair_t]’:
/_/_/_/ukv/build_release/_deps/ucset-src/include/ucset/locked.hpp:133:32: required from ‘unum::ucset::status_t unum::ucset::locked_gt<collection_at, shared_mutex_at>::upsert(element_t&&) [with collection_at = unum::ucset::consistent_set_gt<pair_t, pair_compare_t>; shared_mutex_at = std::shared_mutex; element_t = pair_t]’
/_/_/_/ukv/src/engine_umem.cpp:410:42: required from here
/_/_/_/ukv/build_release/_deps/ucset-src/include/ucset/consistent_set.hpp:474:18: error: cannot convert ‘unum::ucset::consistent_set_gt<pair_t, pair_compare_t>::element_t’ {aka ‘pair_t’} to ‘bool’ in initialization
474 | bool exists = element;
| ^~~~~~
compilation terminated due to -Wfatal-errors.
make[2]: *** [CMakeFiles/ukv_embedded_umem.dir/build.make:76: CMakeFiles/ukv_embedded_umem.dir/src/engine_umem.cpp.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:310: CMakeFiles/ukv_embedded_umem.dir/all] Error 2
make: *** [Makefile:136: all] Error 2
make: Leaving directory '/_/_/_/ukv/build_release'
We haven't had such issues, at least not lately. Can you please try with the most recent version?