duckdb-rs icon indicating copy to clipboard operation
duckdb-rs copied to clipboard

Rust extensions based on Extension C API

Open samansmink opened this issue 1 year ago • 13 comments
trafficstars

Hi 👋

I've been working on the new C API for extensions in duckdb/duckdb (see https://github.com/duckdb/duckdb/pull/12682). TLDR: we've added a header to duckdb which contains basically a struct of function pointers with most of the C API. This struct is then passed to a loadable extension on load. This is basically the same mechanism that sqlite has.

Goals

One of the goals of this API is to be able to write clean pure-rust extensions for DuckDB that can be used across multiple duckdb versions. Basically I want an extension template like https://github.com/duckdb/extension-template, but than in pure-rust based on this new C API.

Now that that PR is merged, I would like to get started on this.

Approach

Good for me, rusqlite, the repo this one is based on, is way ahead of us and already has support for sqlite's variant of the extension api struct since https://github.com/rusqlite/rusqlite/pull/1362.

So in theory, all we need to do is copy (the parts we like) of rusqlite for this to generate the loadable extensions. Then tie everything together for the

The catch

The main catch is my Rust knowledge is pretty minimal. Therefore this will also be a bit of a rust-learning-experience for me. However, given the fact that it's mostly a copy-and-paste exercise from rusqlite, this should be fine? ;)

finally, any help or advice is very welcome!

samansmink avatar Aug 08 '24 21:08 samansmink

hey hey! glad to see there's some progress on this. I'd love to get this landed. Are you looking for someone to pair with on the rust bits? Are you looking for someone to do the rust side of the implementation? I'm happy to do either one or even hop on a call and figure out the best path forward (I can allocate 1 day of time this week towards getting this landed). What would be most useful for you?

Broadly it seems like the big parts are

  • Codgen nice rust accessor functions for our version of *mut ffi::sqlite3_api_routines (looks like bindgen + some custom codegen on top)
  • extension init
  • bridging stuff between existing api and new api (they might be the same fns?)

I'm in US Central and available during daytime hours (9am to 9pm), let me know when a good time is for you over the next few days.

0xcaff avatar Aug 14 '24 18:08 0xcaff

Hey @0xcaff, That's great to hear! Lets setup a quick call I would love to get your take on this. I will drop you an email.

I have also fiddled with it and I have a (very messy) PoC working here: https://github.com/samansmink/duckdb-rs/tree/poc-rust-c-extension-api. I think it looks pretty similar to your direction, but I also added the attribute macro for the entrypoint, handled the version properly and added a demo function to the examples which uses the new entrypoint. I actually managed to load the extension from DuckDB too! :)

So basically the end-to-end is there, now its a matter of thinking about how to get things integrated cleanly. One of the things that I find a little awkard about the way things are setup in my feature branch now is that the libduckdbsys package is now basically fundamentally different based on this loadable_extension feature, making it not really a feature but almost transforming it completely into a different package.

samansmink avatar Aug 15 '24 08:08 samansmink

Nice! It looks like my changes are pretty similar to yours. I think the open questions from me are

  • Versioning. It seems like you allow specifying a min-version in your proc macro but it seems unused.
  • Bundled build. Neither of our implementations support using a bundled version of the bindings.
  • Deprecated functions. Why are deprecated functions which are still exposed in the C ABI (?) not exposed in the new API?
  • Why does rustqlite use a bunch of atomic values instead of a single one for the entire API struct?

As for replacing all implementations of functions in the package, I don't think its a big deal as if you want both versions to co-exist you could import the crate twice with different features and also because all the stuff in the -sys crate is only wrappers around the C ABI. We can talk more about this on Friday.

0xcaff avatar Aug 15 '24 21:08 0xcaff

Notes from our call

  • The new C API allows for specifying a version here. https://github.com/samansmink/duckdb-rs/blob/eeca07d8f7ba6c6b9a0a10f4c92c3f87e319487f/crates/libduckdb-sys/build.rs#L426 Extensions declare a minimum supported duckdb API version and duckdb returns a compatible API struct.

  • Bundled build. This needs to be figured out.

  • Deprecated functions. Sam plans to consider adding them back as it looks like the deprecated functions are used pretty deeply in the rust bindings currently and seems like its not trivial to update them or flag them out.

  • Atomics. Concluded fine either way (atomics or using mut static) as we couldn't come up with usages where the atomics protect you from anything during the happy path (considered duckdb plugin init being called multiple times (it is not), calling code pre-entrypoint (you're on your own in this case)). My vote is for the mut static as this is an implementation detail. I think the rustsqlite ppl were trying to avoid unsafe even in their implementation details (which imo is not worth the runtime overhead).

Thanks for taking the time to bring me up to speed Sam!

I'll take on getting the bundled build working.

0xcaff avatar Aug 16 '24 16:08 0xcaff

Hi! I'm looking to start an extension in rust using the c API. I'm available to help with anything to move this along!

parkerdgabel avatar Aug 18 '24 03:08 parkerdgabel

@samansmink I made a PR into your repo with bundled build working. Tagging it here in case it got lost https://github.com/samansmink/duckdb-rs/pull/1

Can't wait to get this all landed when you're back from holiday!

0xcaff avatar Aug 21 '24 02:08 0xcaff

that's great, @parkerdgabel! I will ping you for a review once the PR lands, some more eyes on this is definitely helpful

samansmink avatar Aug 27 '24 15:08 samansmink

Saw you merged the work in progress PR into your branch! Let me know if you are finding issues with it, happy to help.

0xcaff avatar Aug 27 '24 16:08 0xcaff

Great @samansmink ! I have the beginning of my extension here if you would like a repo to test on. I'm not totally sure how to compile for consumption by duckdb. I was working off your poc branch. I’d love to help get this extension template working

parkerdgabel avatar Aug 28 '24 04:08 parkerdgabel

I've setup a usage of the c extensions API in my project (vendored duckdb-rs) and I found a few things we missed.

  1. https://github.com/0xcaff/duckdb_protobuf/blob/1a2bb08d445daa5d2400301351a7c89cc8c37fe0/patches/libduckdb-sys%2B1.0.0.patch#L102-L114 If buildtime_bindgen is disabled, and loadable_extension, we gotta switch here. We missed this in both our POCs. Only discovered when disabling bundling.
  2. I added handling for the minimum_duckdb_version value here https://github.com/0xcaff/duckdb_protobuf/blob/1a2bb08d445daa5d2400301351a7c89cc8c37fe0/patches/duckdb-loadable-macros%2B0.1.2.patch#L53-L58 https://github.com/0xcaff/duckdb_protobuf/blob/1a2bb08d445daa5d2400301351a7c89cc8c37fe0/patches/duckdb-loadable-macros%2B0.1.2.patch#L78C71-L78C101

There's still a bunch of rough edges.

  • There are so many features I'm not even sure what they all are. Bundled, buildtime bindgen, loadable. I think there's 2^3 valid feature sets now. I think there's room to compress them now somehow?
  • The builtin bindgen headers are a pain to update (used when buildtime bindgen is off). I turned off tests and generated them manually. It seems like there are some failing when loadable is enabled.

What happens to old extensions following the ABI of the new extensions? It seems like they are all going to break? This seems suboptimal, is this really what you want?

D LOAD '/Users/martin/projects/duckdb_protobuf/target/release/protobuf.duckdb_extension';
Not implemented Error: Enum value: '' not implemented

It is because of the code here trying to check the abi

https://github.com/duckdb/duckdb/blob/0bc27aba2cfd2fb3dd3637b4e4d28f2d6c34db74/src/main/extension/extension_load.cpp#L197

0xcaff avatar Sep 05 '24 04:09 0xcaff

here's what i did to get stuff working in my project btw https://github.com/0xcaff/duckdb_protobuf/pull/17

0xcaff avatar Sep 05 '24 22:09 0xcaff

A bit lateral, but the mentioned problem where error message would be cryptic on extension mismatch have been improved in the main of duckdb, to become v1.1.1 by https://github.com/duckdb/duckdb/pull/13894.

carlopi avatar Sep 13 '24 11:09 carlopi

Btw I forked the project and started my extension. It is working for me. I also added Scalar function support to the library. Here is my extension using the fork https://github.com/parkerdgabel/quackML Here is a short video of logistic regression now running in pure rust through duckdb. https://github.com/user-attachments/assets/ccc1e33a-0ae6-4c0b-9d84-38913754991d

I've continued to work on the fork in the meantime but can still help with this.

As an aside, do any of you know how I can write a Vec to a flatbuffer of list type?

parkerdgabel avatar Sep 15 '24 04:09 parkerdgabel

(experimental) Template is now up here: https://github.com/duckdb/extension-template-rs

With that, this first issue can probably be closed. Please feel free to open new issues for further improvements!

samansmink avatar Nov 08 '24 11:11 samansmink

Somewhat related. I created a PR to support scalar UDFs based on patterns from this PR.

matthewgapp avatar Dec 01 '24 17:12 matthewgapp