rustler icon indicating copy to clipboard operation
rustler copied to clipboard

Support for encoding/decoding external structs

Open davydog187 opened this issue 5 years ago • 32 comments

First of all, thank you for all the hard work on rustler!

We have been making extensive use of Rustler as a means to run simulations and cpu bound computations from Elixir.

We keep running into the problem of wanting to write Rust only libraries, then import them from an outer crate that exposes the library to Elixir. The issue is that Rustler only supports compile time encoding/decoding, which means that we either need to write an adapter and duplicate our data structures, or find some other means.

It would be wonderful if Rustler provided a mechanism to Encode/Decode external data structures so that Rustler doesn't leak into our lower level dependencies.

We are very interested in writing this feature for Rustler, but I first wanted to see if there has been any thought on this front in the past.

Thank you!

davydog187 avatar Apr 09 '19 12:04 davydog187

I spent some time thinking about this as well, but did not yet arrive at a good solution yet. We use feature flags to work around that (disable compiling proc_macro-generated code if the crate is not to be used within a NIF), or by copying struct definitions. I wonder if serde support (#74) would help here. If rustler was able to encode/decode from serde's interface, annotating a struct in a rust-only crate with #[derive(Serialize, Deserialize)] could possibly already work.

EDIT: serde-transcode could be useful to avoid serializing into an intermediate format here.

evnu avatar Apr 09 '19 21:04 evnu

@davydog187, @evnu can one of you provide a little bit more detail into what your expectations are for this?

Is this so that you don't have to wrap external types in a new type?

scrogson avatar Apr 09 '19 21:04 scrogson

Is this so that you don't have to wrap external types in a new type?

Not exactly. I sometimes encounter the following scenario. Imagine a compute crate, which exposes a struct A. A should be usable by a command line application, and it should be exposable to Elixir. In pseudo-code:

// crate compute
pub mod compute {
    #[derive(Debug)]
    pub struct A {
        a: i32,
    }

    impl A {
        pub fn new(a: i32) -> Self {
            Self { a: a }
        }
    }
}

// crate compute_nif
pub mod compute_nif {
    pub fn compute_an_a<'a>(env: Env<'a>) -> NifResult {
        let a = compute::A::new(10);

        a.encode(env);
    }
}

// crate compute_cli
pub mod compute_cli {
    fn main() {
        let a = compute::A::new(10);

        println!("a: {}", a);
    }
}

So, it would be nice if compute_nif could make use of A to expose it with as little extra effort as possible. Wrapping it in a newtype would be no problem, of course. I believe though that this conflicts with Rust's orphan rules (see the hoops one has to jump through with serde when implementing derive for a remote type).

@davydog187 I hope that example also covers what you are asking for, I did not intend to hijack this issue.

EDIT: I forgot to mention: If the remote type is wrapped in a newtype, the remote type still needs to implement Encode/Decode.

evnu avatar Apr 09 '19 22:04 evnu

Let's image than I have crate_elixir and crate_rust. crate_rust doesn't want to have rustler as a dependency, but it has some data structure that we may way to ultimately use in crate_elixir

/// crate_rust
pub struct A {
  val: String
}

Now, if I want to use A in crate_elixir, I would have to re-encode A to a new type that has #[derive(NifStruct)], which potentially adds a lot of overhead. It would be awesome if there were a runtime way of encoding/decoding structs. Or even some trait that could be implemented

davydog187 avatar Apr 09 '19 22:04 davydog187

@evnu I think we're on the exact same page here 😆

davydog187 avatar Apr 09 '19 22:04 davydog187

This also brings up another issue, as far as I can tell, tuple structs and newtypes are not encodeable/decodeable

davydog187 avatar Apr 09 '19 22:04 davydog187

I see. Yeah, rust's orphan rules make this pretty interesting. I'm not sure exactly how to go about supporting this. But if serde was able to get around it, maybe we can as well...

scrogson avatar Apr 09 '19 22:04 scrogson

Ah, I was not aware serde was doing it that way. Don't really like it, but it solves the problem at least.

So as far as I can see there are two alternatives here:

  1. Support encoding and decoding with serdes traits
  2. Implement the same mechanism as serde does for remote structs
  3. Both?

Any opinions?

hansihe avatar Apr 09 '19 22:04 hansihe

It seems like if we supported encoding/decoding with serde's traits, we would still have the issue of not all types deriving Deserialize and Serialize.

Does that sound right?

If so, we would probably need to support both?

scrogson avatar Apr 09 '19 23:04 scrogson

If we supported the serde traits, it would enable the user to use serde's mechanism for remote structs.

But again, that's not completely optimal either. We should probably support something similar to what they are doing in our macros.

hansihe avatar Apr 09 '19 23:04 hansihe

But again, that's not completely optimal either. We should probably support something similar to what they are doing in our macros.

Do have something in mind for that? As far as I understand, it is not possible to retrieve the AST for a remote type. Then, the remote struct definition must be explicitly copied which again results in code duplication.

evnu avatar Apr 10 '19 11:04 evnu

@evnu I would love to avoid having to do that, but as far as I understand there really isn't any alternative? I suspect serde would have used a better solution if it existed

hansihe avatar Apr 10 '19 11:04 hansihe

Is the idea here to take a Rust struct and serialize it into erlang term format. Then use :erlang.binary_to_term/1 to decode the Erlang runtime struct? or to define a struct in Rust and have the Rust struct's definition end up in Elixir? or both? or neither?

elbow-jason avatar Apr 17 '19 05:04 elbow-jason

@elbow-jason The idea was to encode/decode the serde traits directly from the in memory term representation and back, much like we do currently with our own encoder/decoder traits.

This should hopefully have less of an overhead than going through the binary format.

hansihe avatar Apr 17 '19 12:04 hansihe

@hansihe wrote a library pertaining to the issue: serde_rustler provides a Serializer and Deserializer for directly encoding/decoding Rust types to/from Elixir terms.

Encoding and decoding benchmarks look promising (like, insanely, unbelievably promising), and given that I haven't published many Rust crates or Elixir packages, I'd greatly appreciate feedback on reconfiguring the benchmarks, improving the tests, changing the serialization/deserialization behavior, or just adding to/improving the API.

Hope y'all enjoy!

sunny-g avatar Jun 09 '19 04:06 sunny-g

@sunny-g Great work!

Seeing as this would probably be useful for quite a few people I think it would make sense to at last add it to the Readme of rustler?

I looked through it briefly, and it looked pretty good from what I could tell. I'll hopefully find some time to look through it more thoroughly soon.

hansihe avatar Jun 09 '19 15:06 hansihe

@hansihe Awesome, I'll open a PR. I'm kinda using the tests package as a playpen for testing, benchmarking and more advanced rustler usage, so if you see something amiss, please let me know!

sunny-g avatar Jun 09 '19 21:06 sunny-g

@sunny-g I stumbled across your repo the other day. Great work! Should this be folded into rustler or should it stay as a separate library?

My worry is that the fragmentation makes it less accessible

davydog187 avatar Jun 09 '19 22:06 davydog187

PR to add link to readme here: https://github.com/rusterlium/rustler/pull/224.

@davydog187 I'm not opposed to it, but I'm still in the process of debugging some performance issues and experimenting with additional features (e.g. a Value type similar to serde_json::Value for operating on raw Elixir terms, and a dedicated library for serialization between Elixir and existing serde formats). I'd reconsider once some of these things have stabilized/are finalized.

sunny-g avatar Jun 09 '19 22:06 sunny-g

@sunny-g were you able to make progress?

evnu avatar Oct 18 '19 18:10 evnu

Just to chime in here. I’d love to see serde_rustler pulled into the rustler repo/workspace. I used it recently on https://github.com/scrogson/no-way-jose and I was thoroughly impressed.

@sunny-g excellent work!

scrogson avatar Oct 20 '19 14:10 scrogson

@evnu never got around to it 😦

@scrogson Appreciate that! I've been coming around to this idea as well - To start, I think it could be done with as little as an impl Serializer for Env and impl Deserializer for Term tucked behind a serde feature flag, and might also be worth augmenting/replacing the existing NifStruct-type attribute/macros with ones that provide shorthands for module struct/record naming and derive Serialize/Deserialize - what do you think?

sunny-g avatar Oct 21 '19 03:10 sunny-g

@davydog187 @hansihe I'm not really using Elixir much anymore (my foray into serde_rustler got me hooked on Rust :man_shrugging:) and the crate is beginning to accumulate issues and PRs so I'm now very willing to deprecate my library and fold it's logic into rustler.

If someone wants to take this task, I think the easiest and most simplifying (though breaking) changes would be what I mentioned in my previous comment here, i.e. impl Serializer for Env and impl Deserializer for Term, re-using logic from serde_rustler, and optionally:

  1. getting rid of Encoder and Decoder while merging their logic into the Serializer and Deserializer and
  2. deprecating the NifStruct et al macros, instead pushing users to use serde's container, field and attribute macros.

If / however y'all want to do this is fine with me, and I can help anyone who wants to pick up the task.

sunny-g avatar Mar 05 '20 17:03 sunny-g

I started integrating this into rustler over the weekend.

scrogson avatar Mar 09 '20 12:03 scrogson

Howdy! I've ended up updating serde_rustler using my branch to point to the latest versions of Rustler. Are there still plans to roll this in here (as a feature flag, I'm guessing)?

I'm writing a library using rustler for req_snowflake, and Snowflake supports two different response types: JSON & Arrow. I'm using arrow2 Rust library for parsing these Arrow IPC Streams (they're files Snowflkae sends back as gzipped), and I'm honestly blown away by how fast the NIF is. Initial benchmarks show promising results, and we can even create Elixir types from Rust as we know the types from Arrow.

I've got little experience with Rust, just having used it to create a few small CLI apps here and there, so would love feedback on when the library is ready.

joshuataylor avatar Jun 04 '22 05:06 joshuataylor

Hi @scrogson and friends 👋🏼 I was recently in contact with @sunny-g and we briefly discussed the future of serde_rustler. Wanted to check in on the possibility of integrating it completely (if that's still on the table) or otherwise providing support to update the library.

The future is in your hands: if you still believe that integration is the best path forward, please let me know how we can help. If another path is better, let's see what we can do. 👍🏼

aj-foster avatar Sep 14 '22 19:09 aj-foster

@aj-foster Integrating serde_rustler would be nice! Would you like to prepare a pull request for vendoring it in? Would that be ok with @sunny-g ?

evnu avatar Oct 07 '22 09:10 evnu

@evnu @aj-foster works for me! Depending on the usage of the existing macros, I would still recommend what I mentioned here; otherwise @aj-foster and I can spearhead a simple PR to just merge serde_rustler into a rustler subdirectory, behind a feature flag, etc.

sunny-g avatar Oct 07 '22 17:10 sunny-g

Hi @evnu, I'm happy to prepare a pull request. I'm curious how you would like to approach this from an ergonomic perspective. We can take a relatively light approach (adding the package here, still segmented from the main code), or a more integrated approach like Sunny mentioned above.

Please feel free to spell out what you're imagining like I'm a small child — I'll need guidance throughout this process. 🙂

aj-foster avatar Oct 07 '22 17:10 aj-foster

@aj-foster good question on the steps. Lets keep it simple for now and start with vendoring serde_rustler into the repository. That also means checking the links in serde_rustler to ensure that they will point to the correct documentation.

Afterwards, we will need to discuss how to integrate serde_rustler further. I am not sure if we want to get rid of Encoder/Decoder completely, as that would mean that we need to basically merge serde_rustler into rustler (IIUC), requiring users to dependent on serde even if they don't actually want to use serde. But maybe we can delay that decision for now, and tackle rustler_codegen as the second step: Can we replace that completely with serde_rustler (especially the new NifEnum decorator), and is it as efficient (zero-cost conversion) as rustler_codegen? If so, lets see how we can deprecate rustler_codegen.

So, I propose these steps:

  1. Vendor serde_rustler.
  2. Determine if we can replace all functionality of rustler_codegen. If so, deprecate rustler_codegen.
  3. Determine if we need to integrate parts of serde_rustler deeper into the rustler crate itself.

Does that make sense? CC @rusterlium/core

evnu avatar Oct 08 '22 19:10 evnu