gloo icon indicating copy to clipboard operation
gloo copied to clipboard

Proposal: gloo-properties - mid-level API for getting and setting properties

Open samcday opened this issue 5 years ago • 49 comments

Summary

A mid-level API to set and query properties on arbitrary Javascript values.

Motivation

The js-sys crate already provides Reflect to get and set properties on JsValue types, and wasm-bindgen provides Serde integration that allows to easily map an existing Rust type into Javascript land. So why do we need more?

The use case we're addressing here is ad-hoc property access. This occurs often when writing Rust WASM code that interfaces with existing userland Javascript APIs. It's expected that as the Rust WASM ecosystem matures, this situation will decline in frequency. However that kind of shift is not going to happen overnight, and in the meantime we should provide better ergonomics for users that need to interface with raw Javascript values.

Detailed Explanation

gloo_properties::property

Used to get/set a single property on an existing JsValue.

API:

/// Retrieves a single Property on target Object. This Property can be used to
/// get and set the value.
pub fn property<K>(target: &JsValue, key: K) -> Property
    where K: Into<JsValue> { /* ... */ }

struct Property { /* ... */ }
impl Property {
    /// Retrieves the value of this Property as a String. If the property is
    /// undefined, null, or not a String, an Err is returned.
    pub fn string(&self) -> Result<String, JsValue> { /* ... */ }

    /// Retrieves the value of this Property as a u32. If the property is
    /// undefined, null, not a Number, or overflows a u32, an Err is returned.
    pub fn u32(&self) -> Result<u32, JsValue> { /* ... */ }
    // And so on...
    pub fn u16(&self) -> Result<u16, JsValue> { /* ... */ }
    // pub fn u8...
    // pub fn i32...
    // pub fn i16...
    // pub fn i8...

    /// Retrieves the value of this Property as a bool. If the property is
    /// undefined, null, or not a Boolean, an Err is returned.
    pub fn bool(&self) -> Result<bool, JsValue> { /* ... */ }

    /// Retrieves the value of this Property as any kind of Javascript mapped
    /// type. Anything that is exported from #[wasm_bindgen] is usable. If the
    /// property is undefined, null, or an incorrect type, an Err is returned.
    pub fn cast<T: JsCast>(&self) -> Result<T, JsValue> { /* ... */ }

    /// Retrieves a sub-property from this Property. This is used to traverse
    /// arbitrary nested object structures. Note that if a property does not
    /// exist, a Property will still be returned, but any attempts to get or set
    /// a value will return an Error.
    pub fn property<K: Into<JsValue>>(self, key: K) -> Property { /* ... */ }

    /// Sets the given value on the Property. Returns an error if the set failed
    /// for any reason. e.g frozen object, error in a setter on the Object, etc.
    pub fn set<V: Into<JsValue>>(&self, val: V) -> Result<bool, JsValue> { /* ... */ }
}

Usage examples:

use gloo_properties::property;

// Because key is generic over Into<JsValue>, using it is very convenient:
property(&obj, "foo");           // static string literals
property(&obj, &existing_str);   // existing String / &strs
property(&obj, existing_str);    // Pass by ref not required.

// Symbols work too.
let sym = JsValue::symbol("secret");
property(&obj, &sym);

// Nested property access:
property(&obj, "foo").property("bar").property("quux");

// Retrieving values.
property(&obj, "foo").string().unwrap_throw();
let my_element: web_sys::Element = property(&obj, "foo").cast();

// Traversing multiple levels never fails until trying to get/set something.
property(&obj, "i").property("don't").property("exist"); // -> Property
property(&obj, "exists").property("nope").string() // -> Err("Property 'i' does not exist")

// Setter is generic over anything that can convert into a JsValue, which is
// all core Rust types, and all web_sys types.
property(&obj, "foo").set("a direct string");
property(&obj, "foo").set(42u32);
property(&obj, "foo").set(true);

let an_existing_object = Object::new();
property(&obj, "foo").set(an_existing_object);

gloo_properties::setter

Used to quickly build up a set of properties on a value.

API:

/// Returns a Setter that can be used to set multiple properties on the given
/// target. Ownership of the target is taken until Setter#finish() is called.
pub fn setter<T: JsCast>(target: T) -> Setter<T> { /* ... */ }

pub struct Setter<T: JsCast> { /* ... */ }
impl<T: JsCast> Setter<T> {
    /// Sets given key/value pair on the target object.
    pub fn set<K, V>(self, key: K, val: V) -> Self
    where
        K: Into<JsValue>,
        V: Into<JsValue>,
    { /* ... */ }

    /// Sets given key/value pair on the target object, and then calls the given
    /// closure to set sub properties on the provided value.
    pub fn with<K, V, F>(self, key: K, val: V, func: F) -> Self
    where
        K: Into<JsValue>,
        V: JsCast,
        F: FnOnce(Setter<V>) -> Setter<V>,
    { /* ... */ }

    /// Finishes setting properties on target object and returns it.
    pub fn finish() -> T { /* ... */ }
}

Usage:

use gloo_properties::setter;

let my_sym = JsValue::symbol("secret");

let my_obj = setter(Object::new())
    .set("foo", "bar")
    .set(&my_sym, "quux")
    .set(123, 321)
    .set("bool", false)
    .set("jsval", Object::new())
    .with("nested", Object::new(), |sub| sub
        .set("and", "so")
        .set("on", "it goes"))
    .finish();

Drawbacks, Rationale, and Alternatives

Drawbacks

Currently, working with Reflect (which this crate would use) is slower than working with duck-typed interfaces, and in many cases is also slower than the Serde integration.

Alternatives

The obvious alternatives are to use duck-typed interfaces or the Serde integration. Both of these approaches require one to write and annotate a struct with the desired fields. The proposed API is suitable for use when the user does not wish to define explicit types in order to grab properties out of an existing value in an ad-hoc manner.

For example:

// Instead of needing to do this:
{
    #[wasm_bindgen]
    extern "C" {
        type MyThing;

        #[wasm_bindgen(method, getter, structural)]
        fn foo(this: &MyThing) -> String;
    }
    let thing: MyThing = my_obj.dyn_into::<MyThing>().unwrap_throw();
    thing.foo
}

// Or this:
{
    #[derive(Deserialize)]
    struct MyThing {
        foo: String,
    }
    let thing: MyThing = JsValue::into_serde(&my_obj).unwrap_throw();
    thing.foo
}

// One can just do this:
{
    property(&my_obj, "foo").string().unwrap_throw();
}

Unresolved Questions

Currently, none.

samcday avatar Mar 27 '19 17:03 samcday

Relatedly, we should have convenient APIs for looking up keys on existing objects (since that's actually a really common thing to do, and it's quite nasty right now!)

Pauan avatar Mar 27 '19 18:03 Pauan

Thanks for writing this up!

Questions off the top of my head:

  • How would this interface support symbols and i32 property keys?

  • Whenever I've needed to work with raw JS objects, I've generally found it easier to use duck-typed interfaces (for example: the definition here and the usage here) -- what utility would this interface provide over using duck-typed interfaces?

  • ObjectProps getters -- are these unwraping the type conversions and panicking on failure?

fitzgen avatar Mar 27 '19 18:03 fitzgen

Here's an idea for a possibly better API:

Create a trait IntoJs for converting an object to a JsValue, which is implemented for strings, numbers, bool, char, JsValue, Object and maybe more.

Then lets create a function

pub fn set_property<K: IntoJs, V: IntoJs>(object: &Object, key: K, value: V) {
    Reflect::set(object, key.into_js(), value.into_js()).unwrap_throw();
}

and a macro using this function that can be used like this:

let my_obj = set_props! { Object::new(), with
    "foo"   -> "bar",
    "quux"  -> 123u8,
    "b0rk"  -> 123.1,
    "meep"  -> any_old_JsValue,
    "blerg" -> set_props!(Object::new(), with "potato" -> "spaghetti"),
};

This can also be used to update to an existing object:

set_props! { my_obj, with
    "foo"   -> 123u8,
    "quux"  -> "bar,
};

Aloso avatar Mar 27 '19 21:03 Aloso

I tried it out, and it works with a recursive macro_rules! macro.

Show code
pub trait IntoJs {
    fn into_js(&self) -> JsValue;
}

pub fn set_property<K: IntoJs, V: IntoJs>(object: &Object, key: K, value: V) {
    Reflect::set(object, &key.into_js(), &value.into_js()).unwrap_throw();
}

#[macro_export]
macro_rules! set_props {
    ( $obj:expr, with $($props:tt)* ) => {
        {
            let obj = $obj;
            set_props_internal!(obj; $($props)*);
            obj
        }
    };
}
macro_rules! set_props_internal {
    ( $obj:ident; $key:literal -> $val:expr, $($rest:tt)* ) => {
        set_property(&$obj, $key, $val);
        set_props_internal!($obj; $($rest)*);
    };
    ( $obj:ident; $key:literal -> $val:expr ) => {
        set_property(&$obj, $key, $val);
    };
    () => {};
}

impl IntoJs for &str {
    fn into_js(&self) -> JsValue {
        JsValue::from_str(self)
    }
}

impl IntoJs for Object {
    fn into_js(&self) -> JsValue {
        JsValue::from(self)
    }
}

impl IntoJs for JsValue {
    fn into_js(&self) -> JsValue {
        self.clone()
    }
}

macro_rules! impl_into_js_num {
    ( $($t:ty)* ) => {
        $(
            impl IntoJs for $t {
                fn into_js(&self) -> JsValue {
                    JsValue::from_f64(*self as f64)
                }
            }
        )*
    }
}

impl_into_js_num!(u8 i8 u16 i16 u32 i32 u64 i64 u128 i128 f32 f64);

Aloso avatar Mar 27 '19 23:03 Aloso

@Pauan yeah, absolutely. The initial ideas I have around ObjectProps are somewhat nascent, but as you've rightly pointed out, querying data out of ad-hoc objects from Rust-side is a little verbose right now.

@fitzgen very good points, let me address each one.

  • I'm not really a Rust guru so I don't know if there's some super idiomatic Rust-y way to accomplish handling Symbols / i32 keys in the main API I already sketched out. I believe (though I don't have any hard numbers or telemetry to back this up) that the vast majority of use cases in dealing with JS-land objects are gonna be dealing with standard String based property names. We could also just have a complete copy of the API I outlined with _sym and _i32 suffixes. Open to ideas here!
  • The purpose of ObjectBuilder is really to handle the ad-hoc object building case, for example when talking to existing JS libraries/userland code. i.e situations where you're only going to build that kind of object in one place. If we take my original ObjectBuilder example again, here's how it would look with duck typed interfaces:
      extern "C" {
        #[derive(Debug, Clone)]
        pub type Bacon;
    
        #[wasm_bindgen(structural, method, setter)]
        fn set_foo(this: &Bacon, val: &str);
    
        #[wasm_bindgen(structural, method, setter)]
        fn set_quux(this: &Bacon, val: u64);
    
        #[wasm_bindgen(structural, method, setter)]
        fn set_bleep(this: &Bacon, val: i64);
    
        #[wasm_bindgen(structural, method, setter)]
        fn set_b0rk(this: &Bacon, val: f64);
    
        #[wasm_bindgen(structural, method, setter)]
        fn set_meep(this: &Bacon, val: JsValue);
    
        #[wasm_bindgen(structural, method, setter)]
        fn set_blerg(this: &Bacon, val: &SubBacon);
    
        #[derive(Debug, Clone)]
        pub type SubBacon;
    
        #[wasm_bindgen(structural, method, setter)]
        fn set_potato(this: &Bacon, val: &str);
      }
    
      let my_obj: Bacon = Object::new().unchecked_into::<Bacon>();
      my_obj.set_foo("bar");
      my_obj.set_quux(123u8);
      my_obj.set_bleep(321i16);
      my_obj.set_b0rk(123.1);
    
      let my_sub_obj: SubBacon = Object::new().unchecked_into::<SubBacon>();
      my_sub_obj.set_potato("spaghetti");
    
      my_obj.set_blerg(my_sub_obj);
    
    As you can see, there's still quite a lot of verbosity here. If we're only ever building this object once to send it out to some Javascript API somewhere, it feels redundant to have built all this typing information manually. I think the duck typing interface is the way to go when you're dealing with incoming parameters FROM the JS side. In that case you might have clients passing in an object that has properties + callback methods and you want a nice interface over all of that.
  • Yes, the idea behind ObjectProps would be to immediately unwrap_throw. I think that this addresses a majority of use-cases in when you'd even be accessing properties from a Javacript object in the first place. Once you're in WASM land, you're dealing with Rust types and all the richness that the Rust type system has to offer. It's only when you're receiving ad-hoc objects from the Javascript side that you need to roll up your sleeves and do the messy type coercions. In those situations, I'm making the case that 9 times out of 10 you're not going to want to do anything more than simply unwrap_throw anyway, since if the JS side passed in a bad type for a property, you don't usually want to proceed any further. In the situations where you do have the ability to work around some odd shape of data passed in, then you can still easily drop down to the lower level API available in js_sys/wasm_bindgen. That said, I haven't thought through ObjectProps as much, I'm sure there's a lot more useful stuff that it could be doing. For example we could have variants of the API that do some basic coercion that follows Javascript rules. i.e asking for a string prop will .toString() whatever properties does exist, so you'll get a string representation of Number/bools, etc.

@Aloso that's very cool! That approach with a macro would probably also provide a cleaner way to address @fitzgen's question regarding support for Symbol and i32 keys right? Rust macros still kinda scare me when trying to conceptualize what they can do and what they can't. For example would macros allow us to do something like this?

let my_obj = set_props! { Object::new(), with
    "foo"   -> "bar",
    Symbol("spaghetti") -> "potato",
    42 -> "meaning",
};

I'm also curious to see what others think about macros vs. a more vanilla API comprised of functions. For me personally, I don't reach for macros unless they allow me to express something that is otherwise very difficult or verbose to express in normal Rust syntax.

I don't think we need all those variants of impl_into_js_num, Into<u64> + Into<i64> variants should be all that's necessary (since all lower int primitives implement Into<u64>):

fn int<T: Into<i64>>(val: T) -> JsValue {
    JsValue::from_f64(val.into() as f64)
}
fn uint<T: Into<u64>>(val: T) -> JsValue {
    JsValue::from_f64(val.into() as f64)
}
fn float<T: Into<f64>>(val: T) -> JsValue {
    JsValue::from_f64(val.into())
}

// These all work:
uint(123u8);
uint(321u16);
int(666i32);
float(3.14159f32);

samcday avatar Mar 28 '19 08:03 samcday

@Aloso another great point you had is being able to augment properties onto existing Objects. If we opted to not go for the macro approach and stick with the existing ObjectBuilder then that's pretty easy to handle:

  let existing_obj = ObjectBuilder::new().build();
  let same_obj = ObjectBuilder::from(existing_obj).string("foo", "bar").build();

samcday avatar Mar 28 '19 08:03 samcday

@samcday

For example would macros allow us to do something like this?

Yes. macro_rules! doesn't allow the -> token after a expr, so I used a literal on the left. This doesn't work for symbols, but we could also allow ident, Symbol(literal) and Symbol(ident), which would cover most use cases.

Aloso avatar Mar 28 '19 09:03 Aloso

Yes, the idea behind ObjectProps would be to immediately unwrap_throw.

I think there's a pretty simple solution: the build() method would return Result, and then a new unwrap_build() method is added which calls unwrap_throw():

#[inline]
fn unwrap_build(self) -> Object {
    self.build().unwrap_throw()
}

Rust macros still kinda scare me when trying to conceptualize what they can do and what they can't.

Rust macros can do pretty much anything, since they just operate on a token stream.

There's a few limitations, which you can read about here.

For the most part I don't worry about it: I just make whatever syntax seems reasonable, and then afterwards fix up any compile errors that Rust gives me.

I don't think we need all those variants of impl_into_js_num, Into<u64> + Into<i64> variants should be all that's necessary

i64, u64, i128, and u128 cannot (currently) be supported (they cannot fit into an f64).

For performance reasons, it's probably best to handle each type separately (rather than using Into)

Pauan avatar Mar 28 '19 12:03 Pauan

I think there's a pretty simple solution: the build() method would return Result, and then a new unwrap_build() method is added which calls unwrap_throw():

Are you getting ObjectProps mixed up with ObjectBuilder? I'm not sure if there's any need to force users to handle exception cases while building an Object with ObjectBuilder, because the API is strongly typed and is essentially mapping Rust stuff into JS-land, which can handle anything.

The only edge-case I can think of is if one tried to add some properties onto an existing Object that was frozen/sealed, or had custom setters for property names you were trying to set. Those cases might be enough justification to not support feeding a pre-existing Objects into ObjectBuilder if we want to keep its API lean and focused.

i64, u64, i128, and u128 cannot (currently) be supported (they cannot fit into an f64).

Ah yes good point. I mean in that case we can lower it down to Into<u32|i32> instead right?

For performance reasons, it's probably best to handle each type separately (rather than using Into)

Hmm, do you have more background on this concern? I'm not sure that widening an int is a particularly expensive operation, and it has to happen anyway since the only number primitive that Javascript supports is f64 anyway right?

samcday avatar Mar 28 '19 13:03 samcday

When you want to avoid the hop of encoding a Rust type to JSON and then deserializing it again. That said, this might actually be faster for complex object graphs, depending on how much overhead there is in the ping-ponging between WASM and JS land using ObjectBuilder.

Note that this will be fixed and is tracked separately in https://github.com/rustwasm/wasm-bindgen/issues/1258.

I'm currently working on an [for now internal] solution for it, got it to the feature-complete stage, and now investigating performance.

So far it looks like, even with pre-encoding object keys and so on, on some inputs it can be much slower than serde-json due to the overhead of frequent crossing of boundary and encoding lots of small strings as opposed to a single large one, but on other inputs involving more numbers than strings it can be much faster. But yes, hopefully:

ObjectBuilder would always be faster once host bindings land

Either way, I would bet on that existing serde integration rather than try and introduce yet another API for objects.

RReverser avatar Mar 28 '19 14:03 RReverser

Thanks for the feedback @RReverser - what you're working on sounds very interesting and I'll be sure to check it out when you put some code up

Either way, I would bet on that existing serde integration rather than try and introduce yet another API for objects.

You might be right, although your feedback only touched on one of the points I raised. What are your thoughts on these other situations?

  • When you care about object identity (i.e ===)
  • When you're dealing with ArrayBuffers

samcday avatar Mar 28 '19 14:03 samcday

When you're dealing with ArrayBuffers

This is also covered by proper serde integration - it has dedicated methods for serializing / deserializing bytes (usually via https://github.com/serde-rs/bytes) and I'm already leveraging that for creating and passing back Uint8Array.

When you care about object identity

This one seems pretty rare usecase and indeed harder to solve... I guess it could be possible to implement that somehow with {de}serialize_with and custom checks down the road, but feels a bit niche for now.

RReverser avatar Mar 28 '19 15:03 RReverser

Of course performance is an important aspect here, but when I think about this more I think the main purpose of ObjectBuilder would be to provide better ergonomics for the ad-hoc object building case. As discussed previously, if one is building an object for JS-land in multiple places or is expecting to receive a very particular shape of object from JS-land, then the existing tools (structural tags / JsCast / JsValue::into_serde) make more sense.

The question is whether the situation I describe is considered common enough to warrant the API surface area in Gloo. I would make the case that it is - which is why I raised this RFC :). i.e if you're surgically inserting Rust WASM into parts of an existing codebase, you'll be dealing with all sorts of ad-hoc objects everywhere. needing to define structs in Rust land for each kind of ad-hoc object (so that it can either be tagged with structural tags or fed into the Serde integration) is the sticking point for me here.

Concrete example that I'm dealing with right now in my own project: passing messages from a Web Worker written in Rust. There's a bunch of messages that get exchanged between the main UI and the Rust Web Worker. Each message is different but only shows up once in the message handling code, at which point I scoop all the relevant data out and call into proper Rust methods.

Using Serde or duck typed interfaces looks like this:

struct MessageA {
  blah: String
}
struct MessageB {
  foo: u32
}
//..
struct MessageZ {
  bar: bool,
}

fn send_message_a() {
  let msg = MessageA{blah: "blorp"};
  worker().postmessage(&JsValue::from_serde(msg));
}

fn send_message_b() {} // and so on

Whereas with the proposed ObjectBuilder you're just doing this:

fn send_message_a() {
  let msg = ObjectBuilder::new().string("blah", "blorp").build();
  worker().postmessage(&msg);
}
fn send_message_b() {} // and so on

samcday avatar Mar 28 '19 15:03 samcday

I have a habit of being kinda verbose when I'm trying to get a point across. Let me try and distill it into a much easier TL;DR:

I think that requiring users to crystallize every ad-hoc Object they need to build for interop between Rust and JS into a struct is not user-friendly. I think the situations where one needs to do such a thing is frequent enough to warrant adding something like ObjectBuilder to the mix alongside the existing options.

I'd like to know what others think!

samcday avatar Mar 28 '19 15:03 samcday

I see. It sounds like your main concern is the boilerplate associated with defining custom types for one-off objects, even if objects themselves are still static. If so, you should be able to hide implementation details in a macro like:

macro_rules! object {
  ($($prop:ident: $value:expr),* $(,)*) => {{
    #[derive(Serialize)]
    pub struct Object<$($prop,)*> {
      $($prop: $prop,)*
    }
    
    JsValue::from_serde(&Object {
        $($prop: $value,)*
    })
  }};
}

and then reuse it wherever you need ad-hoc objects like

fn send_message_a() {
  let msg = object! { blah: "blorp" };
  worker().postmessage(&msg);
}

This way you should get the best of both worlds (static serialization + no need to write out types) with relatively low efforts. What do you think?

RReverser avatar Mar 28 '19 15:03 RReverser

i64, u64, i128, and u128 cannot (currently) be supported (they cannot fit into an f64).

All Rust integers fit into a f64. The biggest finite f64 is ≃ 1.7976931348623157 × 10308, whereas the biggest u128 is only ≃ 3.402823669209385 × 1038. Although casting a i64 to f64 involves rounding, I see no reason to forbid that.

@RReverser

When you're dealing with ArrayBuffers

This is also covered by proper serde integration - it has dedicated methods for serializing / deserializing bytes

I thought that objects such as ArrayBuffers can be passed between JS and Rust, without (de)serializing at all? I understand that strings need to be converted (UTF-8 <=> UTF-16), but not ArrayBuffers. So I'm guessing that (de)serializing an object containing a large ArrayBuffer is always slower than using Reflect. Or am I understanding something wrong?

Also, how does the (de)serialization work if the JsValue contains cycles?

you should be able to hide implementation details in a macro like:

macro_rules! object { ... }

This looks really nice. The only missing feature is numeric and symbol properties, like

object! {
    "foo" : bar,
    4 : true,
    Symbol("hello") : "world",
}

And I still think that a macro (or similar API) would be handy to add or modify values of an existing JsValue, even if that requires you to call unwrap_throw(), e.g.

extend! { div, with
    "id": "my-div",
    "className": "fancy border red",
}.unwrap_throw();

Aloso avatar Mar 28 '19 22:03 Aloso

I thought that objects such as ArrayBuffers can be passed between JS and Rust, without (de)serializing at all?

Unfortunately not, you still need to copy the memory from JS to WASM or the other way around.

Also, how does the (de)serialization work if the JsValue contains cycles?

By default it doesn't, but then, it's not particularly easy to create cycles in Rust structures anyway (even with Rc you will end up leaking memory).

This looks really nice. The only missing feature is numeric and symbol properties, like

I don't think these are common cases (you pretty much never should create an object with numeric properties), but they shouldn't be hard to add where necessary.

And I still think that a macro (or similar API) would be handy to add or modify values of an existing JsValue

Any reason you can't use Object::assign(...) for that?

Object::assign(div, object! {
  id: "my-div",
  className: "fancy border red"
})

RReverser avatar Mar 29 '19 01:03 RReverser

Unfortunately not, you still need to copy the memory from JS to WASM or the other way around.

Not strictly true, see TypedArray::view. You can take any arbitrary block of memory in Rust side and cast it into a typed array. Under the hood this is taking the underlying WebAssembly.Memory that the program is running in, slicing it to the bounds of the given Rust slice, and then returning the backing ArrayBuffer view available in Memory.buffer. This is zero copy. Of course doing such a thing can be dangerous (which is why view is unsafe). But, it can be done, and I'm sure people are doing it in hot paths.

If so, you should be able to hide implementation details in a macro like:

Yeah, @Aloso suggested a macro also, and it looks similar to your suggestion. The difference is you're proposing to build a struct as part of that macro.

My question is, are you suggesting that such a macro is something people should just be using themselves when needed, or are you suggesting that it's worth codifying here in Gloo? If the former, then yeah I think we can just find a nice place in the documentation to wedge that snippet and call it a day.

If the latter, then we can keep iterating on your suggestion. Instead of using Serde under the hood, we can also annotate the generated struct fields with the structural tags and use direct setters, which will handle the object identity stuff I raised and also support zero copy references of ArrayBuffers.

samcday avatar Mar 29 '19 07:03 samcday

Of course doing such a thing can be dangerous (which is why view is unsafe). But, it can be done, and I'm sure people are doing it in hot paths.

It can't be done (safely) as part of any object serialisation, because literally the next property can cause an allocation that will invalidate the view. It's meant to be used only as a temporary short-lived reference, which makes it suitable for copying data to JS.

My question is, are you suggesting that such a macro is something people should just be using themselves when needed, or are you suggesting that it's worth codifying here in Gloo?

I don't have particularly strong opinion. In my personal experience, arbitrary objects like these are pretty infrequent compared to repetitive typed objects, so it doesn't feel like something that belongs in the core yet, but maybe could be shared as a separate crate (especially since, as you noted, there are various potential implementations which could go into different crates).

But, if most people disagree, I defer.

we can also annotate the generated struct fields with the structural tags

AFAIK these are not necessary and are the default these days, but otherwise yeah, that's a good idea too.

also support zero copy references of ArrayBuffers.

As mentioned above, this is still highly unsafe and likely won't work here.

RReverser avatar Mar 29 '19 10:03 RReverser

Okay so I wrote an entirely unscientific set of benchmarks to test the overhead of building a simple {"foo": "bar"} object. Here's the code:

lib.rs

use js_sys::*;
use serde::Serialize;
use wasm_bindgen::prelude::*;
use wasm_bindgen::JsCast;

struct ObjectBuilder {
    obj: Object,
}

impl ObjectBuilder {
    fn new() -> ObjectBuilder {
        let obj = Object::new();
        ObjectBuilder { obj }
    }

    fn string(self, prop: &str, val: &str) -> Self {
        Reflect::set(&self.obj, &JsValue::from_str(prop), &JsValue::from_str(val)).unwrap_throw();
        self
    }

    fn build(self) -> JsValue {
        self.obj.unchecked_into()
    }
}

// Called by our JS entry point to run the example.
#[wasm_bindgen]
pub fn objectbuilder(array: &Array) {
    for _ in 0..1000 {
        array.push(&ObjectBuilder::new().string("foo", "bar").build());
    }
}

#[wasm_bindgen]
pub fn noop(array: &Array) {
    if array.length() != 0 {
        panic!("preventing optimizations, I think? {:?}", array);
    }
}

#[wasm_bindgen]
pub fn empty_obj(array: &Array) {
    for _ in 0..1000 {
        array.push(&Object::new());
    }
}

#[wasm_bindgen]
pub fn serde(array: &Array) {
    #[derive(Serialize)]
    struct SerdeObject<'a> {
        foo: &'a str,
    }
    for _ in 0..1000 {
        let obj = SerdeObject { foo: "bar" };
        array.push(&JsValue::from_serde(&obj).unwrap());
    }
}

#[wasm_bindgen]
pub fn ducky(array: &Array) {
    #[wasm_bindgen]
    extern "C" {
        type DuckObject;

        #[wasm_bindgen(method, setter, structural)]
        fn set_foo(this: &DuckObject, val: &str);
    }
    for _ in 0..1000 {
        let obj = Object::new().unchecked_into::<DuckObject>();
        obj.set_foo("bar");
        array.push(&obj);
    }
}

index.js

function verifyResult(array) {
  if (array.length !== 1000) {
    throw new Error(`Invalid length: ${array.length}`);
  }
  for (const item of array) {
    if (item.foo !== 'bar') {
      throw new Error(`Bad item: ${JSON.stringify(item)}`);
    }
  }
}

function runTest(label, fn, skipVerify) {
  const results = [];
  let array = new Array(1000);

  for (let i = 0; i < 100; i++) {
    array.length = 0;
    let start = performance.now();
    fn(array);
    let end = performance.now();
    if (!skipVerify) {
      verifyResult(array);
    }
    results.push(end-start);
  }

  return {
    test: label,
    avg: results.reduce((acc, val) => acc + val, 0) / results.length,
    worst: Math.max.apply(Math, results),
    best: Math.min.apply(Math, results),
  };
}

import("../crate/pkg").then(module => {
  const results = [
    runTest('js', array => {
      for (let i = 0; i < 1000; i++) {
        const obj = new Object();
        obj.foo = "bar";
        array.push(obj);
      }
    }),

    runTest('rustland: noop', module.noop, true),
    runTest('rustland: empty_obj', module.empty_obj, true),
    runTest('rustland: objectbuilder', module.objectbuilder),
    runTest('rustland: serde', module.serde),
    runTest('rustland: ducky', module.ducky),
  ];

  console.table(results);
});

If you spot any glaring errors then please point them out. Even though this is a very coarse test, it's pretty revealing, I think. The results I get in Chrome look like this:

Test Avg Worst Best
js 0.3636000060942024 3.810000023804605 0.13499998021870852
rustland: noop 0.0024500000290572643 0.11999998241662979 0
rustland: empty_obj 0.43634999776259065 1.7300000181421638 0.3249999135732651
rustland: objectbuilder 3.8228000001981854 12.665000045672059 2.905000001192093
rustland: serde 8.599249996477738 20.55000001564622 8.019999950192869
rustland: ducky 1.7073499981779605 12.589999940246344 1.4550000196322799

There's a lot of variance between runs because I have no idea how to write a good micro benchmark for the browser, but the variance isn't so much that is takes away from the general scale of difference between the classes of tests.

Anyway, using a duck typed interface appears to be the clear choice when building small objects. ObjectBuilder dispatching through Reflect::set even once is enough to make it substantially (nearly 2x) slower than using duck typed interfaces.

My conclusions from this are:

  • Obviously JS was going to be faster because the JIT can clearly see the shape of the object we're building in the inner loop and would be doing all sorts of fun stuff to optimize that.
  • Serde is surprisingly slow for this scenario, but hopefully it gets faster with @RReverser's work.
  • Right now, building Objects from Rust side is just painfully slow and should probably be avoided at all costs in performance-sensitive hot paths. This is exactly the sort of thing Host Bindings will make more-betterer, right?
  • I'm closing out this RFC since I don't think there's any value in the ObjectBuilder proposal as compared to existing solutions

samcday avatar Mar 29 '19 10:03 samcday

Okay slight addendum. I knew something felt a little weird about the results I saw seeing. I was compiling the crate in development mode. Heh.

Results in releasemode:

Test Avg Worst Best
js 0.16065000323578715 0.47500000800937414 0.12999994214624166
rustland: noop 0.0019499973859637976 0.09999994654208422 0
rustland: empty_obj 0.1994499983265996 5.755000049248338 0.07499998901039362
rustland: objectbuilder 3.075149995274842 22.119999979622662 2.3449999280273914
rustland: serde 2.3486500151921064 12.519999989308417 1.879999996162951
rustland: ducky 1.3619999960064888 12.400000006891787 1.0900000343099236

So Serde is nowhere near as slow as I was stating in my previous comment. It's still (currently?) slower than duck-type interfaces though. And I still stand by my conclusion that right now, there's a large amount of overhead in building objects in Rust side, such that it should be avoided where possible.

samcday avatar Mar 29 '19 10:03 samcday

There's a lot of variance between runs because I have no idea how to write a good micro benchmark for the browser

The common go-to library is Benchmark.js (that's what I used for years at least, and that's what's been used on JSPerf).

Anyway, using a duck typed interface appears to be the clear choice when building small objects. ObjectBuilder dispatching through Reflect::set even once is enough to make it substantially (nearly 2x) slower than using duck typed interfaces.

In case you're interested, one of the many small cuts I found and fixed as part of serde-wasm-bindgen work is that in V8 Reflect.set is much slower than direct property setter, so I ended up defining custom import type for Object with an indexing_setter method - even that on its own made 1.5x-2x difference in some benchmarks.

This is exactly the sort of thing Host Bindings will make more-betterer, right?

Depends on your type of data. If you have to deal with lots of strings in your objects, then in such benchmarks string encoding/decoding becomes the most expensive part of the profile and there is not much you can do there, unfortunately, either with host bindings or without them.

But I agree with your conclusions - if you can define your object statically with setters, it's always going to be faster than serialisation.

Would you be willing to adapt the macro to use your approach and publish it on crates.io for others?

RReverser avatar Mar 29 '19 10:03 RReverser

The common go-to library is Benchmark.js

Ah yes of course. In the back of my mind I was thinking "I wish I could just use whatever JSPerf is using" but didn't bother to see if that was easily possible ;) Benchmarking is a fickle science for sure, but I wasn't looking for extreme accuracy here, just a reasonably apples-to-apples comparison of the different approaches. I think what I quickly whipped up does that reasonably well enough.

In case you're interested, one of the many small cuts I found and fixed as part of serde-wasm-bindgen work is that in V8 Reflect.set is much slower than direct property setter, so I ended up defining custom import type for Object with an indexing_setter method - even that on its own made 1.5x-2x difference in some benchmarks.

That's very interesting! And would probably go a long way in explaining the difference between the rustland: ducky and rustland: objectbuilder tests. I wonder if Reflect::set is so much slower because it circumvents the JITs ability to lower the foo property down to a primitive or something? I dunno, a VM engineer I am not.

Would you be willing to adapt the macro to use your approach and publish it on crates.io for others?

No, I don't think so. If it's worth putting in a crate then it's worth putting in Gloo (IMO). But I just don't think it's worth putting in a crate. Building a struct for objects, even if used once, isn't that bad.

I do wonder if there's some work that could be done in the documentation here though. Before I made this proposal I didn't know that the duck typing approach could be applied so easily to setting arbitrary object properties. I guess I didn't read the documentation well enough but maybe there's room for some kind of "cookbook" section that walk through common and simple use cases like this? :man_shrugging:

samcday avatar Mar 29 '19 10:03 samcday

@Aloso All Rust integers fit into a f64.

No, they don't. The largest integer representable by an f64 is 9007199254740991, which is 2.0.powf(53.0) - 1.0 (53 bits)

Above that number, floating points lose precision. For example, 2.0.powf(53.0) is equal to 2.0.powf(53.0) + 1.0

Floating points are complicated, but here is a good summary. The Wikipedia article is also rather good.

It is simply not possible to fit an u64 (or higher) into an f64. In the case of u128 there isn't even enough bits: f64 has 64 bits, whereas u128 requires 128 bits.

And no, silently truncating or rounding or causing other such lossiness isn't okay. That should require explicit instructions from the user, and shouldn't happen automatically.

@RReverser Depends on your type of data. If you have to deal with lots of strings in your objects, then in such benchmarks string encoding/decoding becomes the most expensive part of the profile and there is not much you can do there, unfortunately, either with host bindings or without them.

Host bindings have support for directly passing UTF-8 strings, which (hopefully) will avoid encoding/decoding.

Pauan avatar Mar 29 '19 13:03 Pauan

Host bindings have support for directly passing UTF-8 strings, which (hopefully) will avoid encoding/decoding.

Ah interesting. Need to see how it's going to become implemented, but sounds promising if they could create strings backed by WASM memory.

RReverser avatar Mar 29 '19 15:03 RReverser

No, I don't think so. If it's worth putting in a crate then it's worth putting in Gloo (IMO). But I just don't think it's worth putting in a crate. Building a struct for objects, even if used once, isn't that bad.

@samcday I've read the entire thread and I don't fully understand how you came to this conclusion. I had an instance today where I needed to create a JSValue (in the form of a JS object) in a code path that wasn't hot. Having a convenience macro would have helped me out. It wasn't terribly inconvenient, but I don't believe the purpose of gloo is to provide "not terrible APIs" - I think web_sys/js_sys mostly already accomplish this.

rylev avatar Mar 29 '19 15:03 rylev

@rylev thanks for the feedback!

I guess I'm not really that familiar with how the RFC process works just yet. My feeling was that if initial feedback was lukewarm then it's probably not something that should be pursued, especially considering there's approximately a trillion other useful things that could be added to Gloo right now. Have you got some tips on how to judge if/when an RFC should be pushed?

I'm happy to keep going with this one if there's enough appetite for it.

samcday avatar Mar 29 '19 15:03 samcday

It's probably best to have @fitzgen weigh in here. I'm just of the opinion that creating ad-hoc JavaScript objects is something users may want to do. I can see that argument that as the use of web_sys goes down (because people use the gloo wrappers more often), there might be less of a need, but for quick usage, a simple macro is less boilerplate than creating a wasm-bindgen enabled struct.

rylev avatar Mar 29 '19 15:03 rylev

I'm just of the opinion that creating ad-hoc JavaScript objects is something users may want to do.

Yeah, for sure, it's one of the first things I bumped into when I started trying to write a Web Worker purely in Rust, which is why I opened the RFC.

It's probably best to have @fitzgen weigh in here.

SGTM. If there's still appetite I'm happy to rework the RFC to propose a macro approach to easily set properties on Objects. I think the utilities to query data out of Objects would also be quite useful.

samcday avatar Mar 29 '19 15:03 samcday

I don't want to set myself up as a BDFL for Gloo — I'd really prefer this is a more collective, community-driven project :)

On that note, if y'all think it would be useful to formalize some of the design proposal acceptance rules (for example, potentially make a formal team roster with required number of +1s before considering a design "accepted"), we can spin off a new issue to discuss that. This could help avoid some of the "is this something we want or not? are we making progress here? what are the next steps?"-style limbo. But also, Gloo is a very new project and I hesitate to add more bureaucracy until we're sure we want it.


Ok, so here are my actual thoughts on this issue:

  • I still believe that using "duck-typed interfaces" is what people should be using more often than not when they ask for "better object APIs". However, there are definitely cases where that doesn't fit (e.g. there is no loose sense of interface and we really want to work with arbitrary object properties and types of values). Therefore, it does make sense to pursue making arbitrary, untyped objects easier to work with, in my opinion. (But we probably want to prominently link to the duck-typed interfaces section of the wasm-bindgen guide and have some discussion of when you would use that instead.)

  • We should avoid reaching for macros as long as we can. They shouldn't be in a v1 design. We should only use those after we've exhausted how nice we can make the APIs without macros.

  • IntoJs seems like it is just Into<JsValue>, and I'm not convinced we need to define a new trait here.

  • Hiding type-mismatch panics in property getters will lead to fragile code. If the user is so sure about the types, they should probably be using duck-typed interfaces instead. At minimum the default should be to return results/options and have an unwrap_blah version that very clearly might panic.

  • I think we could get 99% of the utility that's on the table by providing:

    • An object builder that uses some nice sprinklings of generics for defining properties.

    • Some sort of getters and setters for existing object with a nice sprinking of generics (maybe extension traits for js_sys::Object; maybe a wrapper type)

    And then after that is implemented and played around with a bit, I'd look at what else might be done to improve ergonomics (tweaking these APIs, adding more APIs, considering macros).

fitzgen avatar Mar 29 '19 17:03 fitzgen