Allow objects to customize serialization / deserialization for structured clone

Open jasnell opened this issue 3 years ago • 29 comments

In Node.js we have implemented a Node.js-specific ability for various built-in objects to provide their own serialization/deserialization for cloning or transfer. The mechanism works by placing a platform host object into the JavaScript object's prototype chain, then attaching specific symbols to the JavaScript object that implement the serialize and deserialize functions.

For instance,

const {
  JSTransferable,
  kClone,
  kDeserialize
} = require('...');

class Foo extends JSTransferable {
  // ...

  [kClone]() {
    return {
      data: { a: 1, b: 2 },
      deserializationInfo: '{module specifier}:Foo'
    };
  }

  [kDeserialize]({ a, b }) {
    // ...
  }
}

The JSTransferable here is a host object implemented in C++. Our value serializer delegate understands that to clone any object that extends from JSTransferable, it simply needs to look for its [kClone] method and serialize the returned data in the object's place.

The value deserializer delegate is a bit trickier. Currently, only Node.js core objects can extend from JSTransferable because the deserializer has to be sure it can locate and resolve the definition of the Foo class in order to create it and deserialize it properly. Essentially the process is: use the deserializationInfo to resolve the class, then once resolved, pass the data in to the kDeserialize method.
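The flow described above can be approximated in plain JavaScript. This is only a minimal sketch, not the Node.js implementation: `kClone` and `kDeserialize` here are stand-ins for the internal symbols, and `classRegistry`, `serializeValue`, and `deserializeValue` are hypothetical names invented for illustration.

```javascript
// Stand-ins for Node.js's internal symbols (assumption: not the real ones).
const kClone = Symbol('kClone');
const kDeserialize = Symbol('kDeserialize');

// Hypothetical lookup table standing in for
// "use the deserializationInfo to resolve the class".
const classRegistry = new Map();

class Foo {
  constructor(a = 0, b = 0) {
    this.a = a;
    this.b = b;
  }

  // Sender side: describe the data and how to find the class again.
  [kClone]() {
    return {
      data: { a: this.a, b: this.b },
      deserializationInfo: 'example:Foo',
    };
  }

  // Receiver side: restore state on a freshly created instance.
  [kDeserialize]({ a, b }) {
    this.a = a;
    this.b = b;
  }
}
classRegistry.set('example:Foo', Foo);

// Serialize the intermediate record returned by kClone in the object's place.
function serializeValue(value) {
  const { data, deserializationInfo } = value[kClone]();
  return structuredClone({ data, deserializationInfo });
}

// Resolve the class, create an instance, then pass the data to kDeserialize.
function deserializeValue({ data, deserializationInfo }) {
  const Ctor = classRegistry.get(deserializationInfo);
  if (Ctor === undefined) {
    throw new Error(`Cannot resolve ${deserializationInfo}`);
  }
  const instance = new Ctor();
  instance[kDeserialize](data);
  return instance;
}

const copy = deserializeValue(serializeValue(new Foo(1, 2)));
```

The hard part for user-defined classes is the `classRegistry.get(...)` step: within one realm a map is enough, but across threads the receiving side needs some agreed way to locate the class.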

The mechanism works well but is definitely limited because it (a) only works in Node.js and (b) only works with Node.js core objects. We'd like to be able to extend the capability to user defined objects.

There are three key pieces here that would need to be standardized:

  1. The symbols that are to be attached to objects. We currently define three: kClone, kTransfer, and kDeserialize.
  2. The structure of the intermediate object that is returned by kClone and kTransfer to feed into the serializer.
  3. The mechanism for resolving the deserialization implementation on the receiving side.

This mechanism does allow for transferring native host objects but that's more specialized and I'm not asking for that here.

For the receiving side, one possible way of accomplishing the resolution of the deserializer is to use a registry in the form of an Event on the MessagePort:

const mc = new MessageChannel();
mc.port1.addEventListener('deserialize', ({ serialized, done }) => {
  done(deserialize(serialized));
});
mc.port1.onmessage = () => {}

For structuredClone(), the same basic pattern can be used, passing in an EventTarget as a "deserialization controller":

const deserializer = new EventTarget();
deserializer.addEventListener('deserialize', ({ serialized, done }) => {
  done(deserialize(serialized));
});
structuredClone(new Foo(), deserializer);

If a deserializer is not provided or fails, an appropriate DOMException is reported.
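To make the event-based idea more concrete, here is a rough userland emulation. It is a sketch under assumptions: the wire record shape (`type`, `data`) and the `runDeserializer` helper are hypothetical, standing in for what the platform would do internally when it meets a custom-serialized value.

```javascript
// Hypothetical wire record a custom-serialized object might be reduced to.
const wire = structuredClone({ type: 'Foo', data: { a: 1, b: 2 } });

class Foo {
  constructor(a, b) {
    this.a = a;
    this.b = b;
  }
}

// The "deserialization controller" from the proposal: a plain EventTarget.
const deserializer = new EventTarget();
deserializer.addEventListener('deserialize', (event) => {
  const { serialized, done } = event;
  if (serialized.type === 'Foo') {
    done(new Foo(serialized.data.a, serialized.data.b));
  }
});

// Stand-in for the platform side (assumption): dispatch a 'deserialize'
// event and collect whatever value a listener passes to done().
function runDeserializer(target, serialized) {
  let result;
  let settled = false;
  const event = new Event('deserialize');
  event.serialized = serialized;
  event.done = (value) => {
    result = value;
    settled = true;
  };
  target.dispatchEvent(event); // listeners run synchronously
  if (!settled) {
    throw new DOMException('No deserializer handled the value', 'DataCloneError');
  }
  return result;
}

const restored = runDeserializer(deserializer, wire);
```

Because `dispatchEvent` is synchronous, this shape fits both `structuredClone()` and message delivery, as long as the listener calls `done` before returning.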

jasnell avatar Dec 18 '21 17:12 jasnell

This feels like a case where we need to step back and deal with use cases, per https://whatwg.org/faq#adding-new-features .

In particular, it's not clear to me what use cases this proposal covers which can't be covered by

const serializable = customPreSerialize(data);
postMessage(serializable);

and

onmessage = e => {
  const data = customPostDeserialize(e.data);
};
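For a single, known class, the pattern above might look like the following sketch. `Temperature` is a hypothetical class chosen for illustration, and `structuredClone` stands in for the postMessage/onmessage round trip.

```javascript
class Temperature {
  constructor(celsius) {
    this.celsius = celsius;
  }
  get fahrenheit() {
    return this.celsius * 9 / 5 + 32;
  }
}

// Sender side: reduce the instance to a tagged, cloneable plain object.
function customPreSerialize(temperature) {
  return { kind: 'Temperature', celsius: temperature.celsius };
}

// Receiver side: rebuild the instance from the plain object.
function customPostDeserialize(plain) {
  if (plain.kind !== 'Temperature') {
    throw new TypeError(`Unexpected kind: ${plain.kind}`);
  }
  return new Temperature(plain.celsius);
}

// structuredClone stands in for the postMessage/onmessage round trip.
const received = structuredClone(customPreSerialize(new Temperature(100)));
const data = customPostDeserialize(received);
```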

domenic avatar Dec 18 '21 21:12 domenic

one possible way of accomplishing the resolution of the deserializer is to use a registry in the form of an Event on the MessagePort

In the browser, structured clone is used for IndexedDB and History state in addition to postMessage. [Serializable] platform objects are supported by all of them (if not entirely consistently across browsers). Is this idea about extending the message channel API only or is the goal to extend structured clone in general?

Side question: In HTML, “transfer” is a special case of “clone” where “[the object is] not just cloned, [but also becomes] no longer usable on the sending side”. Is this the same meaning intended by JSTransferable / kTransfer?

bathos avatar Dec 18 '21 21:12 bathos

Node.js itself presents a solid set of use cases here. Take AbortSignal, for instance. It already inherits from EventTarget. In order to make it cloneable, as has been discussed in another issue over in the dom repo, we also have to put JSTransferable in its prototype chain. We accomplish that by creating the AbortSignal first, then creating an instance of JSTransferable, then setting its prototype to the AbortSignal. It ends up being largely transparent to the user, but it's really a bit of a hack. Given that many of the Web Platform APIs are implemented in JavaScript in Node.js, that becomes the only way we currently have to make them cloneable or transferable. Because we're creating a native object, there's also a performance penalty. It would be great if we could avoid that.

The other use case is the known issue that JS class instances can't currently be cloned. Sure, we could require that every application come up with its own intermediary format for any JS class object it might want to clone, but it would be nicer if we made it a bit easier, and did so in a way that works consistently across multiple JavaScript runtimes by baking it into the standard. Basically, I'd like to be able to create a cloneable JavaScript class that works with structuredClone and postMessage no matter what platform the code is run on.

jasnell avatar Dec 19 '21 05:12 jasnell

I don't understand the applicability of your first paragraph to the HTML and DOM Standards. How Node.js chooses to implement those specs has nothing to do with whether the specs make these things clonable, and in general implementation limitations or choices should have no bearing on standardization or use case discussions.

So it sounds like this reduces to making something about user classes nicer. Can you state that in the form of use cases that have come up concretely, like the example in the FAQ entry I linked to?

domenic avatar Dec 19 '21 06:12 domenic

Note that there seems to be some demand for this on the TC39 side so proxy objects can be serialized, which seems like a legitimate use case.

annevk avatar Dec 20 '21 07:12 annevk

Proxies are a good case, yes. I would argue that making it easier for instances of a class is also quite valid, as would be the ability to allow an object to include private state in the clone.

Another use case is deeply nested object graphs or maps where you don't necessarily know what's there in advance and can't really know if it's even possible to extract a serializable representation.

Consider a case such as:

class Foo {
  #bar = undefined;
  constructor(a) {
    this.#bar = a;
  }
  doSomething() {
    if (this.#bar === 1) { /* do something */ }
    else { /* do something else */ }
  }
}

const map = new Map();
map.set("abc", new Foo(1));
map.set("xyx", new Foo(2));

postMessage(map);

Or, the case where the above Foo instance is deeply nested into some complex object graph.

Using the postMessage(getSerializableRep(obj)) approach, I would have to walk the entire tree and build a new one that is guaranteed to contain only serializable objects, then walk that same graph on the deserialization side to get the proper object types. That's extremely cumbersome when it could be done while the original graph is being serialized/deserialized.
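A sketch of what that double walk looks like in practice is below. It makes several assumptions: `toPlain`, the `$type` tag, and `revive` are hypothetical names, and the walkers only handle Maps, arrays, and plain objects (no cycles, Dates, or other built-ins), which is part of why the approach gets cumbersome.

```javascript
class Foo {
  #bar;
  constructor(a) {
    this.#bar = a;
  }
  get bar() {
    return this.#bar;
  }
  // Hypothetical hook exposing private state for serialization.
  toPlain() {
    return { $type: 'Foo', bar: this.#bar };
  }
}

// First walk: build a parallel graph containing only cloneable values.
function getSerializableRep(value) {
  if (value instanceof Foo) return value.toPlain();
  if (value instanceof Map) {
    return new Map([...value].map(([k, v]) => [k, getSerializableRep(v)]));
  }
  if (Array.isArray(value)) return value.map(getSerializableRep);
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, getSerializableRep(v)]));
  }
  return value;
}

// Second walk, on the receiving side: turn tagged records back into instances.
function revive(value) {
  if (value instanceof Map) {
    return new Map([...value].map(([k, v]) => [k, revive(v)]));
  }
  if (Array.isArray(value)) return value.map(revive);
  if (value !== null && typeof value === 'object') {
    if (value.$type === 'Foo') return new Foo(value.bar);
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, revive(v)]));
  }
  return value;
}

const map = new Map();
map.set('abc', new Foo(1));
map.set('xyz', { nested: [new Foo(2)] });

// structuredClone stands in for postMessage here.
const result = revive(structuredClone(getSerializableRep(map)));
```

Both walkers duplicate traversal logic that the structured clone algorithm already performs, which is the overhead a built-in hook would remove.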

jasnell avatar Dec 20 '21 20:12 jasnell

Note that there seems to be some demand for this on the TC39 side so proxy objects can be serialized, which seems like a legitimate use case.

I think Mark in that thread would probably prefer that Proxies be treated like other objects (i.e. iterate the properties and serialize the values, plus a little complexity around arrays and stuff), rather than explicit support for serializing them in any special way. (That's also my preference, full disclosure.) So I don't think this should count as a use case for the purposes of this thread, necessarily.
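For context, the snippet below illustrates the current behavior and a userland approximation of the "iterate the properties" alternative. The spread copy is only a shallow stand-in chosen for illustration, not the spec change being discussed.

```javascript
const target = { a: 1, nested: { b: 2 } };
const proxy = new Proxy(target, {});

// Today: the proxy itself is rejected by the structured clone algorithm.
let threw = false;
try {
  structuredClone(proxy);
} catch (err) {
  threw = true;
}

// Copying own enumerable properties first yields an ordinary object,
// which clones fine. The property reads still go through the proxy's traps.
const cloned = structuredClone({ ...proxy });
```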

bakkot avatar Dec 20 '21 23:12 bakkot

@bakkot That is also what I’d like to see re: proxies, both to close the Proxy-exotic-object-status observability hole it creates and because there are already user-code-invoking paths (including getter invocation) so it seemingly(?) isn’t accomplishing anything useful. (Likewise “value is an Array exotic object” → IsArray(value)).

Regarding the proposed idea itself, I’m still pretty curious if the concept would be specific to messaging. We currently use a “descriptor wrappers” pattern towards these ends sometimes with History. A well-known-symbol contract sounds like an appealing alternative, but I’m not sure how (or if) the premise could really work in that context.

bathos avatar Dec 21 '21 00:12 bathos

@bakkot wouldn't that mean you can still sniff out proxies from sets or platform objects and such?

annevk avatar Dec 21 '21 07:12 annevk

@annevk A Proxy for a Set already does not behave like a Set: Set.prototype.has.call(new Proxy(new Set, {})) throws. (Similarly for platform objects like Image or whatever.) That's not really a problem.
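A small demonstration of that point, as a sketch (the variable names are arbitrary):

```javascript
const set = new Set([1]);
const proxied = new Proxy(set, {});

// Set methods require the [[SetData]] internal slot on the receiver,
// and a Proxy does not expose its target's internal slots.
let error;
try {
  Set.prototype.has.call(proxied, 1);
} catch (err) {
  error = err;
}

// With the real Set as receiver, the same call works as usual.
const direct = Set.prototype.has.call(set, 1);
```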

Rather, the concern is whether you can distinguish between a Proxy for a regular, non-platform object and a bare such object.

Edit: on discussing with @erights, he's also concerned about a Proxy for a Set being practically usable like a Set, so the sketch above wouldn't entirely satisfy him.

bakkot avatar Dec 21 '21 07:12 bakkot

I see, that seems like a relatively straightforward change then. Nice.

annevk avatar Dec 21 '21 07:12 annevk

Personally, I would like to see API like:

// Proposed API shape (none of this exists today).
structuredClone.register(Symbol.for('Foo'), {
   deserialize(v) { /* Serializable -> Foo */ },
   serialize(v) { /* Foo -> Serializable */ },
});
class Foo {
    [Symbol.structuredCloneIdentifier] = Symbol.for('Foo');
}

If an object with a custom Symbol.structuredCloneIdentifier is passed and there is no deserializer on the receiving side, an error is thrown.
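The idea can be tried out in userland today. The following is a sketch under assumptions: `registerCloneable`, `serializeCustom`, `deserializeCustom`, and `cloneIdentifier` are hypothetical names standing in for the proposed `structuredClone.register` and `Symbol.structuredCloneIdentifier`.

```javascript
// Hypothetical stand-in for the proposed Symbol.structuredCloneIdentifier.
const cloneIdentifier = Symbol('structuredCloneIdentifier');
const registry = new Map();

// Hypothetical stand-in for the proposed structuredClone.register.
function registerCloneable(id, hooks) {
  registry.set(id, hooks);
}

class Foo {
  [cloneIdentifier] = Symbol.for('Foo');
  constructor(value) {
    this.value = value;
  }
}

registerCloneable(Symbol.for('Foo'), {
  serialize: (foo) => ({ value: foo.value }),
  deserialize: (plain) => new Foo(plain.value),
});

function serializeCustom(obj) {
  const id = obj[cloneIdentifier];
  const hooks = registry.get(id);
  if (!hooks) throw new Error('No serializer registered');
  // Symbols cannot be structured-cloned, so send the registry key string.
  return structuredClone({
    id: Symbol.keyFor(id),
    payload: hooks.serialize(obj),
  });
}

function deserializeCustom(record) {
  // Symbol.for returns the same symbol for the same key in every realm.
  const hooks = registry.get(Symbol.for(record.id));
  if (!hooks) {
    throw new Error(`No deserializer registered for ${record.id}`);
  }
  return hooks.deserialize(record.payload);
}

const roundTripped = deserializeCustom(serializeCustom(new Foo(42)));
```

Keying on the global symbol registry means the sender and receiver only need to agree on a string, at the cost of possible collisions between unrelated libraries that pick the same key.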

Ginden avatar Jan 08 '22 16:01 Ginden

I just found myself in need of cloning a custom-built class that I wish to save in IndexedDB.

jimmywarting avatar Mar 18 '22 22:03 jimmywarting

One idea for deserialization of custom objects across realms could be to utilize a feature like module blocks to provide deserialization steps.

i.e. Suppose we have some class we want to make serializable/deserializable:

// just a toy example to demonstrate the API
class Point {
    #x;
    #y;
    
    constructor(x, y) {
        this.#x = x;
        this.#y = y;
    }
    
    get x() {
        return this.#x;
    }
    
    get y() {
        return this.#y;
    }
}

Then serialization would be pretty trivial by just providing some method:

class Point {
    // ...
    [structuredClone.serialize]() {
        return { x: this.#x, y: this.#y };
    }
}

However, because Point may not in general exist in a given worker (or wherever we send the object), a simple [structuredClone.deserialize] can't work. Something like a module block would:

class Point {
    // ...
    static [structuredClone.deserializerModule] = module {
         // Actually import the Point class, this way
         // we can create the Point objects in any
         // worker/etc that has this module block
         import Point from "./Point.js";
             
         // The actual deserializer function
         export function deserialize({ x, y }) {
             return new Point(x, y);
         }
    }
}

This would work with something like worker.postMessage: when serializing a point, say Point(3,4), a reference to the deserializerModule is also captured and passed along as part of the serialization. I.e., the custom object would really be serialized to something like:

{
    [[Type]]: "custom",
    // The serialized data returned by [structuredClone.serialize]
    [[Data]]: { x: 3, y: 4 },
    // The deserializer [structuredClone.deserializerModule]
    [[Deserializer]]: module { ... }
}

During deserialization when [[Type]]: "custom" is seen, it imports the module into the worker/etc (which if it's already been imported would be idempotent, as that's how module caching works). It then calls the resulting module's .deserialize(...) export with [[Data]] to produce the result.

Now there is one caveat here: because import(...) is asynchronous, this would be okay for passing asynchronously cross-thread, but structuredClone is sync. For this we could easily just have separate properties for cross-thread vs. local-thread use:

class Point {
    // thread local deserializer
    static [structuredClone.deserialize]({ x, y }) {
        return new Point(x, y);
    }
    
    // cross-thread deserializer
    static [structuredClone.deserializeModule] = module {
        import Point from "./Point.js";
        
        export function deserialize(data) {
            return Point[structuredClone.deserialize](data);
        }
    }
}

The actual API shape is fairly immaterial, but it shows the idea that we can transfer a deserializer across threads to perform deserialization. And technically the dependency on module blocks isn't strictly required either; they could be replaced by just providing a deserializer URL (although module blocks definitely solve issues regarding CSP and such):

class Point {
    static [structuredClone.deserializeModule]
         = new URL("./Point_deserializer.js", import.meta.url).href;
}

We could even imagine something a bit less dynamic if that would help implementations by having an explicit register step (as previously suggested by @Ginden) akin to how customElements.define works:

i.e.

class Point {
    // ...rest of impl

    static [structuredClone.serialize](point) {
        return { x: point.#x, y: point.#y };
    }
    
    static [structuredClone.deserialize]({ x, y }) {
        return new Point(x, y);
    }
    
    static [structuredClone.deserializeModule] = module {
        import Point from "./Point.js";

        export function deserialize(data) {
            return Point[structuredClone.deserialize](data);
        }
    }
}

// Capture the initial value of [structuredClone.serialize], [structuredClone.deserialize]
// and [structuredClone.deserializeModule] similar to how customElements.define captures
// the initial values of connectedCallback and stuff so it can optimize them more easily
structuredClone.register(Point);

Jamesernator avatar May 24 '22 04:05 Jamesernator

Think it would be handy if we could 1) clone something into a (shared)ArrayBuffer or Blob, 2) send it via some API to Node.js / Deno / Bun.js / WebRTC peer to peer, and 3) deserialize it back from the binary data, as a way to replace JSON, which loses information when you, for example, convert a Date into a string. JSON.parse(JSON.stringify(new Date())) -> "not the same thing"

JSON can't handle binary data very well, and it lacks support for things like circular references, Blob, File, Set, Map, BigInt, TypedArrays, ArrayBuffer, Date, and everything else that structuredClone supports.

JSON is fairly limited in what you can do with it. I think it's time to replace the old legacy JSON API with something newer that doesn't need to convert images to base64 and that also has the potential to decrease the payload with a more compact format.
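The difference is easy to see side by side. The sketch below also uses `v8.serialize` / `v8.deserialize` from Node.js's built-in `v8` module, which already turn a value into bytes and back; note that this is Node.js-specific and not the cross-runtime API being asked for here.

```javascript
const v8 = require('node:v8');

const original = { when: new Date(0), tags: new Set(['a']), big: 10n };

// JSON turns the Date into a string. (A Set would become {} and a BigInt
// would make JSON.stringify throw, so only the Date is passed here.)
const viaJson = JSON.parse(JSON.stringify({ when: original.when }));

// structuredClone keeps the types but returns a live object, not bytes.
const viaClone = structuredClone(original);

// Node.js-specific: v8.serialize produces a Buffer, and v8.deserialize
// reads it back with the original types intact.
const bytes = v8.serialize(original);
const viaBytes = v8.deserialize(bytes);
```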

jimmywarting avatar Sep 17 '22 18:09 jimmywarting

Think it would be handy if we could 1) clone something into a (shared)ArrayBuffer or Blob

I think you're more so asking for: #3517

Jamesernator avatar Sep 18 '22 02:09 Jamesernator

No progress, but the demand is greater and greater these days ... my proposal was to add Symbol.clone that acts just like toJSON.

I don't have a strong opinion on the Symbol.clone name; it could just as well be more verbose, as long as something makes it possible to postMessage proxies without breaking or requiring manual intervention from users.

Thanks for considering any progress around this topic; it's essential in WASM-related projects too, especially when WASM code runs in Workers.
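To illustrate the analogy: `toJSON` is called automatically by `JSON.stringify` on any object it encounters, and a clone hook could work the same way. In the sketch below, `cloneHook` is a hypothetical stand-in for the proposed Symbol.clone, and `applyHooks` / `cloneWithHook` emulate in userland what the platform would do; they handle only arrays and plain objects.

```javascript
// How toJSON works today: JSON.stringify calls it on any object it meets.
const celsius = {
  value: 21,
  toJSON() {
    return { unit: 'C', value: this.value };
  },
};
const json = JSON.stringify({ reading: celsius });

// Hypothetical stand-in for the proposed Symbol.clone.
const cloneHook = Symbol('clone');

// Replace every object that has the hook with the hook's return value.
function applyHooks(value) {
  if (value === null || typeof value !== 'object') return value;
  if (typeof value[cloneHook] === 'function') {
    return applyHooks(value[cloneHook]());
  }
  if (Array.isArray(value)) return value.map(applyHooks);
  return Object.fromEntries(
    Object.entries(value).map(([k, v]) => [k, applyHooks(v)]));
}

function cloneWithHook(value) {
  return structuredClone(applyHooks(value));
}

// A proxy that cannot be cloned as-is, but can describe itself via the hook.
const remote = new Proxy({}, {
  get(_target, key) {
    if (key === cloneHook) return () => ({ remoteId: 7 });
    return undefined;
  },
});

const sent = cloneWithHook({ ref: remote });
```

Like `toJSON`, this is one-way: the receiver gets plain data and decides for itself what to do with it.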

WebReflection avatar May 15 '24 09:05 WebReflection

We could even imagine something a bit less dynamic if that would help implementations by having an explicit register step (as previously suggested by @Ginden) akin to how customElements.define works:

An explicit registration step, using either strings or symbols stored in the global symbol registry, is the only solution I can think of that would reasonably satisfy the following constraints:

  • Allow multiple libraries to happily coexist within single application.
  • Allow restoring from external disk in server context.
  • Allow later retrieval by changed code (what if you store your object in IndexedDB, and it's restored by different code?)

Just registering Point is not enough, because the receiving side can't identify that the Point class matches Point on the sender side - duplicated class names are pretty common.

Ginden avatar May 15 '24 10:05 Ginden

Just registering Point is not enough, because the receiving side can't identify that the Point class matches Point on the sender side - duplicated class names are pretty common.

I wasn't suggesting using the class name whatsoever, rather the prototype itself is the registry key.

Yes, my suggestion doesn't support storage, i.e. it only supports postMessage/structuredClone, similar to other non-storage types (SharedArrayBuffer, MessagePort, etc.), but it has the advantage of being able to deserialize anywhere in the agent cluster without registering per agent (i.e. structuredClone.register would allow the engine to prepare the deserializer in whichever agents it wants).

Jamesernator avatar May 15 '24 11:05 Jamesernator

FWIW, I don't think registering is helpful, and if it uses global symbols it still collides. In my specific use case, proxies in a realm don't actually want, need, or can't be deserialized; they are forwarded back (Atomics + Proxy), so serialization is all that's needed.

Once serialization can return something else compatible with the structuredClone algorithm, I think it'd be up to the user / developer to decide what to do with that serialized data. Automagic deserialization looks more dangerous than useful to me: it requires registering things twice per realm, and if the registrations are not aligned, who knows what happens. If there is just enough control to decide what to send in a postMessage-related dance, we should be good, as we've been good to date using just toJSON for more complex use cases/data.

WebReflection avatar May 15 '24 12:05 WebReflection