workerd Support structuredClone for Blob and File

Feb 24 '24 21:02 jasnell

Hmm, I wonder if it's actually correct to clone the content of these? Or are we actually supposed to create a new reference to the same underlying data? I guess technically the difference is not observable since these objects are immutable. So perhaps the implementation as-is produces correct behavior, but I wonder if we want to think forward to an implementation that does actually share the underlying buffer for efficiency.

Feb 26 '24 16:02 kentonv

... So perhaps the implementation as-is produces correct behavior, but I wonder if we want to think forward to an implementation that does actually share the underlying buffer for efficiency.

The implementation as is would also work for storage (thinking about more than just structuredClone()). But if we're thinking about ONLY the structuredClone() use case, then having the underlying data be shared/refcounted would be ideal. That's largely why I started with this one, fwiw... given that there's a difference between how we might serialize the type for storage vs. how we optimize the cloning for structuredClone().

Feb 26 '24 17:02 jasnell

Yeah thinking about this more (I also commented on your internal doc), I think if we decide to support Blob and File over RPC or in DO storage, we're likely to want to handle them specially, not just embed them inline. Since reading the content of a Blob/File is async, we have the opportunity to store that content separately and load it on-demand only if the app needs it.

So I think if we're going to add support for these at all right now, it should probably be specific to structuredClone(), and in that case it should only clone the reference, not the content.

Feb 28 '24 14:02 kentonv

Yep. Ok, moving this back to draft. The plan on this would then become:

1. Modify Blob/File to support two cases:
   a. Refcounted internal storage so that multiple Blob/File instances can share the same underlying data buffer. Cloning would then be a simple matter of increasing the refcount. (We could have the clone hold a Ref<Blob> to the original, but I'd rather avoid holding a strong reference to the original if we can, which I think we can do fairly trivially.)
   b. The ability to specify a streaming source for a Blob/File. This is a bit tricky since we're supposed to know the size of the Blob in advance, and multiple Blobs can be composed together. Multiple reads are also expected to be stable/idempotent such that they always return the same data, meaning we'll have to implement an internal cache for the streamed data. I implemented this last year for Node.js' implementation of Blob, so it's definitely doable. These also need to be refcounted to allow streamed Blobs to be cloneable.
2. Only support structured clone use cases for Blob and File. That is, we would not support in-line serialization for storage. We may instead want to look at implementing support for the standard URL.createObjectURL() and implement a mechanism where Blob/File data can be pushed into long-term persistent storage that is multi-colo accessible. We would then be able to share the object URL that identifies the blob and use that in persistent storage instead of the inline data.
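To make item 1a concrete, here is a rough sketch of what refcounted storage could look like, written in TypeScript purely as a model. Every name in it (`SharedBuffer`, `SketchBlob`, `dispose()`, `storageRefCount`) is hypothetical and not taken from workerd, whose Blob is implemented in C++. The point it tries to show is that a clone bumps the refcount on the shared buffer and does not hold a reference to the Blob it was cloned from.

```typescript
// Hypothetical model of refcounted Blob storage. None of these names come
// from workerd; the real implementation is C++ and may look quite different.
class SharedBuffer {
  readonly bytes: Uint8Array;
  private refs = 0;

  constructor(bytes: Uint8Array) {
    this.bytes = bytes;
  }

  addRef(): SharedBuffer {
    this.refs++;
    return this;
  }

  release(): void {
    if (this.refs === 0) throw new Error("release() without matching addRef()");
    this.refs--;
  }

  get refCount(): number {
    return this.refs;
  }
}

class SketchBlob {
  readonly type: string;
  private storage: SharedBuffer | null;

  private constructor(storage: SharedBuffer, type: string) {
    // Each Blob takes its own reference on the shared buffer.
    this.storage = storage.addRef();
    this.type = type;
  }

  static fromBytes(bytes: Uint8Array, type: string = ""): SketchBlob {
    // Copy once on creation so later writes to the caller's array are not visible.
    return new SketchBlob(new SharedBuffer(bytes.slice()), type);
  }

  // Cloning shares the buffer; it keeps no reference to `this`.
  clone(): SketchBlob {
    return new SketchBlob(this.live(), this.type);
  }

  // Stands in for the wrapper object being destroyed.
  dispose(): void {
    if (this.storage !== null) {
      this.storage.release();
      this.storage = null;
    }
  }

  get size(): number {
    return this.live().bytes.byteLength;
  }

  // Returns a copy so callers cannot mutate the shared, immutable content.
  bytes(): Uint8Array {
    return this.live().bytes.slice();
  }

  get storageRefCount(): number {
    return this.live().refCount;
  }

  private live(): SharedBuffer {
    if (this.storage === null) throw new Error("Blob was disposed");
    return this.storage;
  }
}
```

In this model, disposing the original after cloning leaves the clone fully readable, which is the property that holding a Ref<Blob> to the original would not give us as cheaply.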

Moving this PR back to draft for the time being. These changes will actually end up being done over several PRs, so I'll likely rework this specific PR into one that sets up the initial set of internal changes in Blob to make the rest of this possible.

Feb 28 '24 16:02 jasnell

Refcounted internal storage so that multiple Blob/File instances can share the same underlying data buffer.

Blob's implementation already supports pointing at some other Blob. Doesn't seem like anything new is needed there.

The ability to specify a streaming source for a Blob/File.

I don't think that's actually needed for structuredClone? Maybe needed someday for RPC but I don't think we'd build that until we see clear use cases.
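For context on what the streaming-source idea would involve if it is ever built, here is a minimal sketch of the "stable reads" requirement: the size is declared up front, the one-shot source is drained at most once, and every later read is served from an internal cache. It is deliberately synchronous to keep it short, whereas a real source would be async, and all names (`PullChunk`, `CachedStreamBytes`) are hypothetical.

```typescript
// Hypothetical sketch only; a real streaming source would be asynchronous.
// A pull function returns the next chunk, or null once the source is exhausted.
type PullChunk = () => Uint8Array | null;

class CachedStreamBytes {
  readonly size: number;
  private pull: PullChunk | null;
  private cache: Uint8Array | null = null;

  constructor(size: number, pull: PullChunk) {
    this.size = size; // must be known before any data has been read
    this.pull = pull;
  }

  // Every call returns the same bytes, no matter how often it is called.
  read(): Uint8Array {
    if (this.cache === null) {
      this.cache = this.drain();
    }
    return this.cache.slice(); // copy so callers cannot alter the cache
  }

  // Consumes the one-shot source exactly once and checks the declared size.
  private drain(): Uint8Array {
    const pull = this.pull;
    if (pull === null) throw new Error("source already consumed");
    this.pull = null;

    const out = new Uint8Array(this.size);
    let offset = 0;
    for (let chunk = pull(); chunk !== null; chunk = pull()) {
      if (offset + chunk.byteLength > this.size) {
        throw new RangeError("source produced more bytes than the declared size");
      }
      out.set(chunk, offset);
      offset += chunk.byteLength;
    }
    if (offset !== this.size) {
      throw new RangeError("source produced fewer bytes than the declared size");
    }
    return out;
  }
}
```

The cache is what makes repeated reads idempotent; making such an object cloneable would additionally need the cache itself to be shared between clones.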

We may instead want to look at implementing support for the standard URL.createObjectURL() and implement a mechanism where Blob/File data can be pushed into a long term persistent storage that is multi-colo accessible.

Ehh, let's look at use cases first and then design the right API for them. I suspect createObjectURL() is not the right design outside of a browser (just like Cache API has proven to be the wrong design).

Feb 29 '24 00:02 kentonv

Blob's implementation already supports pointing at some other Blob. Doesn't seem like anything new is needed there.

Yep, and using that would be the easiest thing. I would like to explore using an inner refcounted struct so that we're not potentially holding onto both the Ref<Blob> and the wrapper object, but that's a minor issue thanks to the optimistic gc mechanism. This is all super low priority and I probably won't do anything further on this until we figure out what we want to do with the streaming source for RPC (if anything). I mainly put this PR together to exercise the ser/deser work you did and make sure the basic pattern would work with structuredClone; the draft PR on its own is enough of a POC to demonstrate that.

Feb 29 '24 01:02 jasnell
