
LibWeb: Add support for blob URL in web workers

Open · shannonbooth opened this issue 9 months ago · 1 comment

I'm not really sure about the approach here going forward, but it does at least make more things work.

Further progress towards: https://github.com/SerenityOS/serenity/issues/23632

shannonbooth · May 05 '24 09:05

So I'm a bit worried about the design here. Correct me if I'm wrong about this :D

It looks like the current approach is to store the Blob URL's "data" alongside the URL object. So my questions are:

  • Where is the canonical location for an origin's blob store?
  • How do we adjudicate whether a Blob URL is allowed to be vended to a particular realm/worker/other webcontent process?

In previous conversations with @Lubrsi, I thought the consensus on the 'ideal' design (rather than the 'whee it works :marge:' design, which is still a valid design :) ) was something like this:

  • The Blob URL acts as an opaque token into a canonical store of blob URL data (SQLServer, the UI process, etc.).
  • When creating or retrieving a Blob URL's data, an IPC call is made to the blob store to store or retrieve the information associated with that token.
  • The data is only available through the get/set APIs, so to JS it really is an opaque token.
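A minimal sketch of the opaque-token idea in the bullets above, assuming a single in-process class stands in for the separate blob-store process. The names `BlobStore`, `register_blob`, `resolve`, and `revoke` are hypothetical (not LibWeb API), and in the design described each method would be an IPC call rather than a direct call:

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a canonical blob store living in one service
// process (e.g. the UI process). In the discussed design every public
// method would be an IPC endpoint; here they are plain member calls.
class BlobStore {
public:
    // Registers blob data and returns an opaque token. Callers (and JS)
    // only ever hold the token, never the data itself.
    std::string register_blob(std::string data)
    {
        std::string token = "token-" + std::to_string(m_next_id++);
        m_entries.emplace(token, std::move(data));
        return token;
    }

    // Looks up the data for a token; std::nullopt if unknown or revoked.
    std::optional<std::string> resolve(std::string const& token) const
    {
        auto it = m_entries.find(token);
        if (it == m_entries.end())
            return std::nullopt;
        return it->second;
    }

    // Removes the entry so later lookups fail. Returns false if the token
    // was not registered.
    bool revoke(std::string const& token)
    {
        return m_entries.erase(token) > 0;
    }

private:
    std::map<std::string, std::string> m_entries;
    std::uint64_t m_next_id { 0 };
};
```

Because the data never leaves the store except through `resolve`, the token itself carries no information, which is what makes it "opaque" to JS.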

Is there an obvious transition from this design to the 'ideal' design? Or do I have this in my head wrong?

Another question is what the usage patterns for these things are. Are most pages creating 10-100 Blob URLs? Or thousands?

ADKaster · May 06 '24 18:05

Where is the canonical location for an origin's blob store?

In the spec, blob URLs from all origins are stored in the same blob URL store. See: https://w3c.github.io/FileAPI/#originOfBlobURL.
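To illustrate a single store shared by all origins, here is a hypothetical sketch in which one map holds entries for every origin, keyed by the full blob URL. The URL shape (`blob:` + serialized origin + `/` + UUID) follows the FileAPI spec's blob URL generation steps; the type and function names are made up for illustration, the UUID is passed in rather than generated, and the origin extraction is a deliberately naive string split rather than real URL parsing:

```cpp
#include <map>
#include <optional>
#include <string>

// One store shared by every origin, keyed by the full blob URL string.
struct BlobEntry {
    std::string origin; // serialized origin of the creating environment
    std::string data;   // stand-in for the Blob object itself
};

using BlobUrlStore = std::map<std::string, BlobEntry>;

// Builds a blob URL as "blob:" + serialized origin + "/" + UUID.
// The UUID is a parameter (hypothetical) so the example is deterministic.
std::string generate_blob_url(std::string const& serialized_origin, std::string const& uuid)
{
    return "blob:" + serialized_origin + "/" + uuid;
}

// Naive origin extraction: drop the "blob:" prefix and the trailing "/uuid".
// A real implementation would go through the URL parser instead.
std::optional<std::string> origin_from_blob_url(std::string const& url)
{
    std::string const prefix = "blob:";
    if (url.rfind(prefix, 0) != 0)
        return std::nullopt; // not a blob URL
    auto slash = url.rfind('/');
    if (slash == std::string::npos || slash <= prefix.size())
        return std::nullopt; // no origin part before the UUID
    return url.substr(prefix.size(), slash - prefix.size());
}
```

Since every entry lives in one map, a lookup by URL string works the same no matter which origin created it.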

In our current implementation, every process has its own BlobURLStore. Technically, that should be moved into a singleton process so that, for example, a revocation made by one worker also takes effect for another worker. I think that would be my next step (where, for example, FileAPI::resolve_a_blob_url becomes an IPC call); this is step 1 mentioned below.
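A rough model of why a singleton store matters for revocation, assuming a `std::shared_ptr` handle stands in for the IPC connection to a single store process. `Worker`, `add_entry`, `remove_entry`, and `resolve` are hypothetical names loosely based on the spec's "add an entry", "remove an entry", and "resolve a blob URL" algorithms, not LibWeb's actual classes:

```cpp
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>

using Store = std::map<std::string, std::string>; // blob URL -> data

// Hypothetical model of a worker/process talking to a blob URL store.
// Per-process design: each Worker gets its own Store.
// Singleton design: every Worker holds a handle to the same Store
// (standing in for IPC to one service process).
class Worker {
public:
    explicit Worker(std::shared_ptr<Store> store)
        : m_store(std::move(store))
    {
    }

    void add_entry(std::string const& url, std::string data) { (*m_store)[url] = std::move(data); }

    void remove_entry(std::string const& url) { m_store->erase(url); }

    std::optional<std::string> resolve(std::string const& url) const
    {
        auto it = m_store->find(url);
        if (it == m_store->end())
            return std::nullopt;
        return it->second;
    }

private:
    std::shared_ptr<Store> m_store;
};
```

With a separate `Store` per worker, `remove_entry` in one worker has no effect on another; with a shared handle the removal is visible everywhere, which is roughly the behavior step 1 aims for.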

The Blob URL acts as an opaque token to a canonical store of blob URL data. SQLServer, UI Process etc. Is there an obvious transition from this design to the 'ideal' design? Or do I have this in my head wrong?

So, I think the big difference in the design here is the whole question of making the blob URL a token. The downside of the current design, which follows the spec, is that it eagerly copies blob data from the registry into the URL, rather than waiting until a request for that blob is made through the token. So performance is worse (though I imagine we would be inherently slower anyway with a multi-process design). In the Cloudflare Turnstile case (the only place I've seen this used in the wild so far), it's not a very big deal, since the blobs are small (it's only passing across some JS to run) and there aren't many of them (one per worker). I'm sure it could become a problem for larger use cases, such as blob URLs for an image. I'm not sure which sites do that.
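The eager-versus-deferred difference can be sketched as two hypothetical URL representations over the same registry (`Registry`, `EagerUrl`, `TokenUrl`, `parse_eager`, and `fetch_deferred` are illustrative names only). In the eager version the bytes are copied when the URL is parsed; in the deferred version only the URL string is kept and the bytes are read when a fetch happens:

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical registry: blob URL -> blob bytes.
using Registry = std::map<std::string, std::string>;

// Eager (current approach): parsing copies the bytes into the URL object,
// so every parsed URL carries its own copy even if it is never fetched.
struct EagerUrl {
    std::string href;
    std::optional<std::string> blob_data;
};

EagerUrl parse_eager(Registry const& registry, std::string const& href)
{
    EagerUrl url { href, std::nullopt };
    if (auto it = registry.find(href); it != registry.end())
        url.blob_data = it->second; // copy happens here, at parse time
    return url;
}

// Deferred (token): the URL keeps only the string; bytes are read from
// the registry when a request is actually made.
struct TokenUrl {
    std::string href;
};

std::optional<std::string> fetch_deferred(Registry const& registry, TokenUrl const& url)
{
    auto it = registry.find(url.href);
    if (it == registry.end())
        return std::nullopt;
    return it->second; // copy happens here, only on fetch
}
```

One thing the sketch makes visible: after an entry is removed from the registry, `EagerUrl` still carries its copy while `fetch_deferred` fails. The URL Standard resolves a blob URL's entry at parse time, so a token-based design would likely still need to capture something at parse time (for example, a reference-counted handle to the entry) to keep that behavior, while deferring the byte copy.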

So, the way I think about it, there are two steps to moving towards the 'ideal' design:

  1. Making BlobURLRegistry live in a singleton process (an actual functional improvement)
  2. Making the Blob URL purely a token, and deferring copying the blob from the registry until the request is made (performance improvement)

shannonbooth · May 11 '24 10:05