[Proposal] ImageBitmap.getImageData
This is meant to address #3802, and supersedes #4748. More background info can be found on those two issues.
There is currently no high-performance code path for getting ImageData from a MediaStreamTrack or an HTMLVideoElement. The current idiom for doing so involves drawing to an HTMLCanvasElement (or OffscreenCanvas where available) and creating a new ImageData each time. This is particularly low-performance on lower-spec devices.
For the purposes of this proposal, high-performance means:
- Manual control over memory allocation/deallocation instead of leaving it to GC
- Low enough latency to maintain reasonable (20+ FPS) frame rates
I have a repo with performance results testing the various methods of getting ImageData for webcam frames in the latest Chromium and Firefox, using a Raspberry Pi Model 3B+ as a low-spec, easily reproducible test bed. The low memory (1 GB) is of particular concern for this kind of test, and low disk performance means swapping at that rate will lock up the device. Firefox fails all tests, as it immediately runs the system out of memory when creating ImageData for 1080p video, presumably because its garbage collector is overwhelmed. This isn't entirely surprising: at 30 FPS that's roughly 240 MB/sec of garbage being generated. This is why a memory-efficient way of generating ImageData for webcam frames is crucial.
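As a back-of-envelope check on that garbage rate (the arithmetic here is mine, not taken from the test repo):

```javascript
// One 1080p RGBA ImageData is width * height * 4 bytes.
const bytesPerFrame = 1920 * 1080 * 4;      // 8,294,400 bytes (~8 MiB)
const bytesPerSecond = bytesPerFrame * 30;  // discarded every second at 30 FPS

console.log(bytesPerFrame);                        // 8294400
console.log(Math.round(bytesPerSecond / 2 ** 20)); // 237 (MiB/sec of garbage)
```

That's in line with the ~240 MB/sec figure above, and it all has to be reclaimed by GC under the current idiom.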
As opposed to #3802 and #4748, this proposal seeks to remove the need to use a canvas as an intermediary for getting ImageData, which provides a performance boost, and as a new API it can make changes which would otherwise be breaking for CanvasRenderingContext2D.getImageData.
The main usefulness in this proposed change is the ability to steal the memory from an existing ArrayBuffer when getting ImageData from an ImageBitmap. Since an ArrayBuffer can be detached/neutered, this allows a clean way to reuse memory without dangling references.
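A minimal sketch of the detach semantics this relies on, using structuredClone with a transfer list as the closest analogue available today (the variable names are illustrative):

```javascript
// "Steal" the memory from an existing ArrayBuffer by transferring it:
// the old reference is detached, the same memory gets a new owner.
const oldBuffer = new ArrayBuffer(1920 * 1080 * 4); // one 1080p RGBA frame
const newBuffer = structuredClone(oldBuffer, { transfer: [oldBuffer] });

console.log(oldBuffer.byteLength); // 0 - detached, no dangling reference
console.log(newBuffer.byteLength); // 8294400 - same memory, new owner
```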
Here's the proposed change in Web IDL (no strong opinion on any naming). The (perhaps poorly named) neuter option would call ImageBitmap.close before resolving the ImageBitmap.getImageData promise, with the idea being to avoid triggering GC as much as possible. Actual testing shows mixed (and possibly negligible) results, so it may be dropped as an option.
dictionary GetImageDataOptions {
  ArrayBuffer? buffer;
  boolean neuter = false;
};

partial interface ImageBitmap {
  [NewObject] Promise<ImageData> getImageData(long sx, long sy, long sw, long sh, optional GetImageDataOptions options = {});
};
Benefits of this proposed change:
- Stable memory footprint - reusing the memory from an existing ArrayBuffer allows developers to know the memory footprint of their code rather than leaving it to the fluctuations of GC, which, as seen by the Firefox test results, may be inadequate.
- Helps avoid memory fragmentation. Similar to the stable memory footprint, reusing memory will help avoid memory fragmentation, which slows down performance. Fragmentation is prone to happen with these sorts of large allocations of contiguous memory: once a multi-megabyte chunk is freed, a small object may be allocated in it, preventing it from being reused and instead requiring a new multi-megabyte chunk to be allocated. Allocation methods which use arenas can suffer from a 'high water mark', holding significant system memory with most of it unused by the application, but unavailable to the system.
- Since ImageBitmap is [Transferable], it can be efficiently offloaded to a web worker where the ImageData can be extracted and processing done off the main thread. Firefox currently has no way to perform this extraction of ImageData on a web worker due to not supporting 2d as a context for OffscreenCanvas.
- Pairs efficiently with methods of getting an ImageBitmap for webcam data or video, such as createImageBitmap(videoElement) and ImageCapture.grabFrame.
- Avoiding canvas removes the potential for captured webcam frames being copied from CPU to GPU and back to CPU when accelerated canvases are in use.
- Returning a promise allows it to be used on the main thread without blocking, and opens the possibility of using multiple threads for particularly large resolutions.
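The worker offload mentioned above could look roughly like this. This is a browser-only sketch: getImageData here is the proposed (not yet real) API, and imageCapturer, worker, and the function name are my own illustrative stand-ins.

```javascript
// Main thread: grab frames and transfer each ImageBitmap to a worker,
// so pixel extraction and processing happen off the main thread.
function pumpFramesToWorker(imageCapturer, worker) {
  return (async () => {
    while (true) {
      const bitmap = await imageCapturer.grabFrame();
      // Transfer, don't copy: the bitmap's backing store moves to the worker.
      worker.postMessage({ bitmap }, [bitmap]);
    }
  })();
}

// In the worker, the proposed API would then extract the pixels:
// self.onmessage = async ({ data }) => {
//   const imageData = await data.bitmap.getImageData(0, 0, 1920, 1080);
//   // ... process imageData, reusing its buffer for the next frame ...
// };
```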
Open questions:
- How to deal with concurrency issues due to the asynchronous API when reusing an ArrayBuffer? I believe as long as the original ArrayBuffer is detached before the promise is returned, it should be no different than transferring an ArrayBuffer currently.
- How to deal with concurrency issues when an async ImageBitmap.getImageData is in progress and an ImageBitmap.close or transfer occurs? I would assume both issues also exist with using createImageBitmap(otherImageBitmap), although I don't see anything mentioned in the spec about concurrency issues there.
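The detach-before-return ordering suggested for the first question can be sketched in plain JavaScript (getImageDataLike is a hypothetical stand-in, not the real API):

```javascript
// Steal the buffer synchronously, before any async work starts, so the
// caller can never observe or race on the old memory while work is in flight.
function getImageDataLike(buffer, fillByte) {
  // Detach `buffer` up front; from here on it has byteLength 0.
  const stolen = structuredClone(buffer, { transfer: [buffer] });
  return (async () => {
    const pixels = new Uint8Array(stolen);
    pixels.fill(fillByte); // stand-in for the actual pixel copy
    return stolen;
  })();
}

const buf = new ArrayBuffer(16);
const promise = getImageDataLike(buf, 0xff);
console.log(buf.byteLength); // 0 - detached synchronously, before the promise settles
```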
The repo I linked earlier in this post contains patch implementations for both Chromium and Firefox. They're incomplete and were done just for testing the 'go right' path: they don't have error handling, and they're not pretty (apologies to the devs for bastardizing their code). The Chromium patch shows 2-3x the performance of the current best method, achieving 30 FPS at 720p@30 and over 20 FPS at 1080p@30. In the latter case the bottleneck is elsewhere in the code. For Firefox there can be no performance comparison since it can't complete the testing without the patch to begin with.
Overall this seems low-effort to implement, with a large positive impact.
I don't really understand the proposal (in particular it seems you're suggesting to use a detached ArrayBuffer
somehow, but that's not possible). I recommend reading https://whatwg.org/faq#adding-new-features and perhaps starting a thread at https://discourse.wicg.io/ to refine this a bit.
I don't really understand the proposal (in particular it seems you're suggesting to use a detached ArrayBuffer somehow, but that's not possible).
@annevk, I'm not suggesting using a detached ArrayBuffer, I'm suggesting detaching one.
let imageBitmap; // Assume this is an ImageBitmap we want ImageData from
const oldImageData = new ImageData(1920, 1080);
const oldArrayBuffer = oldImageData.data.buffer;
const newImageData = await imageBitmap.getImageData(0, 0, 1920, 1080, { buffer: oldArrayBuffer });
const newArrayBuffer = newImageData.data.buffer;
// oldArrayBuffer is now detached - the memory it had now belongs to newArrayBuffer
That's an overly wordy example because of the variable names; here's how it would actually be used with ImageCapture.grabFrame:
let imageBitmap;
let imageData = new ImageData(1920, 1080);
while (true) {
imageBitmap = await imageCapturer.grabFrame();
imageData = await imageBitmap.getImageData(0, 0, 1920, 1080, { buffer: imageData.data.buffer });
// Process ImageData
}
This would be useful also for cases where code wants to get raw decoded image data. Currently a constructed ImageBitmap is the only way to disable alpha premultiplication (well, aside from WebGL, but that's much more complicated). However, there is currently no way to read raw pixels back from the canvas once the image bitmap has been transferred to it, because a canvas allows getting a context of only a single type, and getImageData lives only on the 2D context today.
Referring to the closure of #4748
While I agree having this method on ImageBitmap is also a very good idea, I don't see why the CanvasImageData mixin's getImageData method shouldn't get extended too.
It's not rare at all to have to get updates every frame from the previous drawings, and having to allocate new memory every time also means more garbage to be collected every frame.
Unless I'm missing something, generating an ImageBitmap from a Canvas also requires allocating new memory, so for this case, this proposal doesn't add much.
@Kaiido, thanks for taking a look at this.
While I agree having this method on ImageBitmap is also a very good idea, I don't see why the CanvasImageData mixin's getImageData method shouldn't get extended too.
Sure, there's plenty of benefit to it existing in both places. As I mentioned in the original comment ("in providing a new API it can have changes which would otherwise be breaking for CanvasRenderingContext2D.getImageData"), there's a downside to the existing CanvasRenderingContext2D.getImageData in that it doesn't return a promise, and changing that would be a breaking change to that API, so that's not going to happen. While this API doesn't have to return a promise, doing so allows more flexibility for high-performance implementations, allowing the cropping and copying of memory to not block the main thread.
So, yes, a synchronous version of this proposal could also be implemented on CanvasImageData.getImageData.
Unless I'm missing something, generating an ImageBitmap from a Canvas also requires allocating new memory, so for this case, this proposal doesn't add much.
I believe you're right: if you're solely concerned with ImageData from a canvas, then you're still ending up with an allocation. If you're concerned with video data though (which is my main motivation for this proposal), you skip the canvas intermediary and use an ImageBitmap directly from the video. There's no way to avoid an allocation per video frame you want ImageData for without a much more significant change to APIs. The nice thing about ImageBitmap is that with the close method you have direct control over when that memory is released, so you don't get garbage collection churn.
I also had this same problem, and the only way I found to get around it was through a service worker, where for each request I filter until I get the image I want and use this code to operate on the image's binary data:
fetch('https://upload.wikimedia.org/wikipedia/commons/7/77/Delete_key1.jpg')
  .then(res => res.blob()) // Gets the response and returns it as a blob
  .then(async (blob) => {
    console.log(blob)
    console.log(await blob.text())
  });
I also had this same problem
Are you sure it was the same problem? I don't see how your fetch snippet is relevant to or helps with getting image data out of canvas context.
Depending on the need, blob.text() can be exchanged for blob.arrayBuffer(). This code snippet shows how to get image data out of the context of the screen, but for this to work on the whole page I had to use a service worker.
@Slender1808 Sorry, but it still looks unrelated. Your snippet doesn't do anything with screen or canvas, let alone ImageData, it just loads an image from a URL. Perhaps you misunderstood what this proposal is about?
Sorry, I thought this was about trying to get data from an image without using a canvas for that.
cc @whatwg/canvas
I'm a bit torn by this asynchronicity thing.
On the one hand, I can see very well how having more async APIs is a good thing, and how we should aim at making most new APIs that could work in parallel Promise based.
On the other hand, as has been said in the initial comment, there are a few issues in this exact case which makes it very hard to make this API actually run "in parallel".
I believe that the slowest operation that will be done here will be the read-back of bitmaps living in the GPU*, and this can't be done asynchronously because indeed the ImageBitmap could be closed right after the call, synchronously discarding the bitmap before it's been read.
I fear this ends up just like createImageBitmap, which is actually synchronous with all but Blob sources.
This is in my opinion a problem because web-developers will certainly assume that since it's async, it's done in parallel and won't block the UI thread.
As an example, I saw people believing that using XHR.responseXML was better than using a DOMParser, because XHR is async and they didn't realize that the parsing is actually done synchronously in the .responseXML getter.
I digress, but I guess you can imagine how bad it can be to make the users believe their code won't block when it will actually do, "Let's batch one call per pixel in a loop to get all pixel values, it's async anyway".
*(I could be wrong here, maybe color-space conversion or unmultiplying are also slow operations?)
I believe that the slowest operation that will be done here will be the read-back of bitmaps living in the GPU*, and this can't be done asynchronously because indeed the ImageBitmap could be closed right after the call, synchronously discarding the bitmap before it's been read.
A new pending async readback would hold a strong ref to the resource, preventing actual closing of the resource. (These are the details behind the scenes, but this way close does precipitate release mostly deterministically, without relying on GC.) (The alternative would be rejecting outstanding read promises on close(), but I don't think we want that.)
*(I could be wrong here, maybe color-space conversion or unmultiplying are also slow operations?)
cpu-side colorspace conversion is slow (loooots of ops per pixel). Readback isn't really cpu-heavy per se, but rather potentially high-latency, as we have to wait for that resource to be "done". Otherwise it's just a copy or two, not slow per se.
As a user that reads images, reads their data (for elevation GIS tiles), modifies their data (steganography), and needs to write it back out as a modified image (whew!), I REALLY need a lossless workflow.
ImageBitmap is at least hopeful with options to avoid alpha premultiply and color-space conversion.
For example, I currently have to use a webgl stunt to convert an image into reliably lossless data. It gets weirder bundling the new modified data back to the file system.
Have pity on us image-as-data folks!!
I hope this gets the interest of implementers, because it'd be nice to have this more direct access to ImageData.
Ran into yet another project where I'd really want to decode image to raw RGBA data and remembered this issue. It would be really helpful to have getImageData on ImageBitmap itself, rather than going through the whole create canvas -> get 2D context -> draw image -> get image data back, both because it's slow & unergonomic, and because it doesn't actually preserve original RGBA values (due to premultiplication mentioned earlier).
It should be noted that WebCodecs now offers a means to get that data without going through a canvas. You can either create a VideoFrame from your ImageBitmap object and then use its copyTo() method to get the raw data, or even use the ImageDecoder API directly to do the decoding.
This is a bit lower level than what I imagined an ImageBitmap#getImageData() would have been, since authors will have to handle the various formats the image can be stored in (I420, BGRA, etc.), but I believe this is actually a good thing, since it allows for a faster path, which is what I believe is being asked for here.
@Kaiido Thanks, I missed the ability to do this via new APIs (probably because, intuitively, VideoFrame seemed unrelated to just image decoding).
But yeah, while performance is one aspect of it, if it returns different formats then it doesn't quite solve the use case of just decoding an image into RGBA data like we can with canvas. This step is still important/useful for graphic manipulation via external libraries, especially those compiled to Wasm.
Admittedly, users can compile format conversion to Wasm too, but, assuming I'm not alone in using graphic manipulation libraries in Wasm, it would be good to avoid that bloat and have the format handled by the decoding APIs (maybe as part of ImageDecoder options?). It seems like all the necessary bits and functionality are already there in browsers, just not quite exposed via the API.
https://github.com/w3c/webcodecs/issues/92 is about adding such an API, however the current status is that it's not really required, because going through an ImageBitmap would return the data as RGB[A|X].
So if you want raw data as fast as possible, you take the data directly from the VideoFrame that the MediaStreamTrackProcessor, VideoDecoder, or ImageDecoder produced, in whatever format it comes. And if you want RGBA data, you go through an ImageBitmap. For your case I'm not sure which would be the fastest between directly creating the ImageBitmap from a Blob, or going through an ImageDecoder + createImageBitmap(), but I suspect both would be faster than going through a canvas anyway.
However, I don't see anything there enforcing that behavior, and indeed in my tests against current Chrome's implementation, it does correctly convert from NV12 to RGBA, and from I420 to RGBA, but it keeps BGRA as BGRA. Admittedly it's trivial to convert between these latter formats, but this tends to prove that authors can't really be sure the conversion will be done, and having to do even a trivial conversion is cumbersome anyway. I'll comment there to see if that conversion can be enforced.
it's not really required because going through an ImageBitmap would return the data as RGB[A|X]
And if you want RGBA data, you go through an ImageBitmap.
Not sure what you mean here by "going through an ImageBitmap". I mean, this issue is exactly about getting RGBA data (as Uint8Array) from ImageBitmap, which today is not possible without additionally going via Canvas -> 2d context -> drawImage -> getImageData.
Ah, I guess you're saying that creating a VideoFrame from an existing ImageBitmap is supposed to convert it to RGBA (except for the https://github.com/w3c/webcodecs/issues/92 issue), so then copyTo would give a way to retrieve that RGBA data as raw bytes?
If so, yeah, if that issue is resolved and formats are converted consistently, that could be a viable alternative.
Ah, I guess you're saying that creating a VideoFrame from an existing ImageBitmap is supposed to convert it to RGBA (except for the w3c/webcodecs#92 issue), so then copyTo would give a way to retrieve that RGBA data as raw bytes?
Yes that's what I mean.
Thanks for the clarification. It's an interesting alternative.
It does seem a little worrying in terms of performance implications (will it always copy the image data to the GPU and read it back?) but I agree it should still be faster than canvas... except that maybe canvas + willReadFrequently would be faster still?
I'm not sure if this is useful, but I've got a webgl function that turns an image into bytes with no distortions via alpha premultiply and color corrections. https://github.com/backspaces/agentscript/blob/master/src/RGBADataSet.js#L27
I'm not sure if this is useful, but I've got a webgl function that turns an image into bytes with no distortions via alpha premultiply and color corrections. backspaces/agentscript@master/src/RGBADataSet.js#L27
Have you managed to get this working using an HTMLVideoElement as a TexImageSource? It should be working according to the spec. I updated the snippet to use the webgl2 context, which should support it.