
TFJS WebGL on WebWorker still blocks GUI

Open jasonmayes opened this issue 4 years ago • 16 comments

After investigating a report from a user who was attempting to use web workers + TFJS to reduce GUI lag while a model performs inference, I have noticed that with the "webgl" backend the web worker has no effect: executing code in a separate context applies only to the CPU, not to the GPU.

I have confirmed that setting the backend to "cpu" removes the performance issue, as TFJS execution is then correctly run on a new thread, such that browser DOM updates are not interfered with. I have also confirmed that the browser relies on the GPU, not the CPU, to update the DOM for anything visual.

Thus the request here is to limit "webgl" execution so that enough processing is left for DOM updates and other user tasks, preventing this "jankiness" from occurring.

Demo of issue:

https://codepen.io/jasonmayes/project/editor/DBYaRj

Simply change the first line of tfWorker.js. For the CPU backend:

importScripts('dist/cpu.js');

or for the WebGL backend:

importScripts('dist/main.js');

Confirmed this issue across devices, including Chrome on a Windows 10 desktop and also a Xiaomi 8 (Android phone with a Snapdragon 845 processor).

Example output from Chrome DevTools shows the GPU blocked when using the WebGL backend, and also that all execution comes from WebWorker.js.


jasonmayes avatar Aug 10 '21 20:08 jasonmayes

Will the next version solve this problem??? I also encountered this problem.

wayhow123 avatar Aug 11 '21 09:08 wayhow123

The team is currently discussing this issue to see what the best path of action may be, as WebGL seems to be a shared resource, unlike the CPU on a web worker, which separates execution so that two processes cannot eat each other's resources. Please check back on this bug for updates. In the meantime, if you are able to execute your TFJS model on "wasm" or even "cpu", I have confirmed these backends are CPU based and, when in a web worker, execute on a different thread as intended, which does not block the GUI. Only the WebGL backend is affected by this for now, which is often the default form of execution for models if no backend is specified.
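For reference, switching backends in code looks roughly like this (a sketch; it assumes the relevant backend bundle, e.g. tfjs-backend-wasm, has been loaded in the worker alongside TFJS):

  // Inside the worker, after importScripts() of the TFJS and backend bundles:
  await tf.setBackend('wasm');  // or 'cpu'
  await tf.ready();             // resolves once the backend is fully initialized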

jasonmayes avatar Aug 11 '21 18:08 jasonmayes

I don't understand. Which part of WebGL is not threaded? Can you provide a link to a description of the underlying WebGL issue?

You can certainly render to offscreen canvases using a second GL context without blocking the main thread. Sure, the GPU is a shared resource, but is it really true that the browser doesn't use a separate GL context and off-screen rendering in web worker threads? Or is it something about TFJS that is improperly using WebGL?
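For context, the pattern being referred to is a minimal OffscreenCanvas sketch like the following (worker is an assumed Worker instance):

  // Main thread: hand rendering control of a canvas to the worker.
  const offscreen = document.querySelector('canvas').transferControlToOffscreen();
  worker.postMessage({canvas: offscreen}, [offscreen]);

  // Worker: obtain an independent GL context and render there.
  self.onmessage = (e) => {
    const gl = e.data.canvas.getContext('webgl2');
    gl.clearColor(0, 0, 0, 1);
    gl.clear(gl.COLOR_BUFFER_BIT);
  };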


danwexler avatar Aug 12 '21 14:08 danwexler

I made a slightly simplified version of the demo the original reporter created here: https://codepen.io/jasonmayes/project/editor/DBYaRj

Execute this - click the big button at the bottom - and you will notice the webcam feed freezes, as do the red numbers rendered to the left of the button.

If I change the TFJS backend to CPU it does not have this "lag".

I am unsure of the exact cause; if you have suggestions of what it could be, please do feel free to share them. The only difference here is the backend being used, which indicates something is not sharing the GPU as intended when using the WebGL backend.

I will follow up with the team as to how WebGL execution works in TFJS when called from a WebWorker context in case something needs to be changed there.

jasonmayes avatar Aug 12 '21 16:08 jasonmayes

Seems like there's at least an equal chance the bug is in TFJS rather than in WebGL. I'd also suggest testing in other browsers, e.g. Firefox. Some links discussing non-blocking WebGL rendering in web workers:

Chrome: https://developers.google.com/web/updates/2018/08/offscreen-canvas
Firefox (see the WebGL Worker repo, https://github.com/kripken/webgl-worker): https://research.mozilla.org/2014/07/22/webgl-in-web-workers-today-and-faster-than-expected/
TFJS: @.***/webworker-in-tensorflowjs-49a306ed60aa

Many more on D3 and other tools can be found with simple web searches.

This is an issue that would definitely benefit from some investigation by the TFJS dev team as performance regressions in WebGL are easy to miss. Updates can break the fast paths, and careless data copies and transfers can easily overwhelm performance gains.


danwexler avatar Aug 12 '21 17:08 danwexler

At first glance at your example, you are likely to be bound by data transfer rather than compute. You are copying image buffers from the video to the web worker using postMessage({img: bufferT, taskType: 'tf_model'}), which copies bufferT. Instead, you should use the second argument of postMessage to send the buffer as a Transferable. You should also test with much larger images to ensure you are compute bound. I'm not exactly sure what your model is doing, but also make sure that you do enough computation entirely on the GPU to amortize the texture download costs.
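A minimal sketch of the transfer-list approach (assuming bufferT is a typed array over an ArrayBuffer, and worker is the demo's Worker):

  // Listing the underlying ArrayBuffer in the transfer list moves it to the
  // worker with zero copy; bufferT is detached on the sending side afterwards.
  worker.postMessage({img: bufferT, taskType: 'tf_model'}, [bufferT.buffer]);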

I'm not totally sure, but you are also not using any of the computed results, which is a bit artificial. However, that may be fine for this testing if you want to ignore transfer costs from the GPU back to the DOM.

danwexler avatar Aug 12 '21 17:08 danwexler

The original user who reported the issue cannot share their model or full code, so for now they have shared the demo code you see above, with some placeholder code (detailed here: https://discuss.tensorflow.org/t/how-to-get-the-duration-of-predict/3311/18) in place of the real model, which still replicates the issue they had. This is what executes when tfWrapper.processImg(inputFloat); is called:

  // Placeholder workload standing in for the real model:
  let testRand = tf.randomNormal([1024 * 1024 * 4]);
  let {values, indices} = tf.topk(testRand, 1024 * 1024);
  // arraySync() forces a synchronous GPU-to-CPU readback:
  let valueArray = values.arraySync();
  let indArray = indices.arraySync();

jasonmayes avatar Aug 12 '21 18:08 jasonmayes

I think the next steps are:

  1. Check the tf.randomNormal implementation. I'd be surprised if that runs on the GPU at all (WebGL has no random number support). This is often simulated in a number of ways if you do want it to run within a shader, e.g. using a texture filled with drand48() values from the CPU. Regardless, it is likely doing some CPU work, and we want to know why it isn't doing that work within the worker thread.
  2. Check the tf.topk implementation to see if it uses a GPU-side reduce or, instead, reads the data back and does the work on the CPU, and, again, why this might be blocking the main thread.
  3. Good to see that we're doing readback of the values and indices arrays, which is a bit better, but I'd again be very concerned that this test is doing very little GPU work compared to a normal TF network (see the timing sketch after this list). For the GPU to be faster, the amount of work done on the GPU must be much larger than the "work" required to transfer the data to the GPU and back again. It is very common for simple tests to fail at this and send people off to lala optimization land.
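One way to sanity-check where the time goes is TFJS's own tf.time helper, which reports GPU kernel time separately from wall-clock time. A sketch against the placeholder workload (run inside an async function; on WebGL, kernelMs needs the disjoint timer query extension to be available):

  // If wallMs is much larger than kernelMs, the workload is dominated by
  // CPU-side work or synchronous readback rather than GPU compute.
  const stats = await tf.time(() => {
    const rand = tf.randomNormal([1024 * 1024 * 4]);
    const {values} = tf.topk(rand, 1024 * 1024);
    values.dataSync();  // synchronous readback, as in the placeholder above
  });
  console.log(`kernelMs: ${stats.kernelMs}, wallMs: ${stats.wallMs}`);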

One difficult situation will be dealing with the WebGL GPU drivers. If WebGL falls back to a software path that runs in the driver, it will most likely run on the main thread.

danwexler avatar Aug 12 '21 19:08 danwexler

As another point of reference: I write a Firefox addon that spawns a "content page" for performing model processing, then has the background page interact with it over messages. The content page is more or less a normal web page, and it blocks for several seconds during model load and first inference with the WebGL backend as the shaders compile. Specifically, it blocks the entire browser: other tabs cannot be switched to for part of the load time, etc. I always assumed this had to do with some type of resource contention around the main render thread, but I do not know the root cause. Unfortunately there is no Chrome port due to missing WebExtensions APIs, but if you are curious to see code that exhibits the issue, the code and current model can be found here. I had also just created a minimal TF.js reproduction for a different issue that might be helpful for anyone looking too.

wingman-jr-addon avatar Aug 20 '21 01:08 wingman-jr-addon

Any update on this issue? I use tf.signal.stft in a web worker with the WebGL backend, and the UI freezes until processing is finished.

labroskokkalas avatar Oct 21 '21 09:10 labroskokkalas

So the team has been researching how to reduce GPU load during model loading and inference, to give time back to the browser renderer and reduce freezing of the GUI. It seems to be more about how the browser schedules tasks on the GPU. From what I understand, WebWorker is aimed at CPU multithreading and does not account for GPU tasks spawned from it, which it seems are still shared with the main browser-level GPU render thread. However, focusing on GPU sharing instead, we have had some promising results, but we are still working on refining a publishable solution for prod. This is actively under investigation though.

jasonmayes avatar Oct 21 '21 19:10 jasonmayes

@jasonmayes I had a quick followup. Some background text from https://www.tensorflow.org/js/guide/platform_environment: "TensorFlow.js executes operations on the GPU by running WebGL shader programs. These shaders are assembled and compiled lazily when the user asks to execute an operation. The compilation of a shader happens on the CPU on the main thread and can be slow. TensorFlow.js will cache the compiled shaders automatically, making the second call to the same operation with input and output tensors of the same shape much faster. Typically, TensorFlow.js applications will use the same operations multiple times in the lifetime of the application, so the second pass through a machine learning model is much faster.

TensorFlow.js also stores tf.Tensor data as WebGLTextures. When a tf.Tensor is created, we do not immediately upload data to the GPU, rather we keep the data on the CPU until the tf.Tensor is used in an operation. If the tf.Tensor is used a second time, the data is already on the GPU so there is no upload cost. In a typical machine learning model, this means weights are uploaded during the first prediction through the model and the second pass through the model will be much faster."
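For reference, the warm-up pattern this doc text implies looks roughly like the sketch below (model and the input shape are placeholders for whatever you load):

  // Run one dummy prediction so shader compilation and weight upload happen up
  // front; later predictions hit the cached shaders and GPU-resident weights.
  const warmupResult = model.predict(tf.zeros([1, 224, 224, 3]));  // placeholder shape
  await warmupResult.data();  // wait for the GPU work to complete
  warmupResult.dispose();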

  1. How do these parts relate to using web workers for WebGL and the above thread? What I noticed is that first-time inference is usually five times slower, so if these parts, such as the compilation, can be done without freezing the UI, that would be amazing.
  2. I know OffscreenCanvas is not available in browsers such as Safari/Firefox. I think a lot of production use cases will need to support these, especially Safari. Can a solution be made with that in mind? I.e., maybe the compilation part, which might not require the canvas reference, could be done in the web worker so all browsers can see that speedup?

Thanks, Rohan

rohanmuplara avatar Oct 24 '21 18:10 rohanmuplara

I shall let @lina128 reply, as she is currently working on the final solution to this, which looks very promising in terms of the initial load blocking issue you mentioned, by allowing time back to the browser render thread to do what it needs to update the GUI. Maybe she has some thoughts on question 2 too.

Essentially though:

In the WebGL API, some methods are blocking while others are not; this is different from the sync/async concept in JS. The blocking methods in WebGL are entry points that cause synchronous stalls on the calling thread (possibly the same thread Chrome uses for GUI rendering, I believe, as everything is rendered via the GPU now in Chrome if I remember correctly). Even basic requests can take as long as 1 ms, but they can take even longer if they need to wait for all graphics work to be completed.

In the new version Na is working on, we first compile and link everything without waiting, because those calls are async, and then check and wait on everything at the end instead of individually, which gives time back to the browser to do the other things it needs to perform in that time frame.
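A minimal sketch of that pattern in plain WebGL (not TFJS's actual internals) might look like:

  // compileShader/linkProgram return immediately; only the status queries
  // force a synchronous wait, so we batch them after all work is in flight.
  function compileAll(gl, sources) {
    const programs = sources.map(({vs, fs}) => {
      const program = gl.createProgram();
      for (const [type, src] of [[gl.VERTEX_SHADER, vs], [gl.FRAGMENT_SHADER, fs]]) {
        const shader = gl.createShader(type);
        gl.shaderSource(shader, src);
        gl.compileShader(shader);   // async: does not block
        gl.attachShader(program, shader);
      }
      gl.linkProgram(program);      // async: does not block
      return program;
    });
    // Only now query status, once every compile/link has been issued.
    for (const program of programs) {
      if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {  // may block, once
        throw new Error(gl.getProgramInfoLog(program));
      }
    }
    return programs;
  }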

jasonmayes avatar Oct 25 '21 23:10 jasonmayes

In the new version Na is working on, we first compile and link everything without waiting, because those calls are async, and then check and wait on everything at the end instead of individually, which gives time back to the browser to do the other things it needs to perform in that time frame.

Thanks. That would be very helpful. Beyond fixing the shader compilation, is there any plan to avoid blocking the GUI with long shader execution times?

I recently created a background blur sample. It uses the WebGL backend to run the tfjs deeplabv3 model in a worker with the mediacapture-transform video processing API. It starts with low FPS for the first several video frames, but as more video frames come in, it ends up freezing the browser UI (not responding) on some entry-level GPUs.

huningxin avatar Apr 02 '22 01:04 huningxin

So this new fix should be out shortly - please tune into our TensorFlow.js updates in May; that is all we can say for now, but we look forward to hearing your feedback once you get a chance to try it.

jasonmayes avatar Apr 03 '22 21:04 jasonmayes

Hello @jasonmayes, @lina128! Thank you very much for your work and for the wonderful library you've built. Do you have any updates regarding this topic? Any approximate estimates or early builds?

RomanKiryanov avatar Jun 07 '22 21:06 RomanKiryanov

Could anyone point to an update concerning this issue?

pr-o avatar Jan 16 '23 08:01 pr-o

So the fix for the WebGL blocking should come for free with the latest build of TFJS. See our announcement at Google I/O last year, 2022: https://youtu.be/GbgjafMdAIs?t=762. Thanks @lina128 for the fix.

jasonmayes avatar Jan 17 '23 19:01 jasonmayes
