Wasm on mobile browsers with an empty scene results in 35fps chrome, 10fps firefox.

Open TheMightyGit opened this issue 4 years ago • 10 comments

Wasm on mobile browsers (android chrome and firefox) with an empty scene results in <35fps chrome, <10fps firefox. The fps drops further when sprites are added, often dipping to near zero. Input handling is also erratic at those fps's (touches via inpututil are sometimes missed).

The raylib wasm examples run at a solid 60fps with 10s of thousands of sprites on the same phone with the same browsers, so I can see it's not a system limitation.

I'm new to Ebiten so I don't know if this is expected performance for mobile wasm. Should I be able to get 60fps with 6 sprites on mobile browsers with ebiten wasm?

go version go1.15 darwin/amd64
github.com/hajimehoshi/ebiten v1.11.7 (tried back to 1.11.0 but no difference)
mobile chrome 85.0.4183.81 (on android 8.1.0)

Tried both with and without ebiten.SetMaxTPS(ebiten.UncappedTPS) but it makes little difference to the overall fps.

(I can see the same fps issues on the ebiten.org sprites demo - so I don't think it's a local compilation issue)

TheMightyGit avatar Sep 02 '20 09:09 TheMightyGit

Thanks.

This is a known issue: Ebiten's (or Go's) Wasm doesn't work efficiently on mobile browsers. I don't think there is any low-hanging fruit to improve this. We probably have to tackle the Go compiler.

The raylib wasm

This is in C++, not Go, right?

hajimehoshi avatar Sep 02 '20 10:09 hajimehoshi

This is a known issue: Ebiten's (or Go's) Wasm doesn't work efficiently on mobile browsers. I don't think there is any low-hanging fruit to improve this. We probably have to tackle the Go compiler.

Thank you, I did not know this. I'll track the issue in the golang repo. <3

The raylib wasm

This is in C++, not Go, right?

Yes, this was just to confirm it was not a hardware/os limitation.

TheMightyGit avatar Sep 02 '20 10:09 TheMightyGit

Thank you, I did not know this. I'll track the issue in the golang repo. <3

For example: https://github.com/golang/go/issues/32591

hajimehoshi avatar Sep 02 '20 10:09 hajimehoshi

@hajimehoshi I made one change for Gio, which removes the syscall/js overhead. On my desktop each frame took ~8ms, now it only takes ~3ms. But, on my cheap smartphone (Xiaomi 7A), it drops from ~90ms to ~35ms.

If you test the current Gio version (without that optimization: https://gioui.org/files/wasm/kitchen/) you notice that it may take ~8ms per frame (using the "Performance" tab from the browser). However, ~5.5ms (out of ~7ms) is wasted on the "GPU.BeginFrame", which is where most of the WebGL calls are made.

If you test my own version (https://inkeliz.com/extra/kitchen/v1/index.html), the same BeginFrame is now ~0.5ms, so the entire frame is taking ~3ms now.


You can see the patch here (https://git.sr.ht/~inkeliz/gio/commit/1361089f5716162f94d6d9a3a4cc61b496d8b9c5) and the files here (https://git.sr.ht/~inkeliz/gio/tree/syscall-9/internal/glimpl); look at gl_js.s, gl_js.js and gl_js.go.

Instead of calling syscall/js, it's possible to use the CallImport assembly instruction, such as:

#include "textflag.h"

// The body is supplied by the JS import with the matching name.
TEXT ·bufferData(SB), NOSPLIT, $0
  CallImport
  RET

Then, you can register your own function in the go.importObject.go object (which is created by wasm_exec.js):

Object.assign(go.importObject.go, {
        // bufferData (target Enum, src []byte, usage Enum)
        "gioui.org/internal/glimpl.bufferData": (sp) => {
            sp = (sp >>> 0) + OffsetContextIndex;
            const webgl = gioLoadContext(sp);
            webgl.ctx.bufferData(
                gioLoadInt64(sp),
                gioLoadSlice(sp + OffsetInt64),
                gioLoadInt64(sp + OffsetInt64 + OffsetSlice),
            );
        },
});

Then you can call it from your Go code, without going through the slow syscall/js:

func bufferData(ref uint32, target Enum, src []byte, usage Enum)

func (f *Functions) BufferData(target Enum, src []byte, usage Enum) {
	bufferData(f.Ref, target, src, usage)
}

You can see the functions in the gl_js.js file. Note that they must be added to wasm_exec.js before the wasm executes. In Gio we have cmd/gogio, which builds Gio programs. It now appends any _js.js file to wasm_exec.js, which allows adding any additional imports. That is why the gl_js.js is here.

Notice that bufferData takes the []byte and creates the Uint8Array on the JS side, without a syscall/js copy such as js.CopyBytesToJS. So it's a single call and does everything. 🎉
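
For readers unfamiliar with how such loaders can work, here is a minimal sketch of helpers in the spirit of gioLoadInt64 and gioLoadSlice. The names loadInt64 and loadSlice, the stand-in ArrayBuffer and the offsets used are assumptions for illustration, modelled on how wasm_exec.js reads values from linear memory; Gio's actual helpers may differ.

```javascript
// Stand-in for the Wasm linear memory; in a real page this would be
// the instance's exported memory buffer (assumption for illustration).
const memory = new ArrayBuffer(64);
const view = new DataView(memory);

// Read a little-endian 64-bit integer as a JS number (exact below 2^53).
function loadInt64(addr) {
  const low = view.getUint32(addr, true);
  const high = view.getInt32(addr + 4, true);
  return low + high * 4294967296;
}

// Read a Go slice header (pointer, then length) and return a Uint8Array
// view over the same bytes: nothing is copied.
function loadSlice(addr) {
  const ptr = loadInt64(addr);
  const len = loadInt64(addr + 8);
  return new Uint8Array(memory, ptr, len);
}

// Simulate what Go would have written: an integer argument at offset 0,
// a slice header at offset 8, and the slice's bytes at offset 40.
view.setUint32(0, 34962, true); // hypothetical enum value
view.setUint32(8, 40, true);    // slice data pointer
view.setUint32(16, 3, true);    // slice length
new Uint8Array(memory, 40, 3).set([1, 2, 3]);

const target = loadInt64(0);
const src = loadSlice(8);
```

Because src is only a view over the existing buffer, it can be handed to a WebGL call directly, which is where the saving over a copy-based path would come from.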


I'm just sharing my experience of trying to improve Gio's performance on WASM. I think Ebiten will see some improvement if it follows the same path. 👍


CallImport seems to be "undocumented", and there's no "CallExport"; exporting seems to be exclusive to the runtime itself. :\

inkeliz avatar Dec 14 '20 05:12 inkeliz

That's an interesting idea. I thought Object.assign should be called before WebAssembly.instantiate(Streaming). Is that correct? If so, should we modify wasm_exec.js?

My current idea to reduce overhead is to use WebGL2 APIs that can specify TypedArray with offsets and lengths so that we can reduce creating extra TypedArrays for each call: #1435

hajimehoshi avatar Dec 14 '20 05:12 hajimehoshi

You don't need to modify wasm_exec.js directly. But you do need to create the go variable first, in order to use Object.assign, and yes: this must happen before WebAssembly.instantiate.


Currently the HTML is something like:

	<script src="wasm_exec.js"></script>
	<script>
		const go = new Go();
		let mod, inst;
		WebAssembly.instantiateStreaming(fetch("test.wasm"), go.importObject).then((result) => {
			mod = result.module;
			inst = result.instance;
			document.getElementById("runButton").disabled = false;
		}).catch((err) => {
			console.error(err);
		});
	</script>

That is what defines the go variable, at const go = new Go();. So you must add your imports after the const go = new Go(); and before the WebAssembly.instantiateStreaming. In the end you will have something like:

	<script src="wasm_exec.js"></script>
	<script>
		const go = new Go();

		Object.assign(go.importObject.go, {
			// your custom imports. :)
		});

		let mod, inst;
		WebAssembly.instantiateStreaming(fetch("test.wasm"), go.importObject).then((result) => {
			mod = result.module;
			inst = result.instance;
			document.getElementById("runButton").disabled = false;
		}).catch((err) => {
			console.error(err);
		});
	</script>

I think you get the idea, you need the go variable (go = new Go()) to call Object.assign(go.importObject.go), but it must be before the WebAssembly.instantiateStreaming. In the end: Define the Go variable -> Add the imports -> Initialize the wasm.
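
The ordering requirement can be checked outside the browser. The sketch below is an illustration only: it uses a tiny hand-written Wasm module (not a Go binary) whose single import is a function "f" in the "go" namespace, and a plain object standing in for go.importObject. Both the module and the import name "f" are hypothetical.

```javascript
// A minimal Wasm module whose only content is one import:
// a function "f" in the "go" namespace taking a single i32.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,             // magic + version
  0x01, 0x05, 0x01, 0x60, 0x01, 0x7f, 0x00,                   // type section: (i32) -> ()
  0x02, 0x08, 0x01, 0x02, 0x67, 0x6f, 0x01, 0x66, 0x00, 0x00, // import "go"."f"
]);
const mod = new WebAssembly.Module(bytes);

// Stand-in for go.importObject from wasm_exec.js (hypothetical).
const importObject = { go: {} };

// Instantiating before the import is registered fails to link.
let failedWithoutImport = false;
try {
  new WebAssembly.Instance(mod, importObject);
} catch (err) {
  failedWithoutImport = err instanceof WebAssembly.LinkError;
}

// Registering the import first, then instantiating, succeeds.
Object.assign(importObject.go, { f: (sp) => {} });
const instance = new WebAssembly.Instance(mod, importObject);
```

Imports are resolved once, at instantiation time, so functions added to the object afterwards are never seen by the module.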

inkeliz avatar Dec 14 '20 06:12 inkeliz

Ah right, so the user would need to add the snippet that registers the import functions. Hmm, this sounds like a last resort. Thank you for elaborating!

hajimehoshi avatar Dec 14 '20 07:12 hajimehoshi

Wasm on mobile browsers (android chrome and firefox) with an empty scene results in <35fps chrome, <10fps firefox. The fps drops further when sprites are added, often dipping to near zero. Input handling is also erratic at those fps's (touches via inpututil are sometimes missed).

These results are nearly 2 years old now, is the state of WASM performance still this bad?

superloach avatar Aug 30 '22 18:08 superloach

I think, generally speaking: yes.

Ebiten implemented some improvements. One improvement is using 'Bind' to avoid string conversions, but it doesn't remove the allocations inside syscall/js itself. So, it's faster now, but...

I expect the performance to be better than it was before, but not "fast". I am also taking hardware and browser improvements into account (Safari is still the fastest). However, I don't expect WASM performance to come near the native equivalent. We need to consider the JS overhead, the syscall/js allocations, the WebGL limitations, the fact that WASM is still single-threaded, and that we can't control the garbage collector (which blocks everything).

inkeliz avatar Aug 30 '22 19:08 inkeliz

Gotcha. This issue just feels a bit vague, perhaps separate issues for those goals would be better? There's already #719 for WebGPU, we can track something like https://github.com/WebAssembly/threads for WASM thread support, and we could create other issues for reducing JS overhead and potentially reducing GC delays on WASM. The fact that mobile browsers are going to be slower is fairly expected.

superloach avatar Aug 30 '22 19:08 superloach

There seem to be no action items.

hajimehoshi avatar Jul 29 '23 15:07 hajimehoshi