ebiten
Wasm on mobile browsers with an empty scene results in 35fps chrome, 10fps firefox.
Wasm on mobile browsers (Android Chrome and Firefox) with an empty scene results in <35fps in Chrome and <10fps in Firefox. The fps drops further when sprites are added, often dipping to near zero. Input handling is also erratic at those frame rates (touches via inpututil are sometimes missed).
The raylib wasm examples run at a solid 60fps with tens of thousands of sprites on the same phone with the same browsers, so I can see it's not a system limitation.
I'm new to Ebiten so I don't know if this is expected performance for mobile wasm. Should I be able to get 60fps with 6 sprites on mobile browsers with ebiten wasm?
go version go1.15 darwin/amd64
github.com/hajimehoshi/ebiten v1.11.7 (tried back to 1.11.0 but no difference)
mobile chrome 85.0.4183.81 (on android 8.1.0)
Tried both with and without ebiten.SetMaxTPS(ebiten.UncappedTPS) but it makes little difference to the overall fps.
(I can see the same fps issues on the ebiten.org sprites demo - so I don't think it's a local compilation issue)
Thanks.
This is a known issue: Ebiten's (or Go's) Wasm doesn't work efficiently on mobile browsers. I don't think there is any low-hanging fruit to improve this. We probably have to tackle the Go compiler.
> The raylib wasm
This is in C++, not Go, right?
> This is a known issue: Ebiten's (or Go's) Wasm doesn't work efficiently on mobile browsers. I don't think there is any low-hanging fruit to improve this. We probably have to tackle the Go compiler.
Thank you, I did not know this. I'll track the issue in the golang repo. <3
> > The raylib wasm
> This is in C++, not Go, right?
Yes, this was just to confirm it was not a hardware/os limitation.
> Thank you, I did not know this. I'll track the issue in the golang repo. <3
For example: https://github.com/golang/go/issues/32591
@hajimehoshi I made one change for Gio, which removes the syscall/js overhead. On my desktop each frame took ~8ms; now it only takes ~3ms. On my cheap smartphone (Xiaomi 7A), it drops from ~90ms to ~35ms.
If you test the current Gio version (without that optimization: https://gioui.org/files/wasm/kitchen/) you notice that it may take ~8ms per frame (using the "Performance" tab from the browser). However, ~5.5ms (out of ~7ms) is wasted on the "GPU.BeginFrame", which is where most of the WebGL calls are made.
If you test my own version (https://inkeliz.com/extra/kitchen/v1/index.html), the same BeginFrame is now ~0.5ms, so the entire frame is taking ~3ms now.
You can see the patch here (https://git.sr.ht/~inkeliz/gio/commit/1361089f5716162f94d6d9a3a4cc61b496d8b9c5) and the files here (https://git.sr.ht/~inkeliz/gio/tree/syscall-9/internal/glimpl); look at gl_js.s, gl_js.js and gl_js.go.
Instead of calling syscall/js, it's possible to use the CallImport assembly instruction, such as:
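#include "textflag.h"

// Body-less stub: CallImport hands control to the JS import registered for this symbol.
TEXT ·bufferData(SB), NOSPLIT, $0
	CallImport
	RET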
Then, you can add your own function to the go.importObject.go object (which is defined in wasm_exec.js):
Object.assign(go.importObject.go, {
    // bufferData (target Enum, src []byte, usage Enum)
    "gioui.org/internal/glimpl.bufferData": (sp) => {
        sp = (sp >>> 0) + OffsetContextIndex;
        const webgl = gioLoadContext(sp);
        webgl.ctx.bufferData(
            gioLoadInt64(sp),
            gioLoadSlice(sp + OffsetInt64),
            gioLoadInt64(sp + OffsetInt64 + OffsetSlice),
        );
    },
});
Then you call it from your Go code, without going through the slow syscall/js:
// Declared without a body; implemented in assembly via CallImport.
func bufferData(ref uint32, target Enum, src []byte, usage Enum)

func (f *Functions) BufferData(target Enum, src []byte, usage Enum) {
	bufferData(f.Ref, target, src, usage)
}
You can see the function in the gl_js.js file. Note: it must be loaded into wasm_exec.js before the wasm executes. In Gio we have cmd/gogio, which builds the Gio programs. It now includes any _js.js file in wasm_exec.js, which allows adding any additional imports. That is why gl_js.js is here.
Notice that bufferData takes the []byte and creates the Uint8Array on the JS side, without a separate syscall/js copy call (such as js.CopyBytesToJS). So it's a single call and does everything. 🎉
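The helpers used in the import above (gioLoadInt64, gioLoadSlice) are not shown in this thread. As a rough sketch of what such loaders might look like, modeled on how wasm_exec.js reads values from wasm memory: the names loadInt64 and loadSlice, the offset constants, and the fake memory layout below are all assumptions for illustration, not Gio's actual code.

```javascript
// Hypothetical constants: an int64 argument takes 8 bytes, and a Go slice
// header on wasm is pointer + length + capacity, 8 bytes each.
const OFFSET_INT64 = 8;
const OFFSET_SLICE = 24;

// Read a little-endian int64 as a JS number (exact only below 2^53).
function loadInt64(view, addr) {
  const low = view.getUint32(addr, true);
  const high = view.getInt32(addr + 4, true);
  return low + high * 4294967296;
}

// Build a Uint8Array over the bytes a Go slice points at. This is a view
// onto the same memory, so no bytes are copied.
function loadSlice(view, addr) {
  const ptr = loadInt64(view, addr);
  const len = loadInt64(view, addr + 8);
  return new Uint8Array(view.buffer, ptr, len);
}

// Demo against a fake "wasm memory" laid out like the arguments of
// bufferData(target, src, usage) starting at a made-up stack pointer.
function demo() {
  const memory = new ArrayBuffer(64);
  const view = new DataView(memory);
  new Uint8Array(memory).set([1, 2, 3, 4], 48); // slice payload at address 48

  const sp = 8;
  view.setUint32(sp, 0x8892, true);                  // target (gl.ARRAY_BUFFER)
  view.setUint32(sp + OFFSET_INT64, 48, true);       // slice pointer
  view.setUint32(sp + OFFSET_INT64 + 8, 4, true);    // slice length
  view.setUint32(sp + OFFSET_INT64 + 16, 4, true);   // slice capacity
  view.setUint32(sp + OFFSET_INT64 + OFFSET_SLICE, 0x88e4, true); // usage (gl.STATIC_DRAW)

  return {
    target: loadInt64(view, sp),
    data: loadSlice(view, sp + OFFSET_INT64),
    usage: loadInt64(view, sp + OFFSET_INT64 + OFFSET_SLICE),
  };
}
```

In a real import the DataView would wrap the wasm instance's memory buffer rather than a hand-built ArrayBuffer, and the starting offset would depend on the Go calling convention.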
I'm just sharing my experience trying to improve Gio's performance on WASM. I think Ebiten would see some improvement if it follows the same path. 👍
The CallImport instruction seems "undocumented", and there's no "CallExport"; exporting seems exclusive to the runtime itself. :\
That's an interesting idea. I thought Object.assign should be called before WebAssembly.instantiate(Streaming). Is that correct? If so, should we modify wasm_exec.js?
My current idea to reduce overhead is to use WebGL2 APIs that can specify TypedArray with offsets and lengths so that we can reduce creating extra TypedArrays for each call: #1435
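To make the idea concrete: WebGL2 adds overloads such as bufferSubData(target, dstByteOffset, srcData, srcOffset, length), so one long-lived Uint8Array over the wasm memory can be passed together with an offset and a length instead of creating a new subarray per call. The sketch below is only an illustration of that pattern, not what #1435 implements; uploadFromWasmMemory and its parameter names are hypothetical, and a recording stub stands in for a real WebGL2 context.

```javascript
// Hypothetical helper. `gl` is assumed to be a WebGL2RenderingContext
// (the five-argument bufferSubData overload does not exist in WebGL1).
// `memBytes` is assumed to be one Uint8Array covering the whole wasm memory.
function uploadFromWasmMemory(gl, target, memBytes, ptr, len) {
  // WebGL1 style would be:
  //   gl.bufferSubData(target, 0, memBytes.subarray(ptr, ptr + len));
  // which allocates a new TypedArray object on every call.
  gl.bufferSubData(target, 0, memBytes, ptr, len);
}

// Stub context that only records calls, so the sketch runs without a GPU.
function makeRecordingGl() {
  const calls = [];
  return {
    calls,
    bufferSubData: (...args) => { calls.push(args); },
  };
}

const stubGl = makeRecordingGl();
const wasmBytes = new Uint8Array(32); // stand-in for the wasm memory view
uploadFromWasmMemory(stubGl, 0x8892, wasmBytes, 16, 8);
```

One caveat with this pattern: if the wasm memory grows, the old ArrayBuffer is detached, so the cached view has to be recreated.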
You don't need to modify wasm_exec.js directly. But you do need to create the go variable first in order to use Object.assign, and yes: that must happen before WebAssembly.instantiate.
Currently the HTML is something like:
<script src="wasm_exec.js"></script>
<script>
  const go = new Go();
  let mod, inst;
  WebAssembly.instantiateStreaming(fetch("test.wasm"), go.importObject).then((result) => {
    mod = result.module;
    inst = result.instance;
    document.getElementById("runButton").disabled = false;
  }).catch((err) => {
    console.error(err);
  });
</script>
The go variable is defined by const go = new Go();. So you must add your imports after const go = new Go(); and before WebAssembly.instantiateStreaming. In the end you will have something like:
<script src="wasm_exec.js"></script>
<script>
  const go = new Go();
  Object.assign(go.importObject.go, {
    // your custom imports. :)
  });
  let mod, inst;
  WebAssembly.instantiateStreaming(fetch("test.wasm"), go.importObject).then((result) => {
    mod = result.module;
    inst = result.instance;
    document.getElementById("runButton").disabled = false;
  }).catch((err) => {
    console.error(err);
  });
</script>
I think you get the idea: you need the go variable (go = new Go()) to call Object.assign(go.importObject.go, ...), and that call must come before WebAssembly.instantiateStreaming. In short: define the go variable -> add the imports -> initialize the wasm.
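The "add the imports" step can be wrapped in a small helper so it is harder to get wrong. This is only a sketch: addGoImports is a hypothetical name, the collision check is an extra safeguard that is not part of the original suggestion, and the "go" key is taken from the snippets in this thread (it may differ in other versions of wasm_exec.js, hence the parameter). A plain object stands in for new Go() so the example runs without wasm_exec.js.

```javascript
// Hypothetical helper: merge custom import functions into the import object
// created by wasm_exec.js. Call it after `new Go()` and before
// WebAssembly.instantiateStreaming.
function addGoImports(go, customImports, key = "go") {
  const target = go.importObject[key];
  if (!target) {
    throw new Error(`importObject has no "${key}" module`);
  }
  for (const name of Object.keys(customImports)) {
    // Refuse to overwrite imports the Go runtime itself relies on.
    if (name in target) {
      throw new Error(`import "${name}" already exists`);
    }
  }
  Object.assign(target, customImports);
  return go.importObject;
}

// Stand-in for `const go = new Go();` with a single made-up runtime import.
const fakeGo = { importObject: { go: { "runtime.example": () => {} } } };

const importObject = addGoImports(fakeGo, {
  // Import name format follows the thread: "<package path>.<function name>".
  "gioui.org/internal/glimpl.bufferData": (sp) => sp >>> 0,
});
// In a page, the returned object would then be passed to
// WebAssembly.instantiateStreaming(fetch("test.wasm"), importObject).
```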
Ah right, so the user would need to add the snippet that registers the import functions. Hmm, this sounds like a last resort. Thank you for elaborating!
> Wasm on mobile browsers (android chrome and firefox) with an empty scene results in <35fps chrome, <10fps firefox. The fps drops further when sprites are added, often dipping to near zero. Input handling is also erratic at those fps's (touches via inpututil are sometimes missed).
These results are nearly 2 years old now. Is the state of WASM performance still this bad?
I think, generally speaking: yes.
Ebiten implemented some improvements. One improvement is using 'Bind' to avoid string conversions, but it doesn't remove the allocations inside syscall/js itself. So, it's faster now, but...
I expect the performance to be better than it was before, but not "fast". I am also taking hardware and browser improvements into account (Safari is still the fastest). However, I don't expect WASM to perform close to the native equivalent. We need to consider the JS overhead, the syscall/js allocations, the WebGL limitations, WASM still being single-threaded, and the fact that we can't control the garbage collector (which blocks everything).
Gotcha. This issue just feels a bit vague, perhaps separate issues for those goals would be better? There's already #719 for WebGPU, we can track something like https://github.com/WebAssembly/threads for WASM thread support, and we could create other issues for reducing JS overhead and potentially reducing GC delays on WASM. The fact that mobile browsers are going to be slower is fairly expected.
There seem to be no action items.