
box86 and GL4ES instead of qemu

Open Heasterian opened this issue 3 years ago • 17 comments

Hey!

Would it be possible to use box86 instead of qemu? It should be a faster option. GL4ES would be a nice enhancement too.

Heasterian, Sep 02 '20 12:09

This is the first time I've heard about box86. It sounds interesting indeed.

Looking at the box86 site, there's an obvious catch, though:

Box86 lets you run x86 Linux programs (such as games) on non-x86 Linux, like ARM (host system needs to be 32bit little-endian).

Hangover currently only works on 64-bit hosts. Changing this is possible but requires some work (see issue #3).

Did you do any benchmarking of box86 vs qemu, in particular of CPU emulation performance? What makes qemu-linux-user slow is that it runs the entire Linux userland inside qemu. Hangover doesn't do that, so in many ways the big advantage of box86 is already part of Hangover. How does it compare on a CPU-heavy load that doesn't call libraries, e.g. calculating the shasum of a multi-GB file?

stefand, Sep 02 '20 12:09

The only benchmarks I have at the moment are gaming benchmarks from YouTube (https://youtu.be/7he-KoiSe_U). I should be getting a Jetson Nano this week, so then I can benchmark it for you.

Heasterian, Sep 02 '20 13:09

Unfortunately, that video doesn't contain any useful data. The games are ancient, and there's no framerate data (except for the one game that's stuck at 60 fps with vsync).

My main development machine was an NVIDIA Shield. Any performance numbers I get on it won't be useful for comparison with an RPi4.

What's interesting at this point is comparing two solutions (e.g. qemu and box86) to the same problem (running x86 instructions on ARM). Similarly, but orthogonally, syscall emulation vs. thunking libraries needs to be evaluated for performance (qemu-linux-user uses syscall emulation; Hangover and box86 use library thunks).

stefand, Sep 02 '20 13:09

A full game always runs into both: it has x86 code and calls the host system for e.g. 3D and sound. For a complete picture, we also need tests that are heavy on the extremes in addition to "in-the-middle" games. For example, shasum is heavy on CPU emulation, whereas something like a web server interacts heavily with the OS.

stefand, Sep 02 '20 13:09

Right now I can only try it on Ubuntu installed in PRoot on my phone. I don't think that would give you any useful data. I'll probably benchmark it next week on the Nano.

Heasterian, Sep 02 '20 14:09

@stefand I bet you heard about it from me: https://github.com/ptitSeb/box86/issues/4#issuecomment-474598788

AndreRH, Sep 02 '20 16:09

I could benchmark box86 vs qemu and box86+wine vs hangover. I can post results here if you want.

Heasterian, Sep 02 '20 18:09

A simple comparison of shasum on a huge file in box86 vs qemu would be a good start. Just make sure that the shasum binary you run in box86 actually does the work itself and doesn't call a library (gnutls? openssl?) through a thunk.
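A minimal sketch of such a timing comparison in Python, assuming box86 and qemu-i386-static are both on PATH and that you already have an i386 sha1sum binary and a large test file (the names ./sha1sum-i386 and big.bin used below are hypothetical). It only times the same command under each emulator and keeps the best of a few runs; it does not check whether the binary calls into a thunked library, so that still has to be verified separately.

```python
import subprocess
import time


def best_wall_time(argv, runs=3):
    """Run `argv` `runs` times and return the fastest wall-clock time in seconds.

    Keeping the minimum of several runs reduces noise from a cold file cache
    and background activity; stdout is discarded so printing doesn't skew timing.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def compare_emulators(runners, target, runs=3):
    """Time the same x86 `target` command under each emulator in `runners`.

    `runners` maps a label to the command prefix that launches the emulator,
    e.g. {"box86": ["box86"], "qemu": ["qemu-i386-static"]} (assumed to be on PATH).
    """
    return {label: best_wall_time(prefix + target, runs)
            for label, prefix in runners.items()}
```

Usage would look something like compare_emulators({"box86": ["box86"], "qemu": ["qemu-i386-static"]}, ["./sha1sum-i386", "big.bin"]). The measured time includes emulator startup, which should be negligible next to hashing a multi-GB file.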

stefand, Sep 02 '20 19:09

Unfortunately, you won't be able to benchmark on the Nano due to the lack of a 32-bit userspace.

bylaws, Sep 07 '20 13:09

You just need to compile box86 with gcc:armhf and g++:armhf, and it works fine on the Nano without video output. I just need to find a good way to launch an i386 sha1sum from the terminal.

Heasterian, Sep 15 '20 15:09

I ran some benchmarks comparing box86 and qemu-i386-static. The programs I used were prime95 (FFT + trial factoring) and 7z b (7-Zip's built-in benchmark). The device I used is an RPi4.

bench.txt

For box86, the use of native libs (instead of emulating them) can't reasonably be avoided, since that (along with its dynarec) is basically its underlying approach.

pjh64, Oct 05 '20 22:10

Those are pretty good results compared to qemu. Out of curiosity, what's the performance of a native ARM binary on the same hardware?

The trial factoring numbers look weird, but the others speak for themselves. And I'd expect the heavy lifting to be done in the x86 binary rather than in library calls, so the thunking box86 does shouldn't matter.

stefand, Oct 06 '20 13:10

Is that box86 with or without dynarec enabled?

AndreRH, Oct 06 '20 19:10

@stefand: I did a short benchmark of 7z with the native binary. As for prime95, there doesn't seem to be an aarch64 or ARM binary available. The weird performance gap that occurs when trial factoring with factor lengths around or above 64 bits can also be reproduced with other approaches to running x86 code on ARM. Every solution I know of shows its best figures for factor lengths below 64 bits.

7znative.txt

@AndreRH: It had dynarec enabled. With the interpreter only, the 7z ratings are roughly 10 times lower.

pjh64, Oct 06 '20 22:10

We make custom distros (twisteros.com) to make it work on arm64 builds. Yes, we add multiarch to allow that, but there are other approaches, like AppImages.

For performance numbers, check here:

https://stands.fosdem.org/stands/box86/performances/

An example on aarch64 on my channel:

https://youtu.be/BEYt5wzckvY

Another example, on an RPi4:

https://video.fosdem.org/2021/stands/box86/

@AndreRH

We had two main problems with Wine, and a collaboration here would be amazing. First, the GL drivers don't like the wrappers; there is nothing to do there except wait for Vulkan or Gallium Nine. The second one, where you could help, is around Wine itself.

We use x86 Wine and emulate it, I mean its libs. That's not the efficient approach box86 takes (the twist: the wrapping). Wrapping Wine would be pointless unless a fork of Wine is maintained over time. Wrapping Wine's libs (so this Wine fork would be x86 but with ARM Wine libs) would, in my opinion, require a fork, because every change to Wine's libs could affect that wrapping and break it. It would be nice to know ptitSeb's opinion here. I am just guessing at the problems it could have; I am not a dev, just his tester.

In theory, if "someone" maintained a Wine fork like that, it could boost performance a lot compared to our current strategy.

To run on aarch64, we could use SpacingBat3's AppImage to avoid multiarch (https://github.com/SpacingBat3/box86-appimage), and for GPUs that aren't GL-capable (due to drivers), GL4ES could be used.

For me, together with the upcoming box64, that's the path Hangover should take: it's simpler, more efficient and more powerful.

But enough talking. I will share this topic with ptitSeb; he will bring more ideas to the table if he thinks box86 should be something for Hangover.

ghost, Mar 03 '21 20:03

If it's any help, I'm just here to tell you guys that the author of the Box86 project has a 64-bit version called Box64.

Tarek-Hasan, Aug 15 '21 19:08

If it's any help, I'm just here to tell you guys that the author of the Box86 project has a 64-bit version called Box64.

Yes, I hadn't noticed it before your comment. So here are my results with it: https://github.com/AndreRH/hangover/blob/master/benchmarks/readme.md

YMMV

AndreRH, Sep 13 '21 20:09

FWIW, the box86 dev stated that they're working on box32, i.e. running 32-bit programs on arm64. Maybe you can share ideas on how to do the wrapping?

DarkShadow44, Nov 30 '22 01:11

Based on the README changes in afee8d64ead86a787d3efa014d15d84a55e7e9e3, looks like using Box86 is on the roadmap.

JeremyRand, Feb 15 '23 01:02

Hey @AndreRH, how about an updated benchmark with FEX-Emu and Hangover-Next, so we can see the performance improvements from the new methods?

Tarek-Hasan, Feb 18 '23 03:02

@Tarek-Hasan In the README you can see that FEX and Box32 are on the TODO list, so they aren't implemented yet.

@AndreRH Btw, I don't think that you need to wait for Box32 (that will handle aarch64 => i386), you can use Box64 (aarch64 => amd64) for 64-bit apps and probably try to get WoW64 using it (now it just crashes when Wine is trying to use it).

Heasterian, Feb 18 '23 11:02

I don't think that you need to wait for Box32 (that will handle aarch64 => i386), you can use Box64 (aarch64 => amd64) for 64-bit apps and probably try to get WoW64 using it (now it just crashes when Wine is trying to use it).

WoW64 doesn't help here; box64 can't run 32-bit x86 code.

DarkShadow44, Feb 18 '23 16:02

WoW64 isn't made for 64-bit emulation, only 32-bit, see https://github.com/AndreRH/hangover/discussions/134#discussioncomment-5202094

AndreRH, Mar 04 '23 14:03

Based on the README changes in https://github.com/AndreRH/hangover/commit/afee8d64ead86a787d3efa014d15d84a55e7e9e3, looks like using Box86 is on the roadmap.

That would mean losing any hope of ppc64le support. Hopefully it would still be possible to fall back to qemu.

darkbasic, Mar 10 '23 08:03

Based on the README changes in afee8d6, looks like using Box86 is on the roadmap.

That would mean losing any hope of ppc64le support. Hopefully it would still be possible to fall back to qemu.

Qemu will stay in parallel with the other emulators once they get added.

AndreRH, Mar 11 '23 17:03

I consider this fixed, as we have Box64 now :)

AndreRH, Sep 16 '23 14:09