tetanes

Excessive CPU load

Open mantou132 opened this issue 3 years ago • 9 comments

On my computer:

  • https://lukeworks.tech/tetanes-web ~60%
  • https://takahirox.github.io/nes-rust/wasm/web/index.html ~30%
  • OpenEmu ~15%

mantou132 avatar Jul 02 '22 21:07 mantou132

Thanks for reporting! I can definitely see some areas for improvement.

lukexor avatar Jul 03 '22 15:07 lukexor

@mantou132 Would you mind sharing more details about your results, such as which browser and OS you're using? I'm getting 50-60% CPU usage on nes-rust as well, on Chrome 102.0.5005.115 on macOS Catalina 10.15.7. The CPU profile for each shows that 80% of the CPU time is spent clocking a NES frame in WASM, with each frame lasting around 0.14ms to 0.16ms (e.g. 60 FPS). The JS and GPU are doing comparatively little work.

I think both implementations would benefit from switching to AudioWorklets so that the audio handling is not done from the main thread, but I haven't gotten around to figuring out how to do that yet.

lukexor avatar Jul 04 '22 17:07 lukexor

MacBook Pro (13-inch, 2017, Two Thunderbolt 3 ports) Chrome 105.0.5159.0

My computer performance is relatively poor.

When I run tetanes and nes-rust in the same environment, nes-rust takes less CPU time. Maybe there is room for optimization in tetanes, or maybe it's just because tetanes is more powerful.

Attached are two profile files.

Archive.zip

mantou132 avatar Jul 04 '22 18:07 mantou132

Thanks! I'll take a deeper look. I'm working on several performance elements right now even for the native version so hopefully that will be out soon.

TetaNES is meant to be fairly accurate to the original NES, including several odd behaviors that don't directly affect most games. The native version is cycle accurate while WASM is using a catch up method on every CPU instruction to be more accurate. Audio is also downsampled and both low and high pass filtered, so there is certainly a bit more processing than I find in nes-rust.
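To give a sense of the extra per-sample work that filtering adds, here is a minimal sketch of first-order low-pass and high-pass filters. This is not TetaNES's actual filter code: the `LowPass` and `HighPass` types are hypothetical, and the cutoff frequencies are left as parameters chosen by the caller.

```rust
use std::f32::consts::PI;

/// First-order (one-pole) low-pass filter. Hypothetical type for illustration.
pub struct LowPass {
    alpha: f32,
    prev_out: f32,
}

impl LowPass {
    pub fn new(cutoff_hz: f32, sample_rate: f32) -> Self {
        let rc = 1.0 / (2.0 * PI * cutoff_hz);
        let dt = 1.0 / sample_rate;
        Self { alpha: dt / (rc + dt), prev_out: 0.0 }
    }

    /// Moves the output a fraction `alpha` of the way toward the input.
    pub fn process(&mut self, input: f32) -> f32 {
        self.prev_out += self.alpha * (input - self.prev_out);
        self.prev_out
    }
}

/// First-order (one-pole) high-pass filter. Hypothetical type for illustration.
pub struct HighPass {
    alpha: f32,
    prev_in: f32,
    prev_out: f32,
}

impl HighPass {
    pub fn new(cutoff_hz: f32, sample_rate: f32) -> Self {
        let rc = 1.0 / (2.0 * PI * cutoff_hz);
        let dt = 1.0 / sample_rate;
        Self { alpha: rc / (rc + dt), prev_in: 0.0, prev_out: 0.0 }
    }

    /// Passes changes in the input while letting any constant offset decay.
    pub fn process(&mut self, input: f32) -> f32 {
        let out = self.alpha * (self.prev_out + input - self.prev_in);
        self.prev_in = input;
        self.prev_out = out;
        out
    }
}
```

Each filter costs only a few multiply-adds per sample, but at 48000 samples per second and with several stages chained together, that is work a simpler emulator may skip entirely.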

I know there are gains to be had, though. I'm benchmarking against Mesen, which is both more accurate than TetaNES and more performant.

lukexor avatar Jul 04 '22 19:07 lukexor

https://lukeworks.tech/tetanes-web

When it runs at less than 60fps, the sound gets stuck (for example when my computer is busy or DevTools is open). Could this be solved by increasing the size of the audio output buffer?

  • When FPS is between 40-60, the sound should be played normally
  • Sound is allowed to lag by 1 frame
  • When the FPS is too low, do not play the sound
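
The policy proposed in the list above can be sketched as a small decision function. This only illustrates the suggestion and is not TetaNES code: the `AudioAction` enum, the `audio_action` function, and the choice of 40 FPS as the cutoff for "too low" are assumptions.

```rust
/// Hypothetical action for the audio output, following the proposal above.
#[derive(Debug, PartialEq)]
pub enum AudioAction {
    /// Play normally, tolerating up to `max_delay_frames` of latency.
    Play { max_delay_frames: u32 },
    /// FPS is too low for clean playback, so output silence instead.
    Mute,
}

/// Picks an action from the measured frame rate.
/// Assumption: "too low" means below 40 FPS.
pub fn audio_action(fps: f64) -> AudioAction {
    const MIN_PLAYABLE_FPS: f64 = 40.0;
    if fps >= MIN_PLAYABLE_FPS {
        AudioAction::Play { max_delay_frames: 1 }
    } else {
        AudioAction::Mute
    }
}
```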

mantou132 avatar Jul 06 '22 04:07 mantou132

This affects most emulators. On top of that, the Web Audio API is not well suited to audio streams generated in real time.

The NES generates exactly enough audio for 60fps at the chosen sample rate (48000 Hz in TetaNES). At any lower fps there will be gaps, which create pops in the audio. I've tried to compensate for this using dynamic rate control, which resamples the audio to get more samples (and changes the pitch a little, but not enough to notice). Unfortunately it's not sufficient. The other option is to clock extra frames, which can cause screen tearing.
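
The arithmetic behind this can be sketched as follows. It is a rough illustration, not the TetaNES implementation: the function names are hypothetical, and `adjusted_ratio` is a simplified form of dynamic rate control that nudges the resampling ratio by at most `max_delta` depending on how full the output buffer is.

```rust
/// Samples the emulator produces for one emulated frame.
pub fn samples_per_frame(sample_rate: f64, target_fps: f64) -> f64 {
    sample_rate / target_fps
}

/// Samples missing per second of wall-clock time when only `actual_fps`
/// frames get clocked instead of `target_fps`.
pub fn shortfall_per_second(sample_rate: f64, target_fps: f64, actual_fps: f64) -> f64 {
    sample_rate - samples_per_frame(sample_rate, target_fps) * actual_fps
}

/// Simplified dynamic rate control (assumed form, for illustration).
/// `fill` is the output buffer fill level in 0.0..=1.0. A half-full buffer
/// leaves the ratio unchanged, an emptier buffer stretches the audio
/// slightly, and a fuller buffer compresses it.
pub fn adjusted_ratio(base_ratio: f64, fill: f64, max_delta: f64) -> f64 {
    let fill = fill.clamp(0.0, 1.0);
    base_ratio * (1.0 + max_delta * (1.0 - 2.0 * fill))
}
```

With these definitions, `samples_per_frame(48_000.0, 60.0)` is 800, and dropping to 40 fps leaves a shortfall of 16000 samples per second. Covering that would need a ratio of 1.5, while a barely audible adjustment is a fraction of a percent, which is one plausible reading of why rate control alone is not sufficient at low frame rates.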

It's something I'm continually trying to fix, but feel free to create a separate issue for it for tracking.

Some links for reference:

  • https://docs.libretro.com/development/cores/dynamic-rate-control/
  • https://emulation.gametechwiki.com/index.php/Vsync
  • https://forums.nesdev.org/viewtopic.php?t=10048

lukexor avatar Jul 06 '22 04:07 lukexor

Is the memory copy here more expensive than the previous ptr?

mantou132 avatar Jul 08 '22 04:07 mantou132

My thought was it would be faster for wasm to copy than JavaScript, and I noticed nes-rust was doing a similar thing except using a for loop, but now that you mention it, a buffer view into wasm memory was probably the more performant method.

I'll need to do some more benchmarks. Working with the Web Audio API is tedious. My next steps are to explore removing the current audio processing from wasm (so there is less for the main thread to process) and instead leveraging the built-in filtering features of Web Audio, along with a shared array buffer and an AudioWorklet, so it can all be done on a separate thread.
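
The data structure usually paired with a shared buffer between a producer thread and an audio thread is a single-producer, single-consumer ring buffer. Below is a minimal sketch in plain Rust using `std` atomics. It is not the TetaNES implementation and does not touch the browser APIs: `SampleRing` is a hypothetical type, and in a browser the same layout would live in a SharedArrayBuffer.

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};

/// Lock-free ring buffer for one producer and one consumer.
/// Samples are stored as their `f32` bit patterns in atomic slots.
pub struct SampleRing {
    slots: Vec<AtomicU32>,
    head: AtomicUsize, // next slot to read, advanced only by the consumer
    tail: AtomicUsize, // next slot to write, advanced only by the producer
}

impl SampleRing {
    pub fn new(capacity: usize) -> Self {
        // One slot stays unused so that "full" and "empty" are distinguishable.
        let slots = (0..capacity + 1).map(|_| AtomicU32::new(0)).collect();
        Self { slots, head: AtomicUsize::new(0), tail: AtomicUsize::new(0) }
    }

    /// Producer side. Returns `false` when the buffer is full.
    pub fn push(&self, sample: f32) -> bool {
        let tail = self.tail.load(Ordering::Relaxed);
        let next = (tail + 1) % self.slots.len();
        if next == self.head.load(Ordering::Acquire) {
            return false;
        }
        self.slots[tail].store(sample.to_bits(), Ordering::Relaxed);
        // Publish the slot only after the sample has been written.
        self.tail.store(next, Ordering::Release);
        true
    }

    /// Consumer side. Returns `None` on underrun (nothing to play).
    pub fn pop(&self) -> Option<f32> {
        let head = self.head.load(Ordering::Relaxed);
        if head == self.tail.load(Ordering::Acquire) {
            return None;
        }
        let bits = self.slots[head].load(Ordering::Relaxed);
        self.head.store((head + 1) % self.slots.len(), Ordering::Release);
        Some(f32::from_bits(bits))
    }
}
```

Because neither side ever blocks, the audio thread can simply output silence when `pop` returns `None` instead of stalling.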

lukexor avatar Jul 08 '22 05:07 lukexor

This should be resolved in the upcoming release based on the engine-rework branch. It adds additional threads that sleep when idle for OS builds, and clocks/redraws only when needed in wasm.
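
As a rough illustration of why sleeping lowers CPU usage, here is a minimal frame-pacing sketch. It is not code from the engine-rework branch: `remaining` and `run_paced` are hypothetical names, and a real emulator loop would also handle input, audio, and timer drift.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Time left in the current frame slot, or zero if the frame overran.
pub fn remaining(elapsed: Duration, frame_time: Duration) -> Duration {
    frame_time.saturating_sub(elapsed)
}

/// Calls `clock_frame` `frames` times, sleeping through the unused part
/// of each frame slot instead of busy-waiting.
pub fn run_paced(frames: u32, frame_time: Duration, mut clock_frame: impl FnMut()) {
    for _ in 0..frames {
        let start = Instant::now();
        clock_frame();
        thread::sleep(remaining(start.elapsed(), frame_time));
    }
}
```

For example, `run_paced(60, Duration::from_micros(16_667), || { /* clock one frame */ })` would run roughly one second of emulation while leaving the thread idle whenever a frame finishes early.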

In general TetaNES is going to have a higher CPU use than similar emulators due to cycle accuracy and additional features like rewind.

lukexor avatar Mar 30 '24 17:03 lukexor

This should be much improved in the latest release. It's still higher than some other emulators, especially when cycle accurate and run ahead features are enabled.

This will likely be an ongoing issue. I'm closing this for now in favor of #235 which will greatly reduce overall usage as well.

lukexor avatar May 17 '24 20:05 lukexor