General slowness and fixes.
As you are probably well aware, the play core in RetroArch is too slow to emulate most games, so I was thinking of introducing a frame-skipping mechanism for the time being. However, I could not follow the code flow. If you have a very basic document or video covering the main code paths, that would be helpful.
I also did a performance analysis on Linux with perf (attached as out.zip). There is a vector push_back that takes close to 12% of the time, which we can definitely reduce to nothing. Right now I'm recovering from COVID, but maybe in a week or so I'll take that up if you think it's viable.
Also, threads and context switching are taking 5% of the time. Since we are using 100% CPU either way, using Boost's non-blocking data structures and busy-waiting is probably a better choice here. I will take that up once I understand the code flow better.
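To make the busy-wait idea concrete, here is a minimal sketch of a single-producer / single-consumer ring buffer that both sides could poll instead of sleeping on a condition variable. `SpscRing`, `TryPush` and `TryPop` are hypothetical names, not anything from the project, and Boost's `boost::lockfree::spsc_queue` provides the same idea ready-made.

```cpp
#include <array>
#include <atomic>
#include <cstddef>

// Hypothetical lock-free ring for exactly one producer thread and one
// consumer thread. One slot is kept empty to tell "full" from "empty",
// so it holds at most Capacity - 1 items.
template <typename T, std::size_t Capacity>
class SpscRing
{
public:
    // Producer side: returns false when the ring is full (caller may spin).
    bool TryPush(const T& value)
    {
        const std::size_t head = m_head.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % Capacity;
        if(next == m_tail.load(std::memory_order_acquire)) return false;
        m_items[head] = value;
        // Publish the item to the consumer.
        m_head.store(next, std::memory_order_release);
        return true;
    }

    // Consumer side: returns false when the ring is empty (caller may spin).
    bool TryPop(T& out)
    {
        const std::size_t tail = m_tail.load(std::memory_order_relaxed);
        if(tail == m_head.load(std::memory_order_acquire)) return false;
        out = m_items[tail];
        // Hand the slot back to the producer.
        m_tail.store((tail + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    std::array<T, Capacity> m_items{};
    std::atomic<std::size_t> m_head{0};
    std::atomic<std::size_t> m_tail{0};
};
```

The producer would loop on `TryPush` and the consumer on `TryPop`. Whether this beats a condition variable depends on how many cores are free, so it would need to be measured with perf rather than assumed.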
Let me know how I can help and please write / record a basic dev documentation.
Forgot to mention, the profiling data is based on the opening video scene of FFX.
Hey there, thanks for this!
I know that one source of problems is the communication between the main thread that processes GIF packets and the GS thread. (https://github.com/jpd002/Play-/blob/master/Source/ee/GIF.cpp#L273)
ProcessSinglePacket processes the GIF packet and collects register writes in a std::vector that is sent off to the GS thread once the packet has been processed completely (WriteRegisterMassively). The GS thread waits for a condition variable to be signaled and processes the register writes when the GIF tells it to.
One of the degenerate cases for this is when a game sends tons of small GIF packets, causing a lot of chatter between the two threads. I've experimented with buffering (https://github.com/jpd002/Play-/commit/4b5c64824cd88914bb8980def6ed1198e86e18ea), which gives some interesting improvements on some platforms, but there are many edge cases.
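As a rough illustration of the buffering idea (not the code from the linked commit), the sketch below accumulates writes on the producer side and only takes the lock and signals the consumer once a threshold is reached. `RegisterWrite`, `BatchedWriteChannel` and the threshold are all hypothetical.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for one GS register write.
struct RegisterWrite
{
    std::uint8_t reg;
    std::uint64_t value;
};

// Hypothetical hand-off channel: many small packets cost one
// lock + notify per batch instead of one per packet.
class BatchedWriteChannel
{
public:
    explicit BatchedWriteChannel(std::size_t flushThreshold)
        : m_flushThreshold(flushThreshold)
    {
    }

    // Producer side: buffer locally, flush when the batch is big enough.
    void Write(const RegisterWrite& write)
    {
        m_pending.push_back(write);
        if(m_pending.size() >= m_flushThreshold) Flush();
    }

    // Producer side: must also be called at sync points (e.g. before a flip).
    void Flush()
    {
        if(m_pending.empty()) return;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_ready.insert(m_ready.end(), m_pending.begin(), m_pending.end());
            ++m_flushCount;
        }
        m_pending.clear(); // keeps the vector's capacity for the next batch
        m_condition.notify_one();
    }

    // Consumer side: blocks until at least one batch has been flushed.
    std::vector<RegisterWrite> WaitAndTake()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_condition.wait(lock, [this] { return !m_ready.empty(); });
        std::vector<RegisterWrite> batch;
        batch.swap(m_ready);
        return batch;
    }

    std::size_t FlushCount()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_flushCount;
    }

private:
    std::size_t m_flushThreshold;
    std::vector<RegisterWrite> m_pending; // touched by the producer only
    std::vector<RegisterWrite> m_ready;   // guarded by m_mutex
    std::size_t m_flushCount = 0;         // guarded by m_mutex
    std::mutex m_mutex;
    std::condition_variable m_condition;
};
```

The edge cases come from anything that needs the consumer to have caught up (reads back from GS memory, flips, and so on): each of those would have to force a `Flush()` first.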
I've done some experimenting with busy waiting too, but maybe I didn't do it properly because the results were underwhelming.
There's surely lots of space for improvement, lemme know if you have specific ideas in mind! Don't hesitate if you have more questions too!
Thanks!
Hi,
I did some experiments with replacing the vector.push_back with writes to a fixed-size vector. According to perf, that definitely improved performance in that area. I did not see any noticeable increase in frame rates, though.
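For reference, this is roughly the shape of that change, as a minimal sketch rather than the actual patch. `RegisterWrite` and `FixedWriteBuffer` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for one GS register write.
struct RegisterWrite
{
    std::uint8_t reg;
    std::uint64_t value;
};

// Hypothetical fixed-capacity buffer: no allocation and no growth check
// beyond a single bounds test per write.
template <std::size_t Capacity>
class FixedWriteBuffer
{
public:
    // Returns false when full, so the caller can flush and retry.
    bool TryPush(const RegisterWrite& write)
    {
        if(m_count == Capacity) return false;
        m_items[m_count++] = write;
        return true;
    }

    void Clear()
    {
        m_count = 0;
    }

    std::size_t Size() const
    {
        return m_count;
    }

    const RegisterWrite* Data() const
    {
        return m_items.data();
    }

private:
    std::array<RegisterWrite, Capacity> m_items{};
    std::size_t m_count = 0;
};
```

A smaller change that may get most of the same effect is to keep the std::vector, call `reserve()` once, and reuse it with `clear()`, since `clear()` leaves the capacity unchanged.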
I am trying to see if optimizations are possible in CPS2VM::EmuThread. Any documentation regarding the logic / flow here would be helpful.
I also did a lot of digging around the retro_run area. Two things confuse me here: the mailbox mechanism and the m_flipped flag. I am guessing the m_flipped flag is set somewhere in this path:
...retro::FlipImpl
...GL::FlipImpl
...retro::PresentBackBuffer
g_video_cb // libretro callback
However I am not sure. Can you explain the mailbox mechanism? I think if I understand that I can implement some kind of frame-skipping mechanism.
Why don't you do a video code walkthrough or something? It would help anyone trying to contribute to the project.
The mailbox specifically is meant as a way to communicate between threads without any locks, i.e. if one thread needs to pass data or trigger a function on another thread, it calls the mailbox to queue that action on that thread.
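A minimal sketch of the mailbox pattern, to show the idea only: one thread queues callables, and the owning thread drains and runs them. The class and method names here are illustrative and should not be read as the project's API, and this simplified version guards the queue with a mutex for brevity.

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical mailbox: any thread may post work, and only the owning
// thread calls ReceiveCall(), so the posted work always runs there.
class SimpleMailBox
{
public:
    using Call = std::function<void()>;

    // Called from any thread: queue an action for the owning thread.
    void SendCall(Call call)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_calls.push_back(std::move(call));
    }

    bool IsPending()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return !m_calls.empty();
    }

    // Called from the owning thread: run the oldest queued action.
    // Returns false if there was nothing to run.
    bool ReceiveCall()
    {
        Call call;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            if(m_calls.empty()) return false;
            call = std::move(m_calls.front());
            m_calls.pop_front();
        }
        call(); // run outside the lock so the call may post more work
        return true;
    }

private:
    std::mutex m_mutex;
    std::deque<Call> m_calls;
};
```

The key property is that the queued action executes on whichever thread calls `ReceiveCall()`, which is what matters when that thread is the one holding the OpenGL context.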
For RetroArch, the GS thread has to be killed, since the rendering context is set on their thread and they expect us to produce a single frame every time they call retro_run. Because GS is no longer running on its own thread, none of the GS processing can be done until retro_run is called, and we need the OpenGL context for it. So ProcessSingleFrame() checks m_flipped: it processes the mailbox until one of the calls reaches FlipImpl(), which sets the flag to true to indicate that one frame has been processed, after which we return control to RA.
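Put as code, the loop looks roughly like the sketch below. This is a simplified, single-threaded model under stated assumptions: `SingleThreadedGs`, `Post`, `PendingCalls` and `FramesPresented` are hypothetical, and the sketch stops when the queue is empty, whereas the real code would keep going until a flip arrives.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical model of a GS handler that has no thread of its own and is
// driven once per frame by the frontend.
class SingleThreadedGs
{
public:
    // Queue GS work (draws, flips) coming from the emulation side.
    void Post(std::function<void()> call)
    {
        m_calls.push_back(std::move(call));
    }

    // Stand-in for the flip handler: marks the end of one frame.
    void FlipImpl()
    {
        m_flipped = true;
        ++m_framesPresented;
    }

    // Called from the per-frame callback: run queued work until one flip
    // has happened, then hand control back to the frontend.
    void ProcessSingleFrame()
    {
        m_flipped = false;
        while(!m_flipped && !m_calls.empty())
        {
            auto call = std::move(m_calls.front());
            m_calls.pop_front();
            call();
        }
    }

    std::size_t PendingCalls() const
    {
        return m_calls.size();
    }

    unsigned FramesPresented() const
    {
        return m_framesPresented;
    }

private:
    std::deque<std::function<void()>> m_calls;
    bool m_flipped = false;
    unsigned m_framesPresented = 0;
};
```

Work queued after a flip stays in the queue until the next call, so each `ProcessSingleFrame()` yields at most one presented frame. A frame-skipping scheme would presumably hook in around the flip, for example by skipping the present step on some flips.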