fceux icon indicating copy to clipboard operation
fceux copied to clipboard

Win32 SDL/Qt Log Trace to File Has Multiple Major Issues

Open TheRealQuantam opened this issue 1 year ago • 0 comments

Multiple issues occur in the Win32 versions of FCEUX when saving trace logs to file, with symptoms differing based on version. For this issue I'll talk about the SDL/Qt version as that's the one I've been working on a fix for (though the fix can probably be applied to the non-SDL/Qt version as well).

In the SDL/Qt version, only parts of the log are written to disk, resulting in gaps in the log. After some investigation, I found the following problems with the logging code:

  1. The file is opened in no buffering, write-through mode. While this has the potential to be very fast and no buffering avoids polluting the system cache...
  2. The file buffer is 8 KB, which gets written at 4 KB. In combination with 1 this absolutely annihilates performance on HDDs, as it means FCEUX will write 4-8 KB, then the disk must spin a full rev back to the same place before it can write the next 4-8 KB. On a 5400 RPM disk for instance this limits write speed to about 360 KBps, FAR below real-time.
  3. Writes occur when the buffer exceeds 4 KB, and write out the entire contents of the buffer...
  4. The buffer is allocated on the stack. As most writes will be odd sizes (neither 4 KB nor 8 KB) and the buffer ends up where it ends up in RAM, this means most writes will FAIL, as no buffering mode requires write offsets and sizes and RAM buffer addresses to be multiples of the sector size (though RAM buffer location appears to be a requirement that isn't strictly enforced).
  5. No error checking on writes. It is in fact the lack of error checking that makes the SDL/Qt version run so much faster than the non-SDL/Qt version: most writes simply fail immediately, returning right back to FCEUX to continue emulation.
  6. While not as critical as the others, synchronous I/O is bad for performance with streamed data. Use of asynchronous I/O would allow Windows to be writing out one buffer while the logging thread fills the next (additionally, asynchronous + no buffering is the highest performance I/O mode in Windows).

I am working on a fix for these problems as such:

  1. (and 6) The file is opened with no buffering + overlapped (asynchronous)
  2. Increase to a pair of 1 MB buffers. Smaller buffers such as 128 KB or 256 KB would probably suffice, but a larger buffer helps smooth over hiccups due to other apps briefly and periodically accessing the disk.
  3. Writes occur only in multiples of 4 KB (a pretty safe assumption for sector size)
  4. The buffers are allocated with VirtualAlloc, which guarantees 4 KB alignment
  5. Writes are checked for errors to the extent that failures and edge cases are detected (though at this point I don't have a good idea how to respond to write failures)

Additionally, after prototyping those fixes and getting a dramatic speedup of I/O, an additional problem was discovered: 7. The log strings are constructed using sprintf. I don't know how other platforms are, but on Windows with VC++ this function is much slower than can be achieved with other means. By replacing it with custom functions so far I've increased the buffer fill speed to 5x what it was (from 33% of the core at 60 FPS to 7%). Currently it's able to log at ~2x real-time to a consumer-grade HDD. I expect to be able to get it to at least 2.5x by the time I'm done, at which point that will make it I/O-bound on my HDD.

I will continue working on my fix, and I expect being able to make a PR somewhere between today (Sunday) and Tuesday.

TheRealQuantam avatar Feb 19 '24 01:02 TheRealQuantam