Retro-go on the ESP32-P4
After a day of trouble I finally got full retro-go working on the ESP32-P4, so I decided to share my codebase for others. First of all, you might be wondering "is the performance better?" and yes, it is! I noticed great improvements over the ESP32 (S3):
- GENESIS runs at full speed (60fps) with all the chips enabled (in the games I tested)
- with audio disabled, SNES runs at full speed (55/60fps) with frameskip at "2" (in the games I tested)
- with audio enabled, SNES still struggles and runs at about 80% speed (40fps) with frameskip at "2" (in the games I tested)
- GBA: for example, Mario Kart: Super Circuit runs at about 75% speed with frameskip at "5" (10fps). Pokemon Emerald runs at full speed with frameskip "5" (10fps) most of the time (in the city, for example), with the CPU about 90% busy. We might be able to get down to frameskip "4" at 400MHz. So almost playable.
Keep in mind this was tested at 360MHz; performance should be a little better once 400MHz chips are available. Also, these are only rough approximations based on a few tests, so I recommend testing your own games to get a better idea of the performance to expect.
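To make the frameskip figures above easier to read, here is a small sketch of the arithmetic. It assumes that frameskip N means N frames are skipped after each drawn frame (which matches the 10fps quoted for frameskip "5"), and that a clock bump helps at most linearly, which is an optimistic upper bound. The function names are made up for illustration.

```c
#include <assert.h>

/* Frames actually drawn per second when `frameskip` frames are skipped
   after every drawn frame (assumed meaning of the frameskip setting). */
static int rendered_fps(int emulated_fps, int frameskip)
{
    return emulated_fps / (frameskip + 1);
}

/* Best-case speedup from a faster clock, assuming performance scales
   linearly with CPU frequency (memory speed usually prevents that). */
static double clock_speedup_percent(int old_mhz, int new_mhz)
{
    return 100.0 * (new_mhz - old_mhz) / old_mhz;
}
```

With these assumptions, going from 360MHz to 400MHz gives at best about 11% more headroom, so a game sitting at 90% speed would be borderline, not comfortably full speed.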
Anyway, if you are interested, here is the branch for the P4 on my fork (it was built using ESP-IDF v5.5-rc1): https://github.com/rapha-tech/retro-go/tree/ESP32-P4-clean Instructions for building are in the README inside the target's folder: https://github.com/rapha-tech/retro-go/tree/ESP32-P4-clean/components/retro-go/targets/esp32-p4
Here is a quick demo video on GBA: https://youtu.be/vBlkORBRjsY and SNES: https://youtu.be/XWoIEtMfELU
I also decided to merge the GBA emulator for easier benchmarking/testing.
This fork is just a quick draft. A lot of code for regular devices has been removed, but I will probably make a PR later with clean #ifdef guards.
That's awesome, I was wondering the other day if the P4 could be compiled without too much work. Looks very promising!
Thanks! The other day I saw a guy running the SNES emulator from retro-go (with some tweaks) on the P4, and it seems to run perfectly: https://www.youtube.com/watch?v=ojtIx0t9SMg At least from my testing, it runs better than my simple adaptation of the original code. So maybe some improvements could be made in retro-go to reach the same level of performance.
Yeah watched that video a while back and it really does show the potential of the P4, would be amazing to see some almost full speed SNES and GBA with RetroGo. I was a Sega guy growing up so full speed MegaDrive is amazing!
Assuming the youtube guy didn't modify my snes9x port too much and you use similar sdkconfig, then it's safe to assume that libretro-go is probably our bottleneck here. Presumably rg_display and/or rg_audio stuff being too wasteful? His approach to scaling/blitting is definitely much much faster than whatever I do in rg_display but does it fully explain the difference?
Off the top of my head, you can try disabling filtering, scaling, and setting RG_SCREEN_PARTIAL_UPDATES 0 in your config.h.
Maybe we should consider adding a faster rendering path for high performance apps that would support only integer scaling, no filtering, and no partial updates.
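As a rough idea of what such a fast path could look like, here is a minimal sketch of an integer-only, unfiltered blit for RGB565 pixels. This is not the rg_display code; the function name and the tightly packed buffer layout are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nearest-neighbour blit with an integer scale factor.
   src: src_w x src_h RGB565 pixels, tightly packed.
   dst: must hold (src_w * scale) x (src_h * scale) pixels, tightly packed. */
static void blit_integer_scale(const uint16_t *src, int src_w, int src_h,
                               uint16_t *dst, int scale)
{
    const int dst_w = src_w * scale;

    for (int y = 0; y < src_h; ++y)
    {
        uint16_t *first_row = dst + (size_t)y * scale * dst_w;

        // Expand one source row horizontally
        for (int x = 0; x < src_w; ++x)
        {
            uint16_t pixel = src[(size_t)y * src_w + x];
            for (int i = 0; i < scale; ++i)
                first_row[x * scale + i] = pixel;
        }

        // Duplicate the expanded row vertically
        for (int i = 1; i < scale; ++i)
            memcpy(first_row + (size_t)i * dst_w, first_row,
                   (size_t)dst_w * sizeof(uint16_t));
    }
}
```

Because every output row is either a simple expansion or a memcpy of the previous one, there is no per-pixel arithmetic to filter or interpolate, which is the point of restricting the path to integer scaling.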
Very interesting possibilities with the P4!
Maybe we should consider adding a faster rendering path for high performance apps that would support only integer scaling, no filtering, and no partial updates.
That's interesting. I'm assuming this would also benefit the OG ESP32 and S3 with MegaDrive and SNES? It feels like stripping out some of the 'nice to have' stuff to benefit performance wouldn't be a bad thing! Even disabling sound by default to get closer to playable would be a suitable compromise, imo.
From my testing with MD, it's a little inconsistent and I'm not exactly sure what happens behind the scenes. E.g. my usual test is Streets of Rage 2: load up a game, and it plays fairly slowly, probably 60-70% speed (as shown by the pause menu). Go into the menu, disable the sound, and sometimes it'll then play fairly close to full speed with some frameskip, but other times it'll actually play too fast, like 130-150% speed. If I disable sound completely and then load up a game, there won't be any sound but it'll still play at 60-70% speed. Sorry, not trying to hijack the post, and happy to open a new issue if you want. I didn't want to bombard you with issues, especially for a system that may never be perfect on the current hardware. However, the P4 may change that!
Yes, the P4 is so capable that they even managed to run Quake 1 on it!
For SNES emulation I tried disabling filtering and scaling and it didn't change a thing. So I believe libretro-go isn't our bottleneck here.
Also, I benchmarked Mario Kart while he tried Mario World, which is easier to run, hence the (very) big difference in performance.
With Mario World:
BUSY:64%, FPS:50 (25+25+0), BATT:0
BUSY:64%, FPS:50 (25+18+6), BATT:0
BUSY:63%, FPS:50 (24+16+10), BATT:0
BUSY:63%, FPS:50 (26+3+22), BATT:0
BUSY:62%, FPS:50 (25+0+25), BATT:0
I get a reasonable frameskip of "1", but as you can see, we're only using 60-65% of the CPU: still too much to disable frameskip completely.
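For anyone who wants to compare runs, the stats lines above are easy to parse. The sketch below assumes the three numbers in parentheses are skipped, partially updated and fully updated frames per second, in that order; that reading is inferred from the logs in this thread (the middle number drops to 0 once partial updates are disabled), so treat the field names as my assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    int busy_percent; // BUSY: share of time the emulation core was busy
    int fps;          // FPS: emulated frames per second
    int skipped;      // assumed: frames not drawn
    int partial;      // assumed: frames drawn with a partial update
    int full;         // assumed: frames drawn with a full update
} frame_stats_t;

/* Parses a line such as "BUSY:64%, FPS:50 (25+18+6), BATT:0".
   Returns true only if all five fields were found. */
static bool parse_stats_line(const char *line, frame_stats_t *out)
{
    int n = sscanf(line, "BUSY:%d%%, FPS:%d (%d+%d+%d)",
                   &out->busy_percent, &out->fps,
                   &out->skipped, &out->partial, &out->full);
    return n == 5;
}
```

Under that reading, the frames that actually reach the screen are partial + full, which is about half of the 50 emulated frames in the Mario World run above.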
Additionally, after tinkering I found the two main differences:
- He didn't use a "fixed" frameskip; instead he just skips a frame here and there, allowing the game to run at a fractional ("float-ish") frameskip. This is why I was only using 60-80% of the P4's power when running games with a fixed frameskip of 1 or 2.
- He used both cores just like retro-go, but a little differently. Basically, instead of running the whole emulator on one core, he offloaded the S9xMixSamples() function to the "I/O core". That's why he can achieve this level of performance with sound enabled!
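To illustrate the first point, here is a minimal, hardware-independent sketch of frame dropping driven by a "time debt" instead of a fixed frameskip. It is a simplified model of the idea, not his code: the names are made up, and the frame costs used in the simulation are invented numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int32_t target_us;  // ideal duration of one frame
    int32_t balance_us; // how far behind schedule we currently are
} frame_pacer_t;

/* Call once per emulated frame with the time the previous frame took.
   Returns true if the next frame should be rendered, false to skip it. */
static bool pacer_should_render(frame_pacer_t *p, int32_t last_frame_us)
{
    p->balance_us += last_frame_us - p->target_us;
    if (p->balance_us < 0)
        p->balance_us = 0; // running ahead of schedule earns no credit
    return p->balance_us < p->target_us; // a whole frame behind: skip one
}

/* Counts skipped frames when a rendered frame costs render_us and a
   skipped (emulate-only) frame costs skip_us. */
static int simulate_skipped_frames(int frames, int32_t target_us,
                                   int32_t render_us, int32_t skip_us)
{
    frame_pacer_t pacer = {target_us, 0};
    bool render = true;
    int skipped = 0;

    for (int i = 0; i < frames; ++i)
    {
        int32_t cost = render ? render_us : skip_us;
        render = pacer_should_render(&pacer, cost);
        if (!render)
            skipped++;
    }
    return skipped;
}
```

With a 20ms target, 22ms per rendered frame and 12ms per skipped frame, this drops one frame in five once it settles, which is the kind of in-between value a fixed frameskip of 0 or 1 cannot express.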
After seeing that I tried implementing it in main_snes.c
I also disabled RG_SCREEN_PARTIAL_UPDATES as you said.
I think the numbers speak for themselves... Mario world (Europe) with sound enabled on core 1:
BUSY:100%, FPS:50 (6+0+43) starting the game
BUSY:100%, FPS:49 (8+0+41)
BUSY:100%, FPS:50 (5+0+46) Intro
BUSY:100%, FPS:49 (5+0+45)
BUSY:100%, FPS:50 (3+0+46)
BUSY:100%, FPS:50 (3+0+47)
BUSY:100%, FPS:59 (0+0+59)
BUSY:100%, FPS:52 (3+0+49) Hit B to start
BUSY:100%, FPS:53 (0+0+53)
BUSY:100%, FPS:54 (0+0+54)
BUSY:100%, FPS:54 (0+0+54) First text box / tutorial
BUSY:100%, FPS:54 (0+0+55)
BUSY:100%, FPS:54 (0+0+53)
BUSY:100%, FPS:54 (0+0+54)
BUSY:100%, FPS:55 (0+0+55)
BUSY:100%, FPS:55 (0+0+55)
BUSY:100%, FPS:54 (0+0+54)
BUSY:100%, FPS:55 (0+0+55)
BUSY:100%, FPS:56 (0+0+56)
BUSY:100%, FPS:59 (0+0+59)
BUSY:100%, FPS:59 (0+0+59)
BUSY:100%, FPS:56 (1+0+55)
BUSY:100%, FPS:50 (6+0+44)
BUSY:100%, FPS:50 (7+0+43) Choose level
BUSY:100%, FPS:50 (7+0+43)
BUSY:100%, FPS:54 (4+0+50)
BUSY:100%, FPS:60 (0+0+60)
BUSY:100%, FPS:54 (3+0+51) Level starts
BUSY:100%, FPS:50 (2+0+49)
BUSY:100%, FPS:50 (2+0+47)
BUSY:100%, FPS:50 (3+0+47)
BUSY:100%, FPS:50 (2+0+48)
BUSY:100%, FPS:50 (3+0+47)
BUSY:100%, FPS:50 (5+0+46)
BUSY:100%, FPS:50 (5+0+44)
BUSY:100%, FPS:50 (4+0+46)
BUSY:100%, FPS:50 (2+0+48)
BUSY:100%, FPS:49 (1+0+48)
BUSY:100%, FPS:51 (2+0+49)
BUSY:100%, FPS:50 (0+0+51)
BUSY:100%, FPS:50 (0+0+49)
BUSY:100%, FPS:50 (1+0+49)
BUSY:100%, FPS:50 (2+0+49)
BUSY:100%, FPS:50 (3+0+47)
BUSY:100%, FPS:50 (2+0+48)
BUSY:100%, FPS:50 (3+0+46)
BUSY:68%, FPS:42 (10+0+33) Exiting
BUSY:0%, FPS:50 (50+0+0)
BUSY:0%, FPS:50 (50+0+0)
This is the code I used. I just asked Claude AI to modify the existing code and draw inspiration from the YouTube guy's main_snes. This code "mostly" works, but it cuts out rg_display_sync(), and we get some weird graphical glitches in Mario Kart's main menu, for example (not in a race, though). Also, the sound is very choppy. This code is definitely not ready to be merged, but I feel like this is the way to go to get the most out of the ESP32-P4.
#include "shared.h"
#include <snes9x.h>
#include <math.h>
typedef struct
{
char name[16];
struct {
uint16_t snes9x_mask;
uint16_t local_mask;
uint16_t mod_mask;
} keys[16];
} keymap_t;
static const keymap_t KEYMAPS[] = {
{"Type A", {
{SNES_A_MASK, RG_KEY_A, 0},
{SNES_B_MASK, RG_KEY_B, 0},
{SNES_X_MASK, RG_KEY_START, 0},
{SNES_Y_MASK, RG_KEY_SELECT, 0},
{SNES_TL_MASK, RG_KEY_B, RG_KEY_MENU},
{SNES_TR_MASK, RG_KEY_A, RG_KEY_MENU},
{SNES_START_MASK, RG_KEY_START, RG_KEY_MENU},
{SNES_SELECT_MASK, RG_KEY_SELECT, RG_KEY_MENU},
{SNES_UP_MASK, RG_KEY_UP, 0},
{SNES_DOWN_MASK, RG_KEY_DOWN, 0},
{SNES_LEFT_MASK, RG_KEY_LEFT, 0},
{SNES_RIGHT_MASK, RG_KEY_RIGHT, 0},
}},
{"Type B", {
{SNES_A_MASK, RG_KEY_START, 0},
{SNES_B_MASK, RG_KEY_A, 0},
{SNES_X_MASK, RG_KEY_SELECT, 0},
{SNES_Y_MASK, RG_KEY_B, 0},
{SNES_TL_MASK, RG_KEY_B, RG_KEY_MENU},
{SNES_TR_MASK, RG_KEY_A, RG_KEY_MENU},
{SNES_START_MASK, RG_KEY_START, RG_KEY_MENU},
{SNES_SELECT_MASK, RG_KEY_SELECT, RG_KEY_MENU},
{SNES_UP_MASK, RG_KEY_UP, 0},
{SNES_DOWN_MASK, RG_KEY_DOWN, 0},
{SNES_LEFT_MASK, RG_KEY_LEFT, 0},
{SNES_RIGHT_MASK, RG_KEY_RIGHT, 0},
}},
{"Type C", {
{SNES_A_MASK, RG_KEY_A, 0},
{SNES_B_MASK, RG_KEY_B, 0},
{SNES_X_MASK, 0, 0},
{SNES_Y_MASK, 0, 0},
{SNES_TL_MASK, 0, 0},
{SNES_TR_MASK, 0, 0},
{SNES_START_MASK, RG_KEY_START, 0},
{SNES_SELECT_MASK, RG_KEY_SELECT, 0},
{SNES_UP_MASK, RG_KEY_UP, 0},
{SNES_DOWN_MASK, RG_KEY_DOWN, 0},
{SNES_LEFT_MASK, RG_KEY_LEFT, 0},
{SNES_RIGHT_MASK, RG_KEY_RIGHT, 0},
}},
};
static const size_t KEYMAPS_COUNT = (sizeof(KEYMAPS) / sizeof(keymap_t));
static const char *SNES_BUTTONS[] = {
"None", "None", "None", "None", "R", "L", "X", "A", "Right", "Left", "Down", "Up", "Start", "Select", "Y", "B"
};
#define AUDIO_LOW_PASS_RANGE ((60 * 65536) / 100)
// Frame timing constants (from second code)
#define TARGET_FPS (50)
#define TARGET_FRAME_DURATION (1000000 / TARGET_FPS)
static rg_app_t *app;
static rg_surface_t *updates[2];
static rg_surface_t *currentUpdate;
static rg_audio_sample_t *audioBuffer;
// Multi-core audio variables using RG_mutex system
static rg_mutex_t *audio_mutex;
static rg_task_t *audio_task_handle;
static rg_audio_sample_t *audio_buffers[2];
static volatile bool audio_ready = false;
static volatile bool audio_shutdown = false;
static bool apu_enabled = true;
static bool lowpass_filter = false;
static int keymap_id = 0;
static keymap_t keymap;
// Frame timing variables
static int32_t framedrop_balance = 0;
static uint32_t framedrop_timer = 0;
static const char *SETTING_KEYMAP = "keymap";
static const char *SETTING_APU_EMULATION = "apu";
// --- MAIN
static void update_keymap(int id)
{
keymap_id = id % KEYMAPS_COUNT;
keymap = KEYMAPS[keymap_id];
}
static bool screenshot_handler(const char *filename, int width, int height)
{
return rg_surface_save_image_file(currentUpdate, filename, width, height);
}
static bool save_state_handler(const char *filename)
{
return S9xSaveState(filename);
}
static bool load_state_handler(const char *filename)
{
return S9xLoadState(filename);
}
static bool reset_handler(bool hard)
{
S9xReset();
return true;
}
static void event_handler(int event, void *arg)
{
if (event == RG_EVENT_REDRAW)
{
rg_display_submit(currentUpdate, 0);
}
}
// Audio processing task running on separate core
static void audio_task(void *parameter)
{
    int audio_buffer_index = 0;

    while (!audio_shutdown)
    {
        // Wait for signal from main emulation loop with timeout
        if (rg_mutex_take(audio_mutex, 50)) // 50ms timeout
        {
            if (audio_ready && apu_enabled)
            {
                // Process audio samples
                if (lowpass_filter)
                    S9xMixSamplesLowPass((void *)audio_buffers[audio_buffer_index],
                                         AUDIO_BUFFER_LENGTH << 1, AUDIO_LOW_PASS_RANGE);
                else
                    S9xMixSamples((void *)audio_buffers[audio_buffer_index],
                                  AUDIO_BUFFER_LENGTH << 1);

                // Submit audio buffer
                rg_audio_submit(audio_buffers[audio_buffer_index], AUDIO_BUFFER_LENGTH);

                // Switch to next buffer
                audio_buffer_index = (audio_buffer_index + 1) % 2;
                audio_ready = false;
            }
            rg_mutex_give(audio_mutex);
        }
        rg_task_delay(1);
    }
}
static rg_gui_event_t apu_toggle_cb(rg_gui_option_t *option, rg_gui_event_t event)
{
if (event == RG_DIALOG_PREV || event == RG_DIALOG_NEXT)
{
apu_enabled = !apu_enabled;
rg_settings_set_number(NS_APP, SETTING_APU_EMULATION, apu_enabled);
}
strcpy(option->value, apu_enabled ? _("On") : _("Off"));
return RG_DIALOG_VOID;
}
static rg_gui_event_t lowpass_filter_cb(rg_gui_option_t *option, rg_gui_event_t event)
{
if (event == RG_DIALOG_PREV || event == RG_DIALOG_NEXT)
lowpass_filter = !lowpass_filter;
strcpy(option->value, lowpass_filter ? _("On") : _("Off"));
return RG_DIALOG_VOID;
}
static rg_gui_event_t change_keymap_cb(rg_gui_option_t *option, rg_gui_event_t event)
{
if (event == RG_DIALOG_PREV || event == RG_DIALOG_NEXT)
{
if (event == RG_DIALOG_PREV && --keymap_id < 0)
keymap_id = KEYMAPS_COUNT - 1;
if (event == RG_DIALOG_NEXT && ++keymap_id > KEYMAPS_COUNT - 1)
keymap_id = 0;
update_keymap(keymap_id);
rg_settings_set_number(NS_APP, SETTING_KEYMAP, keymap_id);
return RG_DIALOG_REDRAW;
}
if (event == RG_DIALOG_ENTER)
{
return RG_DIALOG_CANCEL;
}
if (option->arg == -1)
{
strcat(strcat(strcpy(option->value, "< "), keymap.name), " >");
}
else if (option->arg >= 0)
{
int local_button = keymap.keys[option->arg].local_mask;
int mod_button = keymap.keys[option->arg].mod_mask;
int snes9x_button = log2(keymap.keys[option->arg].snes9x_mask); // convert bitmask to bit number
if (snes9x_button < 4 || (local_button & (RG_KEY_UP|RG_KEY_DOWN|RG_KEY_LEFT|RG_KEY_RIGHT)))
{
option->flags = RG_DIALOG_FLAG_HIDDEN;
return RG_DIALOG_VOID;
}
if (keymap.keys[option->arg].mod_mask)
sprintf(option->value, "%s + %s", rg_input_get_key_name(mod_button), rg_input_get_key_name(local_button));
else
sprintf(option->value, "%s", rg_input_get_key_name(local_button));
option->label = SNES_BUTTONS[snes9x_button];
option->flags = RG_DIALOG_FLAG_NORMAL;
}
return RG_DIALOG_VOID;
}
static rg_gui_event_t menu_keymap_cb(rg_gui_option_t *option, rg_gui_event_t event)
{
if (event == RG_DIALOG_ENTER)
{
const rg_gui_option_t options[] = {
{-1, _("Profile"), "-", RG_DIALOG_FLAG_NORMAL, &change_keymap_cb},
{-2, "", NULL, RG_DIALOG_FLAG_MESSAGE, NULL},
{-3, "snes9x ", "handheld", RG_DIALOG_FLAG_MESSAGE, NULL},
{0, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{1, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{2, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{3, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{4, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{5, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{6, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{7, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{8, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{9, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{10, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{11, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{12, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{13, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{14, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{15, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
RG_DIALOG_END,
};
rg_gui_dialog(option->label, options, 0);
return RG_DIALOG_REDRAW;
}
strcpy(option->value, keymap.name);
return RG_DIALOG_VOID;
}
bool S9xInitDisplay(void)
{
GFX.Pitch = SNES_WIDTH * 2;
GFX.ZPitch = SNES_WIDTH;
GFX.Screen = currentUpdate->data;
GFX.SubScreen = malloc(GFX.Pitch * SNES_HEIGHT_EXTENDED);
GFX.ZBuffer = malloc(GFX.ZPitch * SNES_HEIGHT_EXTENDED);
GFX.SubZBuffer = malloc(GFX.ZPitch * SNES_HEIGHT_EXTENDED);
return GFX.Screen && GFX.SubScreen && GFX.ZBuffer && GFX.SubZBuffer;
}
void S9xDeinitDisplay(void)
{
}
uint32_t S9xReadJoypad(int32_t port)
{
if (port != 0)
return 0;
uint32_t joystick = rg_input_read_gamepad();
uint32_t joypad = 0;
for (int i = 0; i < RG_COUNT(keymap.keys); ++i)
{
uint32_t bitmask = keymap.keys[i].local_mask | keymap.keys[i].mod_mask;
if (bitmask && bitmask == (joystick & bitmask))
{
joypad |= keymap.keys[i].snes9x_mask;
}
}
return joypad;
}
bool S9xReadMousePosition(int32_t which1, int32_t *x, int32_t *y, uint32_t *buttons)
{
return false;
}
bool S9xReadSuperScopePosition(int32_t *x, int32_t *y, uint32_t *buttons)
{
return false;
}
bool JustifierOffscreen(void)
{
return true;
}
void JustifierButtons(uint32_t *justifiers)
{
(void)justifiers;
}
#ifdef USE_BLARGG_APU
static void S9xAudioCallback(void)
{
S9xFinalizeSamples();
size_t available_samples = S9xGetSampleCount();
S9xMixSamples((void *)audioBuffer, available_samples);
rg_audio_submit(audioBuffer, available_samples >> 1);
}
#endif
static void options_handler(rg_gui_option_t *dest)
{
*dest++ = (rg_gui_option_t){0, _("Audio enable"), "-", RG_DIALOG_FLAG_NORMAL, &apu_toggle_cb};
*dest++ = (rg_gui_option_t){0, _("Audio filter"), "-", RG_DIALOG_FLAG_NORMAL, &lowpass_filter_cb};
*dest++ = (rg_gui_option_t){0, _("Controls"), "-", RG_DIALOG_FLAG_NORMAL, &menu_keymap_cb};
*dest++ = (rg_gui_option_t)RG_DIALOG_END;
}
void snes_main(void)
{
const rg_handlers_t handlers = {
.loadState = &load_state_handler,
.saveState = &save_state_handler,
.reset = &reset_handler,
.screenshot = &screenshot_handler,
.event = &event_handler,
.options = &options_handler,
};
app = rg_system_reinit(AUDIO_SAMPLE_RATE, &handlers, NULL);
apu_enabled = rg_settings_get_number(NS_APP, SETTING_APU_EMULATION, 1);
updates[0] = rg_surface_create(SNES_WIDTH, SNES_HEIGHT_EXTENDED, RG_PIXEL_565_LE, 0);
updates[0]->height = SNES_HEIGHT;
currentUpdate = updates[0];
audioBuffer = (rg_audio_sample_t *)malloc(AUDIO_BUFFER_LENGTH * 4);
// Initialize dual audio buffers for multi-core processing
audio_buffers[0] = (rg_audio_sample_t *)malloc(AUDIO_BUFFER_LENGTH * 4);
audio_buffers[1] = (rg_audio_sample_t *)malloc(AUDIO_BUFFER_LENGTH * 4);
// Create mutex for audio synchronization
audio_mutex = rg_mutex_create();
if (audio_mutex == NULL) {
RG_PANIC("Failed to create audio mutex!");
}
update_keymap(rg_settings_get_number(NS_APP, SETTING_KEYMAP, 0));
Settings.CyclesPercentage = 100;
Settings.H_Max = SNES_CYCLES_PER_SCANLINE;
Settings.FrameTimePAL = 20000;
Settings.FrameTimeNTSC = 16667;
Settings.ControllerOption = SNES_JOYPAD;
Settings.HBlankStart = (256 * Settings.H_Max) / SNES_HCOUNTER_MAX;
Settings.SoundPlaybackRate = AUDIO_SAMPLE_RATE;
Settings.SoundInputRate = AUDIO_SAMPLE_RATE;
Settings.DisableSoundEcho = false;
Settings.InterpolatedSound = true;
if (!S9xInitDisplay())
RG_PANIC("Display init failed!");
if (!S9xInitMemory())
RG_PANIC("Memory init failed!");
if (!S9xInitAPU())
RG_PANIC("APU init failed!");
if (!S9xInitSound(0, 0))
RG_PANIC("Sound init failed!");
if (!S9xInitGFX())
RG_PANIC("Graphics init failed!");
const char *filename = app->romPath;
if (rg_extension_match(filename, "zip"))
{
if (!rg_storage_unzip_file(filename, NULL, (void **)&Memory.ROM, &Memory.ROM_AllocSize, RG_FILE_USER_BUFFER))
RG_PANIC("ROM file unzipping failed!");
filename = NULL;
}
if (!LoadROM(filename))
RG_PANIC("ROM loading failed!");
#ifdef USE_BLARGG_APU
S9xSetSamplesAvailableCallback(S9xAudioCallback);
#else
S9xSetPlaybackRate(Settings.SoundPlaybackRate);
#endif
// Create audio task on a separate core using RG task system
audio_task_handle = rg_task_create("snes_audio", audio_task, NULL, 4096, RG_TASK_PRIORITY_1, 1);
if (audio_task_handle == NULL) {
RG_PANIC("Failed to create audio task!");
}
if (app->bootFlags & RG_BOOT_RESUME)
{
rg_emu_load_state(app->saveSlot);
}
rg_system_set_tick_rate(Memory.ROMFramesPerSecond);
int target_frametime = (1000000 / Memory.ROMFramesPerSecond);
app->frameskip = 0;
bool menuCancelled = false;
bool menuPressed = false;
int skipFrames = 0;
// Initialize frame timing
framedrop_timer = rg_system_timer();
while (1)
{
uint32_t joystick = rg_input_read_gamepad();
if (menuPressed && !(joystick & RG_KEY_MENU))
{
if (!menuCancelled)
{
rg_task_delay(50);
rg_gui_game_menu();
}
menuCancelled = false;
}
else if (joystick & RG_KEY_OPTION)
{
rg_gui_options_menu();
}
menuPressed = joystick & RG_KEY_MENU;
if (menuPressed && joystick & ~RG_KEY_MENU)
{
menuCancelled = true;
}
int64_t startTime = rg_system_timer();
bool drawFrame = (skipFrames == 0);
bool slowFrame = false;
// Frame dropping logic (from second code)
IPPU.RenderThisFrame = drawFrame;
framedrop_balance += (rg_system_timer() - framedrop_timer) - target_frametime;
framedrop_timer = rg_system_timer();
if (framedrop_balance < 550) // A little more to not accidentally trigger framedrop by calculation inaccuracies
framedrop_balance = 0;
if (framedrop_balance > target_frametime) {
// We're now a whole frame behind, so skip the next frame
IPPU.RenderThisFrame = false;
skipFrames = 1;
}
GFX.Screen = currentUpdate->data;
S9xMainLoop();
if (drawFrame && IPPU.RenderThisFrame)
{
rg_display_submit(currentUpdate, 0);
}
// Signal audio task to process samples using mutex
if (rg_mutex_take(audio_mutex, 5)) // 5ms timeout
{
audio_ready = true;
rg_mutex_give(audio_mutex);
}
rg_system_tick(rg_system_timer() - startTime);
if (skipFrames > 0)
{
skipFrames--;
}
}
// Cleanup on exit
audio_shutdown = true;
if (audio_mutex) {
rg_mutex_free(audio_mutex);
}
}
Hi dynamight, From what I know, on SNES, MD and basically all apps except PCE and DOOM, the whole system emulation runs on core 0 while core 1 handles I/O functions like display (scaling, filtering, SPI...), input buttons, DAC, the rg_system monitor... On SNES and MD our main bottleneck is core 0, which runs the emulator, so disabling core 1 functions like scaling probably won't help with performance.
What might help would be to offload some of the chip emulation to the I/O core (core 1), just like the YouTube guy did with sound emulation on SNES, but I don't know if it is possible with MD. I haven't seen anyone do it. Plus, I have no clue what "busy %" we're currently at on the I/O core.
I'm not 100% sure about all that so @ducalex please correct me if I'm wrong.
Hi
Yeah, that makes sense now. I wasn't sure how it worked previously. I know from previous ESP32 projects I've seen, such as the RetroTVs, that it was very important to maximise both cores for smooth operation, so it would totally make sense here too, and it sounds like core 1 probably isn't doing as much as poor old core 0 hah
I'd guess the obvious step would be offloading sound to core 1 while keeping the other stuff there. I'm deffo no coder, so no idea how tricky that would be!
The improvements to the SNES speed also look impressive!
I've tried many times to get sound to run on the second core but I always end up with synchronization issues and the games would crash. I also tried doing the graphics rendering on the second core which has the same issues. Eventually I gave up.
But I'll try your code, maybe the AI is smarter :) and also I'll look more closely at what fcipaq did. It's definitely an improvement worth merging if it works.
For SNES emulation I tried disabling filtering and scaling and it didn't change a thing. So I believe libretro-go isn't our bottleneck here.
I've done some tests (with partial update disabled to get a more stable reading) in SNES.
- Scaling and filtering enabled: blit time is 26ms
- Scaling and filtering disabled: blit time is 21ms
- Rewrote write_update to send the raw buffer to the display: 21ms
21ms to transfer ~122880 bytes, so about 44Mbps, which is about what you'd expect for a 40MHz SPI display.
So the pixel manipulation code is likely fine, that's good news to me because I spent so much time on it over the years to make it fast...
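For reference, the transfer-rate estimate can be reproduced with a couple of helper functions. The helper names are made up, and the 256x240 frame at 2 bytes per pixel is only my guess at where ~122880 bytes comes from (256 * 240 * 2 = 122880).

```c
#include <assert.h>

/* Size of one frame in bytes. */
static long frame_bytes(int width, int height, int bytes_per_pixel)
{
    return (long)width * height * bytes_per_pixel;
}

/* Throughput in decimal megabits per second (1 Mbit = 1,000,000 bits). */
static double throughput_mbps(long bytes, double seconds)
{
    return (bytes * 8.0) / seconds / 1e6;
}
```

With the rounded inputs from the measurements (122880 bytes in 21ms) this gives roughly 47 Mbit/s, or about 44.6 if a megabit is counted as 2^20 bits, so it lands in the same ballpark as the figure quoted above. Either way the blit time looks dominated by the bus transfer, not by pixel manipulation.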
So as long as the snes audio task has higher priority than the display task, it should be able to keep running during the display transfer and use most of the processing power of the second core (but it absolutely needs to yield every so often otherwise display task would never run).
Yes, this would be a nice improvement that could benefit all other targets, even if they unfortunately still won't be able to play the SNES at full speed.
Also, I probably found one of the issues causing the choppy audio: fcipaq used 2 buffers, each containing 5 "audio frames", while the AI code only uses 2 buffers, each containing a single "audio frame".
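To put numbers on that difference, here is a tiny sketch of the buffer arithmetic. The 32000Hz sample rate in the usage note is only an example value (the real AUDIO_SAMPLE_RATE comes from the project headers), and the helper names are hypothetical.

```c
#include <assert.h>

/* Audio samples produced during one video frame. */
static int samples_per_frame(int sample_rate_hz, int fps)
{
    return sample_rate_hz / fps;
}

/* How long one buffer lasts when it holds `frames_per_buffer` audio frames. */
static int buffer_duration_ms(int frames_per_buffer, int fps)
{
    return 1000 * frames_per_buffer / fps;
}
```

At 50fps, a buffer holding a single audio frame lasts only 20ms, so any hiccup longer than that on either core can starve the output. A buffer holding 5 frames lasts 100ms, which leaves much more slack at the cost of extra latency. With an example rate of 32000Hz that is 640 samples per frame, or 3200 samples per 5-frame buffer.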
Okay, so after tinkering even more with the AI, I came up with this code:
I added back rg_display_sync() and modified audio_task; now the sound isn't choppy anymore and the screen is in sync! I also added double buffering, but I don't know if that changed anything. Currently the only issues are these:
- When the menu is opened and then closed after a few seconds, the audio task tries to spit out all the pending samples as fast as possible
- In certain games, like F-Zero and Zelda: A Link to the Past, there is a weird line of ~20px at the bottom of the screen
#include "shared.h"
#include <snes9x.h>
#include <math.h>
typedef struct
{
char name[16];
struct {
uint16_t snes9x_mask;
uint16_t local_mask;
uint16_t mod_mask;
} keys[16];
} keymap_t;
static const keymap_t KEYMAPS[] = {
{"Type A", {
{SNES_A_MASK, RG_KEY_A, 0},
{SNES_B_MASK, RG_KEY_B, 0},
{SNES_X_MASK, RG_KEY_START, 0},
{SNES_Y_MASK, RG_KEY_SELECT, 0},
{SNES_TL_MASK, RG_KEY_B, RG_KEY_MENU},
{SNES_TR_MASK, RG_KEY_A, RG_KEY_MENU},
{SNES_START_MASK, RG_KEY_START, RG_KEY_MENU},
{SNES_SELECT_MASK, RG_KEY_SELECT, RG_KEY_MENU},
{SNES_UP_MASK, RG_KEY_UP, 0},
{SNES_DOWN_MASK, RG_KEY_DOWN, 0},
{SNES_LEFT_MASK, RG_KEY_LEFT, 0},
{SNES_RIGHT_MASK, RG_KEY_RIGHT, 0},
}},
{"Type B", {
{SNES_A_MASK, RG_KEY_START, 0},
{SNES_B_MASK, RG_KEY_A, 0},
{SNES_X_MASK, RG_KEY_SELECT, 0},
{SNES_Y_MASK, RG_KEY_B, 0},
{SNES_TL_MASK, RG_KEY_B, RG_KEY_MENU},
{SNES_TR_MASK, RG_KEY_A, RG_KEY_MENU},
{SNES_START_MASK, RG_KEY_START, RG_KEY_MENU},
{SNES_SELECT_MASK, RG_KEY_SELECT, RG_KEY_MENU},
{SNES_UP_MASK, RG_KEY_UP, 0},
{SNES_DOWN_MASK, RG_KEY_DOWN, 0},
{SNES_LEFT_MASK, RG_KEY_LEFT, 0},
{SNES_RIGHT_MASK, RG_KEY_RIGHT, 0},
}},
{"Type C", {
{SNES_A_MASK, RG_KEY_A, 0},
{SNES_B_MASK, RG_KEY_B, 0},
{SNES_X_MASK, 0, 0},
{SNES_Y_MASK, 0, 0},
{SNES_TL_MASK, 0, 0},
{SNES_TR_MASK, 0, 0},
{SNES_START_MASK, RG_KEY_START, 0},
{SNES_SELECT_MASK, RG_KEY_SELECT, 0},
{SNES_UP_MASK, RG_KEY_UP, 0},
{SNES_DOWN_MASK, RG_KEY_DOWN, 0},
{SNES_LEFT_MASK, RG_KEY_LEFT, 0},
{SNES_RIGHT_MASK, RG_KEY_RIGHT, 0},
}},
};
static const size_t KEYMAPS_COUNT = (sizeof(KEYMAPS) / sizeof(keymap_t));
static const char *SNES_BUTTONS[] = {
"None", "None", "None", "None", "R", "L", "X", "A", "Right", "Left", "Down", "Up", "Start", "Select", "Y", "B"
};
#define AUDIO_LOW_PASS_RANGE ((60 * 65536) / 100)
// Frame timing constants (from second code)
#define TARGET_FPS (50)
#define TARGET_FRAME_DURATION (1000000 / TARGET_FPS)
static rg_app_t *app;
static rg_surface_t *updates[2];
static rg_surface_t *currentUpdate;
static rg_audio_sample_t *audioBuffer;
// Multicore audio state
static rg_mutex_t *audio_mutex;
static rg_task_t *audio_task_handle;
static rg_audio_sample_t *audio_buffers[2];
static volatile int audio_buffer_index = 0;
static volatile bool audio_processing = false;
static volatile bool audio_shutdown = false;
static bool apu_enabled = true;
static bool lowpass_filter = false;
static int keymap_id = 0;
static keymap_t keymap;
// Frame timing variables
static int32_t framedrop_balance = 0;
static uint32_t framedrop_timer = 0;
static const char *SETTING_KEYMAP = "keymap";
static const char *SETTING_APU_EMULATION = "apu";
// --- MAIN
static void update_keymap(int id)
{
keymap_id = id % KEYMAPS_COUNT;
keymap = KEYMAPS[keymap_id];
}
static bool screenshot_handler(const char *filename, int width, int height)
{
return rg_surface_save_image_file(currentUpdate, filename, width, height);
}
static bool save_state_handler(const char *filename)
{
return S9xSaveState(filename);
}
static bool load_state_handler(const char *filename)
{
return S9xLoadState(filename);
}
static bool reset_handler(bool hard)
{
S9xReset();
return true;
}
static void event_handler(int event, void *arg)
{
if (event == RG_EVENT_REDRAW)
{
rg_display_submit(currentUpdate, 0);
}
}
// --- AUDIO TASK (CORE 1) ---
static void audio_task(void *arg)
{
    while (!audio_shutdown)
    {
        // Non-blocking check if the main thread has produced a frame
        if (audio_processing)
        {
            // Lock the mutex to safely access shared data
            if (rg_mutex_take(audio_mutex, 5))
            {
                // Re-check flag after acquiring lock
                if (audio_processing)
                {
                    if (apu_enabled)
                    {
                        // Now we mix the samples from the raw APU buffer
                        if (lowpass_filter)
                            S9xMixSamplesLowPass((int16_t*)audio_buffers[audio_buffer_index], AUDIO_BUFFER_LENGTH << 1, (60 * 65536) / 100);
                        else
                            S9xMixSamples((int16_t*)audio_buffers[audio_buffer_index], AUDIO_BUFFER_LENGTH << 1);

                        // Submit the mixed audio to the non-blocking I2S driver
                        rg_audio_submit(audio_buffers[audio_buffer_index], AUDIO_BUFFER_LENGTH);
                    }
                    audio_buffer_index = (audio_buffer_index + 1) % 2;
                    audio_processing = false; // Signal that we are done
                }
                rg_mutex_give(audio_mutex);
            }
        }
        rg_task_delay(1); // Yield to prevent watchdog timeout and busy-waiting
    }
}
static rg_gui_event_t apu_toggle_cb(rg_gui_option_t *option, rg_gui_event_t event)
{
if (event == RG_DIALOG_PREV || event == RG_DIALOG_NEXT)
{
apu_enabled = !apu_enabled;
rg_settings_set_number(NS_APP, SETTING_APU_EMULATION, apu_enabled);
}
strcpy(option->value, apu_enabled ? _("On") : _("Off"));
return RG_DIALOG_VOID;
}
static rg_gui_event_t lowpass_filter_cb(rg_gui_option_t *option, rg_gui_event_t event)
{
if (event == RG_DIALOG_PREV || event == RG_DIALOG_NEXT)
lowpass_filter = !lowpass_filter;
strcpy(option->value, lowpass_filter ? _("On") : _("Off"));
return RG_DIALOG_VOID;
}
static rg_gui_event_t change_keymap_cb(rg_gui_option_t *option, rg_gui_event_t event)
{
if (event == RG_DIALOG_PREV || event == RG_DIALOG_NEXT)
{
if (event == RG_DIALOG_PREV && --keymap_id < 0)
keymap_id = KEYMAPS_COUNT - 1;
if (event == RG_DIALOG_NEXT && ++keymap_id > KEYMAPS_COUNT - 1)
keymap_id = 0;
update_keymap(keymap_id);
rg_settings_set_number(NS_APP, SETTING_KEYMAP, keymap_id);
return RG_DIALOG_REDRAW;
}
if (event == RG_DIALOG_ENTER)
{
return RG_DIALOG_CANCEL;
}
if (option->arg == -1)
{
strcat(strcat(strcpy(option->value, "< "), keymap.name), " >");
}
else if (option->arg >= 0)
{
int local_button = keymap.keys[option->arg].local_mask;
int mod_button = keymap.keys[option->arg].mod_mask;
int snes9x_button = log2(keymap.keys[option->arg].snes9x_mask); // convert bitmask to bit number
if (snes9x_button < 4 || (local_button & (RG_KEY_UP|RG_KEY_DOWN|RG_KEY_LEFT|RG_KEY_RIGHT)))
{
option->flags = RG_DIALOG_FLAG_HIDDEN;
return RG_DIALOG_VOID;
}
if (keymap.keys[option->arg].mod_mask)
sprintf(option->value, "%s + %s", rg_input_get_key_name(mod_button), rg_input_get_key_name(local_button));
else
sprintf(option->value, "%s", rg_input_get_key_name(local_button));
option->label = SNES_BUTTONS[snes9x_button];
option->flags = RG_DIALOG_FLAG_NORMAL;
}
return RG_DIALOG_VOID;
}
static rg_gui_event_t menu_keymap_cb(rg_gui_option_t *option, rg_gui_event_t event)
{
if (event == RG_DIALOG_ENTER)
{
const rg_gui_option_t options[] = {
{-1, _("Profile"), "-", RG_DIALOG_FLAG_NORMAL, &change_keymap_cb},
{-2, "", NULL, RG_DIALOG_FLAG_MESSAGE, NULL},
{-3, "snes9x ", "handheld", RG_DIALOG_FLAG_MESSAGE, NULL},
{0, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{1, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{2, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{3, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{4, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{5, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{6, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{7, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{8, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{9, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{10, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{11, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{12, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{13, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{14, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
{15, "-", "-", RG_DIALOG_FLAG_HIDDEN, &change_keymap_cb},
RG_DIALOG_END,
};
rg_gui_dialog(option->label, options, 0);
return RG_DIALOG_REDRAW;
}
strcpy(option->value, keymap.name);
return RG_DIALOG_VOID;
}
bool S9xInitDisplay(void)
{
GFX.Pitch = SNES_WIDTH * 2;
GFX.ZPitch = SNES_WIDTH;
GFX.Screen = currentUpdate->data;
GFX.SubScreen = malloc(GFX.Pitch * SNES_HEIGHT_EXTENDED);
GFX.ZBuffer = malloc(GFX.ZPitch * SNES_HEIGHT_EXTENDED);
GFX.SubZBuffer = malloc(GFX.ZPitch * SNES_HEIGHT_EXTENDED);
return GFX.Screen && GFX.SubScreen && GFX.ZBuffer && GFX.SubZBuffer;
}
void S9xDeinitDisplay(void)
{
}
uint32_t S9xReadJoypad(int32_t port)
{
if (port != 0)
return 0;
uint32_t joystick = rg_input_read_gamepad();
uint32_t joypad = 0;
for (int i = 0; i < RG_COUNT(keymap.keys); ++i)
{
uint32_t bitmask = keymap.keys[i].local_mask | keymap.keys[i].mod_mask;
if (bitmask && bitmask == (joystick & bitmask))
{
joypad |= keymap.keys[i].snes9x_mask;
}
}
return joypad;
}
bool S9xReadMousePosition(int32_t which1, int32_t *x, int32_t *y, uint32_t *buttons)
{
return false;
}
bool S9xReadSuperScopePosition(int32_t *x, int32_t *y, uint32_t *buttons)
{
return false;
}
bool JustifierOffscreen(void)
{
return true;
}
void JustifierButtons(uint32_t *justifiers)
{
(void)justifiers;
}
#ifdef USE_BLARGG_APU
static void S9xAudioCallback(void)
{
S9xFinalizeSamples();
size_t available_samples = S9xGetSampleCount();
S9xMixSamples((void *)audioBuffer, available_samples);
rg_audio_submit(audioBuffer, available_samples >> 1);
}
#endif
static void options_handler(rg_gui_option_t *dest)
{
*dest++ = (rg_gui_option_t){0, _("Audio enable"), "-", RG_DIALOG_FLAG_NORMAL, &apu_toggle_cb};
*dest++ = (rg_gui_option_t){0, _("Audio filter"), "-", RG_DIALOG_FLAG_NORMAL, &lowpass_filter_cb};
*dest++ = (rg_gui_option_t){0, _("Controls"), "-", RG_DIALOG_FLAG_NORMAL, &menu_keymap_cb};
*dest++ = (rg_gui_option_t)RG_DIALOG_END;
}
void snes_main(void)
{
const rg_handlers_t handlers = {
.loadState = &load_state_handler,
.saveState = &save_state_handler,
.reset = &reset_handler,
.screenshot = &screenshot_handler,
.event = &event_handler,
.options = &options_handler,
};
app = rg_system_reinit(AUDIO_SAMPLE_RATE, &handlers, NULL);
// Load settings
apu_enabled = rg_settings_get_number(NS_APP, SETTING_APU_EMULATION, 1);
update_keymap(rg_settings_get_number(NS_APP, SETTING_KEYMAP, 0));
// Allocate surfaces and audio buffers
updates[0] = rg_surface_create(SNES_WIDTH, SNES_HEIGHT_EXTENDED, RG_PIXEL_565_LE, 0);
updates[1] = rg_surface_create(SNES_WIDTH, SNES_HEIGHT_EXTENDED, RG_PIXEL_565_LE, 0);
currentUpdate = updates[0];
audio_buffers[0] = malloc(AUDIO_BUFFER_LENGTH * 4);
audio_buffers[1] = malloc(AUDIO_BUFFER_LENGTH * 4);
RG_ASSERT(audio_buffers[0] && audio_buffers[1], "Failed to allocate audio buffers!");
// Set up multicore audio
audio_mutex = rg_mutex_create();
audio_task_handle = rg_task_create("snes_audio", &audio_task, NULL, 2048, RG_TASK_PRIORITY_6, 1);
RG_ASSERT(audio_mutex && audio_task_handle, "Failed to create audio task!");
Settings.CyclesPercentage = 100;
Settings.H_Max = SNES_CYCLES_PER_SCANLINE;
Settings.FrameTimePAL = 20000;
Settings.FrameTimeNTSC = 16667;
Settings.ControllerOption = SNES_JOYPAD;
Settings.HBlankStart = (256 * Settings.H_Max) / SNES_HCOUNTER_MAX;
Settings.SoundPlaybackRate = AUDIO_SAMPLE_RATE;
Settings.SoundInputRate = AUDIO_SAMPLE_RATE;
Settings.DisableSoundEcho = false;
Settings.InterpolatedSound = true;
if (!S9xInitDisplay())
RG_PANIC("Display init failed!");
if (!S9xInitMemory())
RG_PANIC("Memory init failed!");
if (!S9xInitAPU())
RG_PANIC("APU init failed!");
if (!S9xInitSound(0, 0))
RG_PANIC("Sound init failed!");
if (!S9xInitGFX())
RG_PANIC("Graphics init failed!");
const char *filename = app->romPath;
if (rg_extension_match(filename, "zip"))
{
if (!rg_storage_unzip_file(filename, NULL, (void **)&Memory.ROM, &Memory.ROM_AllocSize, RG_FILE_USER_BUFFER))
RG_PANIC("ROM file unzipping failed!");
filename = NULL;
}
if (!LoadROM(filename))
RG_PANIC("ROM loading failed!");
#ifdef USE_BLARGG_APU
S9xSetSamplesAvailableCallback(S9xAudioCallback);
#else
S9xSetPlaybackRate(Settings.SoundPlaybackRate);
#endif
if (app->bootFlags & RG_BOOT_RESUME)
{
rg_emu_load_state(app->saveSlot);
}
rg_system_set_tick_rate(Memory.ROMFramesPerSecond);
int target_frametime = (1000000 / Memory.ROMFramesPerSecond);
app->frameskip = 0;
bool menuCancelled = false;
bool menuPressed = false;
int skipFrames = 0;
// Initialize frame timing
framedrop_timer = rg_system_timer();
while (1)
{
uint32_t joystick = rg_input_read_gamepad();
if (menuPressed && !(joystick & RG_KEY_MENU))
{
if (!menuCancelled)
{
rg_task_delay(50);
rg_gui_game_menu();
}
menuCancelled = false;
}
else if (joystick & RG_KEY_OPTION)
{
rg_gui_options_menu();
}
menuPressed = joystick & RG_KEY_MENU;
if (menuPressed && joystick & ~RG_KEY_MENU)
{
menuCancelled = true;
}
int64_t startTime = rg_system_timer();
bool drawFrame = (skipFrames == 0);
bool slowFrame = false;
IPPU.RenderThisFrame = drawFrame;
framedrop_balance += (rg_system_timer() - framedrop_timer) - target_frametime;
framedrop_timer = rg_system_timer();
if (framedrop_balance < 550) // Dead band: ignore lateness under 550 us so timer jitter alone never triggers a frame drop
framedrop_balance = 0;
if (framedrop_balance > target_frametime) {
// We're now a whole frame behind, so skip the next frame
IPPU.RenderThisFrame = false;
skipFrames = 1;
}
S9xMainLoop();
if (rg_mutex_take(audio_mutex, 5))
{
audio_processing = true;
rg_mutex_give(audio_mutex);
}
if (IPPU.RenderThisFrame)
{
slowFrame = !rg_display_sync(false);
rg_display_submit(currentUpdate, 0);
currentUpdate = (currentUpdate == updates[0]) ? updates[1] : updates[0];
GFX.Screen = currentUpdate->data;
}
rg_system_tick(rg_system_timer() - startTime);
if (skipFrames > 0)
{
skipFrames--;
}
else if (slowFrame || app->frameskip > 1)
{
skipFrames = slowFrame ? 1 : app->frameskip;
}
}
// Cleanup on exit
audio_shutdown = true;
if (audio_mutex) {
rg_mutex_free(audio_mutex);
}
}
Is that latest code significantly faster than upstream? Is the boost mainly from the audio task or your frameskip handling? Or both?
I would say both help. The audio task lets the game run at the same speed as it does without sound emulation, at least on the two targets I tested. The frameskip handling lets the P4 be used at its full capacity, so it is significantly faster; the results are the same as I said earlier (between 5% and 45% more speed depending on the game)! The latest code just improved stability a little bit.
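For readers skimming the loop above, the frame-drop accounting can be isolated into a small helper. This is only a sketch of that logic, not code from the fork: `framedrop_t` and `framedrop_update` are hypothetical names, timestamps are passed in rather than read from `rg_system_timer()`, and the 550 µs dead band is the value used in the posted loop.

```c
#include <stdbool.h>
#include <stdint.h>

#define FRAMEDROP_DEADBAND_US 550 /* tolerance taken from the posted loop */

/* Hypothetical container for the two variables the loop keeps. */
typedef struct {
    int64_t balance_us;   /* accumulated lateness, in microseconds */
    int64_t last_time_us; /* timestamp of the previous update */
} framedrop_t;

/* Adds this frame's lateness to the balance and returns true when the
 * emulator has fallen more than one whole frame behind, meaning the
 * next frame should be emulated without being rendered. */
static bool framedrop_update(framedrop_t *fd, int64_t now_us, int64_t target_frametime_us)
{
    fd->balance_us += (now_us - fd->last_time_us) - target_frametime_us;
    fd->last_time_us = now_us;

    /* Being early or only slightly late resets the balance to zero. */
    if (fd->balance_us < FRAMEDROP_DEADBAND_US)
        fd->balance_us = 0;

    return fd->balance_us > target_frametime_us;
}
```

With a 16667 µs target, a single 40000 µs frame leaves a balance of 23333 µs and requests a skip; a following 5000 µs unrendered frame pays the balance back down to 11666 µs, so rendering resumes.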
I've now merged some of your ideas to SNES, thanks for the work!
- Double buffering for display and audio are both included but disabled by default. (I'm pretty sure audio double buffering does nothing because rg_audio_submit creates a copy anyway, and we skip too many frames for double display to be worth it)
- Audio task is your code but a bit simplified. It is enabled by default because it does provide a decent speed up but we'll have to check if it breaks any games or gets out of sync too much...
I have not yet merged your frame skip code. I can confirm that with it SMW runs full speed even on base esp32 (with an obviously very very choppy display) so I will definitely be looking into either merging it, or improving the frameskip code of retro-go itself.
Let me know if my current code isn't good enough for your use case or if I broke anything :)
The current code is great! I couldn't do any better so I'm happy that this improvement could be merged. Maybe in the future someone will find a solution to have the same performance as the free running task without the games breaking... Still thanks for taking the time to review/upgrade the code!
Improvements sound awesome! Are any of these possible for the MegaDrive emulator? Was always a Sega kid growing up, even got a Saturn trying to convince everyone it was better than the PSX ha :)
With the free running task, the main emulation frequently ends up 20-50 frames ahead of audio which is a big problem. With my current code, the main emulation can get at most 1 frame ahead.
I think ideally we should make the code allow for some wiggle room, like +/-N frames where N is configurable. This would allow us to find the sweet spot between utilising as much CPU as possible and not breaking games by running too far ahead. It would also allow per-game tweaks: SMW clearly doesn't care too much about sync, so that one could be allowed to run freer, for example.
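One possible shape for that configurable window, as a rough sketch only: the main loop counts frames it has emulated, the audio task counts frames it has mixed, and the main loop waits whenever its lead reaches N. All names here (`sync_window_t`, `sync_can_emulate`, and so on) are hypothetical and not part of retro-go. The sketch only bounds the "emulation ahead of audio" side, and in a real two-core build the counters would need to be protected by a mutex or made atomic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping shared by the main loop and the audio task. */
typedef struct {
    uint32_t frames_emulated; /* frames finished by the main emulation loop */
    uint32_t frames_mixed;    /* frames whose audio the audio task has mixed */
    uint32_t max_lead;        /* N: how far emulation may run ahead of audio */
} sync_window_t;

/* Number of frames the main emulation is currently ahead of audio.
 * Unsigned subtraction stays correct when the counters wrap around. */
static uint32_t sync_lead(const sync_window_t *w)
{
    return w->frames_emulated - w->frames_mixed;
}

/* True if the main loop may emulate another frame without waiting. */
static bool sync_can_emulate(const sync_window_t *w)
{
    return sync_lead(w) < w->max_lead;
}

/* Called by the main loop after each emulated frame. */
static void sync_frame_emulated(sync_window_t *w)
{
    w->frames_emulated++;
}

/* Called by the audio task after mixing one frame of audio;
 * it can never overtake the emulation. */
static void sync_frame_mixed(sync_window_t *w)
{
    if (sync_lead(w) > 0)
        w->frames_mixed++;
}
```

Setting `max_lead` to 1 reproduces the "at most 1 frame ahead" behaviour described above, while a larger per-game value would let tolerant games such as SMW run freer.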
For the MegaDrive emulator I have tried several times running the Z80 on core 1. But I faced the same sync issues so I had to give up. It might be worth trying again.
Could also try running SN76489/ym2612 (audio chips) in the core 1 task but not the Z80, maybe it'll cause less sync issues.
Here's a test of a multi-core gwenesis I did a while ago, if you wish to try it. I don't remember if it broke anything, but audio emulation will be less accurate (hard for me to say if it's a problem or not).
https://github.com/ducalex/retro-go/commits/multi-core-gwenesis/
Performance is definitely greatly increased with it, so if you try it and it doesn't break anything I'll merge it to dev.
Hi
Just had a quick test so far, only on Sonic 2 and Streets of Rage 2. It is faster; it brings performance almost up to the level without any sound, but still with lots of frameskip etc. Sonic is super choppy, not really playable, but SOR2 is ok.
If I disable both sound chips and Z80 it actually runs too fast, about 120% speed.
I'll need to do some more testing to make sure there are no major drawbacks. I'll test it on a few devices over the next few days and let you know.
Personally, I'd take fairly(ish!) smooth gameplay without any audio over choppy gameplay and scratchy audio :)
Tested a few more games and also on an ESP32-S3. There is a performance increase; however, with sound enabled I did get quite a few crashes with Sonic 2.
Performance with the S3 in SOR2 with sound is actually fairly playable, as I mentioned before it does run wayyyy too fast with sound disabled. It feels like if 'speed' was properly regulated at 100%, it would be almost perfect without sound.
Sonic during opening, probably fastest it will run (sound and z80 disabled)
Full speed through a level and still above 100%, does feel pretty smooth, just a little too fast, again no sound.
It is slower than 100% with sound even multicore, but still faster than before.
Personally, I tested OutRun and ran into some graphical glitches. Here is a picture:
And, same as with Sonic 2, the game crashes after a few seconds.
This was with both sound and z80 enabled.
The branch I shared moves graphics rendering and audio processing to core 1. I'm fairly confident that the crashes are caused by the rendering and not the sound. It's hard to say without a log but I'm guessing the fact that cpu emulation runs at the same time might cause bogus pointers. Or it's simply a stack overflow.
I've added menu options to control what is offloaded to core 1, you can change at any time during game play to compare. Let me know if either of you find a combo that is stable but also improves speed enough to be worth merging to dev!
Hi
Yes, rendering on core 1 was causing the graphical glitching in OutRun; disable it and the glitches go away. It does bring some performance boost, however: enabling rendering on core 1 does help with my two test games, Sonic 2 and SOR2. I didn't get any crashes on Sonic 2 this time either.
Also I saw you tied the game speed to the display rather than sound now, this does keep the game at a maximum of 100%. Although it is not an ideal scenario, having sound disabled and rendering on core 1 allows Sonic 2 to play at pretty much full speed (60/1 or 60/2 depending on when you check). Disabling rendering on core 1 introduces some more frameskip, usually around 60/4 or 60/5, and doesn't feel anywhere near as smooth. These tests were done on an original ESP32; I will do the same on an S3, but I'd imagine it will be a little quicker/smoother.
I will run some further tests but as it stands, both on core 1 is faster but rendering will cause problems with certain games.
> Also I saw you tied the game speed to the display rather than sound now, this does keep the game at a maximum of 100%.
It's still tied to sound. Before, when sound was disabled, there was no speed control at all. Now when sound is disabled silence is fed to rg_audio_submit instead of never calling it. The alternative is to use busy waits but I suspect it would impact performance more than sending silence. It might be worth measuring, though.
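The idea of pacing through silence can be sketched as follows, assuming the audio sink blocks until the hardware has consumed the samples (which is what makes it usable as a speed limiter). This is not the gwenesis code: `audio_submit_stub` is a hypothetical stand-in for `rg_audio_submit` that only counts frames, and the 32000 Hz / 60 fps figures are illustrative values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SAMPLE_RATE 32000       /* assumed value, for illustration only */
#define FRAMES_PER_SECOND 60    /* assumed NTSC frame rate */
#define SAMPLES_PER_FRAME (SAMPLE_RATE / FRAMES_PER_SECOND)

typedef struct { int16_t left, right; } stereo_frame_t;

static size_t submitted_frames; /* total stereo frames handed to the sink */

/* Hypothetical stand-in for the blocking audio sink. The real call would
 * block until the driver accepts the samples; this one only counts them. */
static void audio_submit_stub(const stereo_frame_t *frames, size_t count)
{
    (void)frames;
    submitted_frames += count;
}

/* Called once per emulated video frame. With sound disabled the buffer is
 * zeroed, but the same number of samples is still submitted, so a blocking
 * sink keeps throttling the emulator to real time. */
static void submit_frame_audio(stereo_frame_t *buffer, bool sound_enabled)
{
    if (!sound_enabled)
        memset(buffer, 0, SAMPLES_PER_FRAME * sizeof(*buffer));
    audio_submit_stub(buffer, SAMPLES_PER_FRAME);
}
```

The busy-wait alternative mentioned above would replace the submit call with a loop polling a timer, which keeps the core spinning instead of letting it sleep inside the audio driver.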
I think I'll merge the new core 1 menu to the dev branch so we can keep experimenting.
Yeah that could be cool, it does help with performance. I would say the best 'default' settings would be On for sound, Z80, and sound on core 1, then Off for Render, as it can introduce artifacts in some games.
It's merged. Sound core 1 enabled, Render core 1 disabled.
I'll add a similar menu to the SNES emulator.