picodrive
large overhaul of SH2 DRC, plus new backends for MIPS32 and A64
Originally I only wanted to learn something about recompilation and looked for a simple target... I think this got way out of hand. I ended up changing things in picodrive all over the place.
However, mainly I made large changes to the DRC for optimisation. It now creates much better code (something slightly above 3 arm32-insns per SH2-insn). And lastly, I added 2 new backends for mipsel (MIPS32R1) and aarch64 (A64) to it. Have a look and take what you think is suitable.
Nice, wasn't expecting anything like this.
I'll want to look at this though, and make sure the targets I care about still work, so with my time it may take quite a while before this is merged.
to be honest, me neither... like I said, it got out of hand.
I forgot to mention that I extended x86-64 support to use the full register set. I'm no x86 aficionado, but it helped me a lot with debugging the DRC changes.
Also notable are the changes to polling detection. I added basic loop detection to the DRC, which enabled me to detect polling on memory addresses, which would otherwise be far too expensive to do. That gave a nice speed boost to some games. And I tried to mend some of the synchronisation problems that were still present by adding a "synchronisation FIFO". This stores values written to known addresses used for synchronisation together with the cycle time of the write, to avoid missing a value when several values are written within a short time. I'd say that's a bit experimental, but it may have the potential to make some of the other sync stuff superfluous.
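The "synchronisation FIFO" idea can be illustrated with a small ring buffer. This is a minimal sketch under stated assumptions, not the code from this PR: the names `sync_fifo`, `sync_fifo_push` and `sync_fifo_read_at`, the entry layout and the size of 8 are all hypothetical. Each write to a watched address is stored with its cycle time, so a CPU that runs behind can still see the value that was current at its own cycle instead of only the most recent one.

```c
#include <stdint.h>

/* Hypothetical sketch of a "synchronisation FIFO": recent writes to known
 * sync addresses are kept together with the cycle time of each write. */
#define SYNC_FIFO_SIZE 8            /* assumed size, must be a power of 2 */

struct sync_entry {
    uint32_t addr;                  /* emulated address that was written */
    uint32_t value;                 /* value written */
    uint32_t cycle;                 /* cycle time of the write */
};

struct sync_fifo {
    struct sync_entry e[SYNC_FIFO_SIZE];
    unsigned head;                  /* index of the next slot to write */
    unsigned count;                 /* number of valid entries */
};

/* Record a write; the oldest entry is overwritten when the FIFO is full.
 * Assumes writes are pushed in non-decreasing cycle order. */
static void sync_fifo_push(struct sync_fifo *f, uint32_t addr,
                           uint32_t value, uint32_t cycle)
{
    f->e[f->head] = (struct sync_entry){ addr, value, cycle };
    f->head = (f->head + 1) & (SYNC_FIFO_SIZE - 1);
    if (f->count < SYNC_FIFO_SIZE)
        f->count++;
}

/* Look up the value of addr as seen at time 'cycle': the newest recorded
 * write to addr at or before that cycle. Returns 0 if none is stored. */
static int sync_fifo_read_at(const struct sync_fifo *f, uint32_t addr,
                             uint32_t cycle, uint32_t *value)
{
    for (unsigned i = 1; i <= f->count; i++) {
        /* walk from the newest entry towards the oldest one */
        const struct sync_entry *s =
            &f->e[(f->head - i) & (SYNC_FIFO_SIZE - 1)];
        if (s->addr == addr && (int32_t)(cycle - s->cycle) >= 0) {
            *value = s->value;
            return 1;
        }
    }
    return 0;
}
```

With two quick writes of 1 and 2 to the same address, a reader whose cycle time lies between them gets 1 rather than 2, which is the "missed value" case described above.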
I introduced a "branch cache" in the DRC. This caches lookup results for entry points to speed up branching across tcache buffers, which also brought a noticeable improvement. I also tinkered with the memory access functions and made some changes there, since I identified them as a bottleneck for the overall speed. And I added some more arm32 asm code to speed things up on the armv[4-7] platforms.
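A branch cache of this kind can be sketched as a small direct-mapped table mapping an SH2 branch target to the host address of its translated block, with the expensive search as fallback on a miss. The names, the 64-entry size and the `slow_lookup` callback are hypothetical; the PR's actual data structures may look different.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical direct-mapped branch cache: SH2 PC -> host code pointer. */
#define BCACHE_BITS 6
#define BCACHE_SIZE (1u << BCACHE_BITS)

struct bcache_entry {
    uint32_t sh2_pc;      /* guest address of the block entry point */
    void *host_code;      /* translated code, NULL if the slot is unused */
};

static struct bcache_entry bcache[BCACHE_SIZE];

/* Stands in for the full lookup across the translation cache buffers. */
typedef void *(*lookup_fn)(uint32_t pc);

static unsigned bcache_index(uint32_t pc)
{
    /* SH2 instructions are 2 bytes long, so bit 0 carries no information */
    return (pc >> 1) & (BCACHE_SIZE - 1);
}

static void *bcache_lookup(uint32_t pc, lookup_fn slow_lookup)
{
    struct bcache_entry *e = &bcache[bcache_index(pc)];
    if (e->host_code != NULL && e->sh2_pc == pc)
        return e->host_code;              /* hit: skip the slow search */
    void *code = slow_lookup(pc);         /* miss: do the full lookup */
    if (code != NULL) {
        e->sh2_pc = pc;                   /* remember it for next time */
        e->host_code = code;
    }
    return code;
}

/* Must be called whenever translated code is discarded. */
static void bcache_flush(void)
{
    for (unsigned i = 0; i < BCACHE_SIZE; i++)
        bcache[i].host_code = NULL;
}
```

The flush is the part that is easy to get wrong in such a design: a stale entry left behind after a block is dropped would send a branch into freed or reused code.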
Together with all the other changes I made, this code achieves 30-60 fps on most 32X games on a Caanoo @800MHz, firmware V3. The aarch64 implementation runs anything I threw at it at 100-200 fps on an Odroid-C2 under Ubuntu, which isn't really the fastest horse in the stable. Unfortunately I can't say how fast the mipsel code is, since I have no real hardware and can only test it in a qemu environment (using https://boards.dingoonity.org/gcw-development/gcw-zero-emulation-in-qemu/). It generates about 10-15% more code than the arm32 version (due to the flags emulation), so I expect it to behave similarly to the Caanoo, or slightly better on a 1GHz JZ4760.
I think further speedup might be achieved if the polling logic is changed so that interrupts don't disrupt the polling state. A high-profile interrupt like PWM can occur about every 1000 cycles. Redetecting the polling state can take more than 200 SH2 cycles, so that can take up to 20% of the emulated CPU time. I can't spend much more time on this, but I reckon that would be an interesting route.
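The route suggested here, keeping the polling state across interrupts, can be sketched as saving the detector state on interrupt entry and restoring it on RTE. Everything below is hypothetical: the struct layout, the threshold of 3 identical reads and the nesting limit are assumptions, and the DRC's real loop/poll detection is not shown.

```c
#include <stdint.h>

/* Hypothetical poll-detection state; names and limits are assumptions. */
#define POLL_THRESHOLD 3     /* identical reads before polling is assumed */
#define IRQ_NEST_MAX   4     /* saved states for nested interrupts */

struct poll_state {
    uint32_t addr;           /* address read by the suspected poll loop */
    uint32_t value;          /* value seen on the last read */
    int count;               /* consecutive identical reads */
};

struct poll_detector {
    struct poll_state cur;
    struct poll_state saved[IRQ_NEST_MAX];
    int depth;               /* current interrupt nesting level */
};

/* Called for each read in a detected loop; returns 1 once polling is assumed. */
static int poll_read(struct poll_detector *d, uint32_t addr, uint32_t value)
{
    struct poll_state *p = &d->cur;
    if (p->addr == addr && p->value == value) {
        if (p->count < POLL_THRESHOLD)
            p->count++;
    } else {                 /* different address or value: start over */
        p->addr = addr;
        p->value = value;
        p->count = 1;
    }
    return p->count >= POLL_THRESHOLD;
}

/* Interrupt entry: push the state instead of throwing it away. */
static void poll_irq_enter(struct poll_detector *d)
{
    if (d->depth < IRQ_NEST_MAX)
        d->saved[d->depth] = d->cur;
    d->depth++;
    d->cur = (struct poll_state){ 0, 0, 0 };
}

/* RTE: pop the state, so the interrupted loop need not be re-detected. */
static void poll_irq_return(struct poll_detector *d)
{
    if (d->depth == 0)
        return;                                   /* unbalanced RTE */
    d->depth--;
    if (d->depth < IRQ_NEST_MAX)
        d->cur = d->saved[d->depth];
    else
        d->cur = (struct poll_state){ 0, 0, 0 };  /* state was not saved */
}
```

If the interrupt handler changed the polled location, the first read after RTE sees a different value and detection simply restarts, so restoring the old state does not hide real changes.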
Lastly, I should mention that I didn't work on the configure stuff. I used static config.
It doesn't compile with -flto enabled:
pico/32x/memory.c:44:1: error: global register variable follows a function definition
44 | DRC_DECLARE_SR;
Also, compilation breaks when attempting to run make in tools. The following patch works for me:
--- a/Makefile
+++ b/Makefile
@@ -202,10 +202,10 @@
endif
-target_: pico/pico_int_offs.h $(TARGET)
+target_: $(TARGET)
clean:
- $(RM) $(TARGET) $(OBJS) pico/pico_int_offs.h
+ $(RM) $(TARGET) $(OBJS)
$(RM) -r .opk_data
$(TARGET): $(OBJS)
@@ -218,8 +218,8 @@
pprof: platform/linux/pprof.c
$(CC) $(CFLAGS) -O2 -ggdb -DPPROF -DPPROF_TOOL -I../../ -I. $^ -o $@ $(LDFLAGS) $(LDLIBS)
-pico/pico_int_offs.h:: tools/mkoffsets.sh
- make -C tools/ XCC="$(CC)" XCFLAGS="$(CFLAGS)"
+tools/textfilter: tools/textfilter.c
+ make -C tools/ textfilter
.s.o:
$(CC) $(CFLAGS) -c $< -o $@
The 32X MIPS DRC code doesn't work at all on an Ingenic JZ4760B: it crashes when booting any game. The Ingenic JZ4760 is MIPS32r1 (non-MSA), unlike the JZ4770, which is MIPS32r2 (it does have an FPU, though). Disabling the SH2 DRC MIPS code makes it work again, but it runs at about 7-8 FPS. I guess it's better than 3-4 FPS... I'm using an LDK/RS-97 for testing.
Hi,
On Sun, 11 Aug 2019, 06:52 gameblabla wrote:
it doesn't compile with flto enabled :
pico/32x/memory.c:44:1: error: global register variable follows a function definition 44 | DRC_DECLARE_SR;
What platform did you compile for? Libretro or standalone?
Also compilation breaks when attempting to run make in tools.
I built an automatic mechanism to calculate the offsets in pico_int_offs.h. It's absolutely needed at least once, since some offsets have changed. That may also explain why the MIPS DRC doesn't work.
What exactly did fail?
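The idea behind a generated offsets header like pico_int_offs.h can be shown with `offsetof`. This is a native-only sketch for a hypothetical struct, not mkoffsets.sh: a program like this gives offsets for the compiler that builds it. Judging from the `/tmp/getoffs.o` objcopy error reported later in this thread, the real script instead compiles an object file for the target and extracts the values with objcopy, which avoids having to run target code on the build host.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical struct standing in for the real emulator state; only the
 * OFS_<prefix>_<field> naming mimics pico_int_offs.h. */
struct demo_sh2 {
    uint32_t r[16];
    void *p_bios;
    void *p_da;
    int is_slave;
};

/* Format one header line, e.g. "#define OFS_SH2_p_bios 0x0040". */
static int format_offset(char *buf, size_t len, const char *prefix,
                         const char *field, size_t ofs)
{
    return snprintf(buf, len, "#define OFS_%s_%s 0x%04zx", prefix, field, ofs);
}

/* offsetof() is evaluated by the compiler building this file, so the
 * result is only valid for that compiler's target ABI. */
#define FORMAT_OFFSET(buf, len, prefix, type, field) \
    format_offset(buf, len, prefix, #field, offsetof(type, field))
```

A generator would call `FORMAT_OFFSET` once per field and write the lines to the header. Built with a cross compiler, such a program would have to run on the target or under an emulator, which is exactly what reading the values out of an object file sidesteps.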
The 32X MIPS DRC code doesn't work at all on an Ingenic JZ4760B : it crashes upon booting up any game. The ingenic Jz4760 is MIPS32r1 (non MSA) unlike the JZ4770 which is MIPS32r2 (it does have an FPU though). Disabling the SH2 DRC MIPS code makes it work again but it runs at like 7-8 FPS. Guess it's better than 3-4 FPS... I'm using an LDK/RS-97 for testing.
Unfortunately I can only test MIPS in qemu; I don't have real hardware for this. But in qemu it works fine (fun fact: the basic DRC also works fine on a 25-year-old SGI Indigo 2 :-)).
I guess if it's not the offsets from above, I have to organise some real hardware...
Regards, --kub
On Sun, 11 Aug 2019, 09:00 Kai-Uwe Bloem wrote:
Hi,
On Sun, 11 Aug 2019, 06:52 gameblabla wrote:
it doesn't compile with flto enabled :
pico/32x/memory.c:44:1: error: global register variable follows a function definition 44 | DRC_DECLARE_SR;
What platform did you compile for? Libretro or standalone?
After some more thinking I suspect it might be a compiler version issue. Which compiler version did you use? Which additional cflags?
I built an automatic mechanism to calculate the offsets in pico_int_offs.h. it's absolutely needed at least once since some offsets have changed. That may also explain why the mips drc doesn't work. What exactly did fail?
I get this error when running make: objcopy: Unable to recognise the format of the input file `/tmp/getoffs.o'
It does that even when I try to do it manually with CC set to mipsel-linux-gcc.
After some more thinking I suspect it might be a compiler version issue. Which compiler version did you use? Which additional cflags?
I'm using my own toolchain which uses GCC 9.1 (with some patches applied to it), no additional CFLAGS.
I also tried the gcw0 toolchain (which is GCC 4.8, got it here: http://www.gcw-zero.com/develop) and I still get the same error.
EDIT: It turns out that it doesn't work because it uses the host objcopy, not the target's. I did a quick hack fix by using mipsel-linux-objcopy instead, and it compiled just fine. Obviously this will need a proper fix for cross-compiling setups like mine.
But guess what? 32X games still don't work properly, same as before (Picodrive crashes before even booting the game).
On Sun, 11 Aug 2019, 16:03 gameblabla wrote:
I get this error when trying when doing make : objcopy: Unable to recognise the format of the input file `/tmp/getoffs.o'
It does that even when i try to do it manually with CC set to mipsel-linux-gcc.
Hmm. Sounds like the binutils don't support foreign ELF formats. In case you haven't done so already, could you please try to install multiarch binutils?
In any case, I'll think up another solution for this.
I'm using my own toolchain which uses GCC 9.1 (with some patches applied to it), no additional CFLAGS. I also tried the gcw0 toolchain (which is GCC 4.8, got it here : http://www.gcw-zero.com/develop) and i still get the same error.
I reckon it can't be gcc. It works fine with the Ubuntu supplied gcc for x86 and aarch64. I strongly suspect binutils.
Hmm. Sounds like the binutils don't support foreign ELF formats. In case you haven't done so already, could you please try to install multiarch binutils?
See the edit to my post. I managed to make it work: it wasn't using the target's objcopy but the host one, so if you try to cross-compile it, it won't work. I had to edit the script so that it uses mipsel-linux-objcopy instead of objcopy, but there should obviously be a better fix than that.
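One common way around the host-objcopy problem is to derive the binutils name from the cross compiler's prefix, so `mipsel-linux-gcc` gives `mipsel-linux-objcopy` (in Makefiles and shell scripts this is usually done with a `CROSS_COMPILE` prefix variable, as in the configure call used later in this thread). The helper below is a hypothetical C illustration of that naming rule only; it is not what mkoffsets.sh does.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: derive the target objcopy name from the compiler
 * command, e.g. "mipsel-linux-gcc" -> "mipsel-linux-objcopy".
 * Only a trailing "gcc", "clang" or "cc" is recognised; anything else
 * falls back to the unprefixed host tool. Returns 0 on success, -1 if
 * the output buffer is too small. */
static int objcopy_for_cc(const char *cc, char *out, size_t len)
{
    static const char *const drivers[] = { "gcc", "clang", "cc" };
    size_t cc_len = strlen(cc);
    int n;

    for (size_t i = 0; i < sizeof drivers / sizeof drivers[0]; i++) {
        size_t d_len = strlen(drivers[i]);
        if (cc_len >= d_len && strcmp(cc + cc_len - d_len, drivers[i]) == 0) {
            /* keep everything before the driver name as the prefix */
            n = snprintf(out, len, "%.*sobjcopy", (int)(cc_len - d_len), cc);
            return (n < 0 || (size_t)n >= len) ? -1 : 0;
        }
    }
    n = snprintf(out, len, "objcopy");
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A full compiler path works as well: `/opt/rs97-toolchain-PIE/usr/bin/mipsel-linux-gcc` maps to `/opt/rs97-toolchain-PIE/usr/bin/mipsel-linux-objcopy`. A versioned name such as `mipsel-linux-gcc-9`, or a CC value that includes flags, is not handled and falls back to plain `objcopy`.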
I've made some changes to fix this -flto problem. Please pull and check if this solves your problem. The binutils problem isn't solved yet. I'm still searching for a general solution.
There is a tentative fix for missing multiarch binutils.
So mkoffsets.sh works well now, except that it now seemingly crashes with my host's GCC 9.1 compiler. But I heard that version had multiple issues compiling things like PCSX2 anyway, and switching to clang fixed it.
However, it is still not working. So I grabbed my QEMU GCW0 image, recompiled it with the GCW0 toolchain, and it crashes in exactly the same way in the QEMU VM as it does on my LDK. I wanted to debug it, however...
opendingux:/media/QEMU VVFAT # gdb PicoDrive
GNU gdb (GDB) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "mipsel-gcw0-linux-uclibc".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from PicoDrive...done.
(gdb) run "rom.32x"
Starting program: /media/QEMU VVFAT/PicoDrive "rom.32x"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/libthread_db.so.1".
plat_sdl: using 320x240 as fullscreen resolution
plat_sdl: overlay: fmt 59565955, planes: 1, pitch: 640, hw: 0
warning: video overlay is not hardware accelerated, not going to use it.
input: new device #0 "sdl:keys"
input: async-only devices detected..
# drv probed binds name
0 0 y y sdl:keys
using sdl audio output driver
platform/libpicofe/readpng.c: failed to open: /media/QEMU VVFAT/skin/font.png
platform/libpicofe/readpng.c: failed to open: /media/QEMU VVFAT/skin/selector.png
emu_ReloadRom(rom.32x)
00000:000: couldn't open carthw.cfg!
00000:000: sram: 200000 - 203fff; eeprom: 0
starting audio: 44100 len: 735 stereo: 1, pal: 0
00003:134: 32X startup
00003:134: drc_cmn_init: 0x676000, 4194304 bytes: 0
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred
Program received signal SIGSEGV, Segmentation fault.
0x00000000 in ?? ()
(gdb) bt
#0 0x00000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
So unfortunately that's kind of a bummer. It does the same thing with my buildroot too, so we can rule out a regression too.
If you want to give it a try yourself, grab the qemu image here : http://www.gcw-zero.com/files/gcw0-qemu.zip
For convenience, I also added the -hdc fat:rw:./myfolderwithstuff switch to run-gcw0.sh. Put your build of Picodrive with a 32X game inside that folder and run run-gcw0.sh in a terminal. Select the terminal app in the GUI. (Controls are: LCTRL for A, TAB/BACKSPACE for L/R, and so on.) Go to the terminal where you ran ./run-gcw0.sh and type in
cd /media/QEMU VVFAT
Then you can use GDB and run it on your Picodrive build. It will most likely crash like my builds...
I must mention that Genesis games still work fine.
That's really strange. I used that qemu image for testing when writing the mips backend.
I've just built a fresh github checkout with "ln -sf config.gcw0 config.mak && make clean opk". I copied the opk into the gcw0_data image, and I can start it and load 32x roms:
< Welcome to OpenDingux ! >
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||
opendingux:/media/data/local/home # mount /media/data/apps/PicoDrive.opk /mnt/
opendingux:/media/data/local/home # cd /mnt
opendingux:/mnt # ./PicoDrive
plat_sdl: using 320x240 as fullscreen resolution
plat_sdl: overlay: fmt 59565955, planes: 1, pitch: 640, hw: 0
warning: video overlay is not hardware accelerated, not going to use it.
input: new device #0 "sdl:keys"
input: async-only devices detected..
# drv probed binds name
0 0 y y sdl:keys
config_readsect: unhandled val for "Video output mode": "SDL Window"
config_readsect: loaded from /usr/local/home/.picodrive/config2.cfg
using sdl audio output driver
platform/libpicofe/readpng.c: unexpected font image size 256x320, needed 128x160
platform/libpicofe/readpng.c: failed to open: /mnt/skin/selector.png
found skin.txt
selected file: /media/data/roms/rom1.32x
emu_ReloadRom(/media/data/roms/rom1.32x)
config_readsect: loaded from /usr/local/home/.picodrive/config2.cfg
config_readsect: loaded from /usr/local/home/.picodrive/config2.cfg
00000:000: couldn't open carthw.cfg!
00000:000: sram: 200000 - 203fff; eeprom: 0
starting audio: 44100 len: 735 stereo: 1, pal: 0
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred
00003:134: 32X startup
00003:134: drc_cmn_init: 0x636000, 4194304 bytes: 0
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred
ALSA lib pcm.c:7843:(snd_pcm_recover) underrun occurred
(...more of the same, while the rom is running...)
I'm at a loss here. There must be a difference in our setups, but where is it?
Could you possibly get the output of "info reg" and see if the link register contains something useful? Is it maybe a path problem? The paths in the config.* files are set so that the toolchains live in $HOME/opt. (And yes, that must be changed somehow. I just don't have an idea how to do this independently of platform/compiler.) What is the output if you compile it with drc_debug set to 15? (You may need a patch to libpicofe for this:)
diff --git a/linux/host_dasm.c b/linux/host_dasm.c
index 66a83ea..eba39ac 100644
--- a/linux/host_dasm.c
+++ b/linux/host_dasm.c
@@ -22,11 +22,21 @@ extern char **g_argv;
 static struct disassemble_info di;
-#ifdef __arm__
+#if defined __arm__
 #define print_insn_func print_insn_little_arm
 #define BFD_ARCH bfd_arch_arm
 #define BFD_MACH bfd_mach_arm_unknown
 #define DASM_OPTS "reg-names-std"
+#elif defined __aarch64__
+#define print_insn_func print_insn_aarch64
+#define BFD_ARCH bfd_arch_aarch64
+#define BFD_MACH bfd_mach_aarch64
+#define DASM_OPTS NULL
+#elif defined __mips__
+#define print_insn_func print_insn_little_mips
+#define BFD_ARCH bfd_arch_mips
+#define BFD_MACH bfd_mach_mipsisa32
+#define DASM_OPTS NULL
 #elif defined(__x86_64__) || defined(__i386__)
 #define print_insn_func print_insn_i386_intel
 #define BFD_ARCH bfd_arch_i386
Well, I managed to make the GCW0 version work on QEMU as you described, but still couldn't make it work on my LDK in a similar way (except by extracting the executable out of the OPK). I'm really not sure what's going on; it's either a toolchain issue or something else.
I'll try some other workarounds before I give up on that, because I have no idea.
How exactly are you building your version? Can I replicate that to check if I can reproduce the problem?
Another idea: could you please look into pico/pico_int_offs.h and check if the computed offsets look ok?
/* autogenerated by mkoffset.sh, do not edit */
/* target endianess: le, compiled with: /opt/rs97-toolchain-PIE/usr/bin/mipsel-linux-gcc -Wall -ggdb -ffunction-sections -fdata-sections -I. -O2 -finline-functions -DNDEBUG -falign-functions=2 -I/opt/rs97-toolchain-PIE/usr/mipsel-rs97-linux-uclibc/sysroot/usr/include/ -I/opt/rs97-toolchain-PIE/usr/mipsel-rs97-linux-uclibc/sysroot/usr/include/SDL -D_GNU_SOURCE=1 -D_REENTRANT -Wno-unused-result -fno-stack-protector -march=mips32 -mtune=mips32 -mhard-float -DEMU_F68K -D_USE_CZ80 -DDRC_SH2 */
#define OFS_Pico_video_reg 0x0000
#define OFS_Pico_m_rotate 0x0040
#define OFS_Pico_m_z80Run 0x0041
#define OFS_Pico_m_dirtyPal 0x0046
#define OFS_Pico_m_hardware 0x0047
#define OFS_Pico_m_z80_reset 0x004f
#define OFS_Pico_m_sram_reg 0x0049
#define OFS_Pico_sv 0x0090
#define OFS_Pico_sv_data 0x0090
#define OFS_Pico_sv_start 0x0098
#define OFS_Pico_sv_end 0x009c
#define OFS_Pico_sv_flags 0x00a0
#define OFS_Pico_rom 0x0570
#define OFS_Pico_romsize 0x0578
#define OFS_Pico_est 0x00c0
#define OFS_EST_DrawScanline 0x0000
#define OFS_EST_rendstatus 0x0004
#define OFS_EST_DrawLineDest 0x0008
#define OFS_EST_HighCol 0x0010
#define OFS_EST_HighPreSpr 0x0018
#define OFS_EST_Pico 0x0020
#define OFS_EST_PicoMem_vram 0x0028
#define OFS_EST_PicoMem_cram 0x0030
#define OFS_EST_PicoOpt 0x0038
#define OFS_EST_Draw2FB 0x0040
#define OFS_EST_HighPal 0x0048
#define OFS_PMEM_vram 0x10000
#define OFS_PMEM_vsram 0x22100
#define OFS_PMEM32x_pal_native 0x90e00
#define OFS_SH2_is_slave 0x0a18
#define OFS_SH2_p_bios 0x0098
#define OFS_SH2_p_da 0x00a0
#define OFS_SH2_p_sdram 0x00a8
#define OFS_SH2_p_rom 0x00b0
#define OFS_SH2_p_dram 0x00b8
#define OFS_SH2_p_drcblk_da 0x00c0
#define OFS_SH2_p_drcblk_ram 0x00c8
The offsets look like these... I have no idea if they are correct or not.
I'm using my own toolchain here : https://github.com/rs-97-cfw/buildroot
It's fully static with forced no-PIC and -mno-abicalls. I reverted those changes and rebuilt Picodrive (I know some software doesn't like those), but it still crashes on 32X games. I'll try to recompile Picodrive using the toolchain that is used for the rootfs and see if that fixes the issue (given that the issue could arise from either).
Also, I tried using the config.gcw0 file and modifying it for my toolchain, as well as using CROSS_COMPILE=mipsel-linux- ./configure --platform=opendingux. It compiles but still crashes on 32X games.
Perhaps it works with the GCW0 toolchain due to the older GCC? No idea.
The offsets look OK... though as an afterthought I think they are only used in asm parts... and those mainly (only?) exist for arm.
Anyway... what's your make command? Just to save me some hours: is there a binary release of your toolchain/sysroot I can readily install on Ubuntu 18?
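To turn "the offsets look OK" into something the compiler checks, each generated constant can be compared against `offsetof` in a C file built with the same compiler and flags, using C11 `_Static_assert`. The struct and constants below are hypothetical stand-ins (the three values mirror the first OFS_EST_ lines of the header posted above). If a header had been generated for a different ABI, for example one with 8-byte pointers while the target uses 4-byte pointers, the build would stop instead of the emulator corrupting memory at run time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: a struct shaped like the start of the EST block
 * and constants as they might appear in a generated header. In a real
 * build the struct would come from the emulator headers and the
 * constants from pico/pico_int_offs.h. */
struct demo_est {
    int DrawScanline;
    int rendstatus;
    void *DrawLineDest;
    uint8_t *HighCol;
};

#define OFS_EST_DrawScanline 0x0000
#define OFS_EST_rendstatus   0x0004
#define OFS_EST_DrawLineDest 0x0008

/* C11: fails the build if a generated offset disagrees with the layout
 * the compiler actually uses for this target. */
#define CHECK_OFS(type, field, ofs)                 \
    _Static_assert(offsetof(type, field) == (ofs), \
                   "stale offset: " #type "." #field)

CHECK_OFS(struct demo_est, DrawScanline, OFS_EST_DrawScanline);
CHECK_OFS(struct demo_est, rendstatus,   OFS_EST_rendstatus);
CHECK_OFS(struct demo_est, DrawLineDest, OFS_EST_DrawLineDest);
/* A literal such as 0x0010 for HighCol would only hold with 8-byte
 * pointers; with 4-byte pointers the field sits at 0x000c. */
```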
Well, never mind, it looks like an issue with my toolchain; it works with another one (which I'll upload tomorrow; it uses GCC 7.3. I wonder if GCC 9.1 was the issue?). It runs at about 18/19 FPS in After Burner 32X, versus 4/5 FPS, but I'll take it.
I guess this can be merged now.
OK... though I would really be interested in the root cause of this. Thank you for your patience.
After Burner is a tough customer. I haven't had time to delve deeply into this, but it apparently does a lot of sync switching between the SH2 CPUs, which is rather expensive. It might also have some high-profile IRQ. Both break poll detection (that's apparently what slows down some other games; optimization idea: push/pop the poll detection state on IRQ/RTE). It's one of the slower games, at about 25-35 fps on a Caanoo with a frameskip of 2. Try something like Tempo; that should work much better.
I think there are some opportunities to optimize the mips backend:
- mips32r2 stuff may be used if appropriate (I've already documented where I think this to be viable).
- The flag register emulation stuff can use some consideration (it just does too much in a lot of cases).
- Jumping far often produces an empty delay slot (consider using something like a PLT or a literal area for patchable jumps in each block?).
Anyway, measurements over about half a million compiled SH2 insns show ~3.3 MIPS insns per SH2 insn. ARM32 code generates the fewest insns, at ~3 insns for the same test, so it's not that bad. Considering these numbers, I think the above optimizations may give you maybe 5-10% more speed. Much work for a rather small effect.
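To illustrate why flag emulation costs extra instructions on a target without a flags register, here is what the T bit of the SH2 `ADDC Rm,Rn` instruction (Rn + Rm + T into Rn, carry into T) has to be computed from. This is a hypothetical C reference model, not the backend's emitted code; each unsigned comparison corresponds roughly to one MIPS `sltu`, whereas ARM can take the carry directly from its flags.

```c
#include <stdint.h>

/* Reference model of SH2 ADDC Rm,Rn: Rn = Rn + Rm + T, T = carry out.
 * Without a carry flag, the carry is rebuilt from unsigned comparisons. */
static uint32_t sh2_addc(uint32_t rn, uint32_t rm, uint32_t *t)
{
    uint32_t tmp = rn + rm;          /* first partial sum */
    uint32_t res = tmp + *t;         /* add the incoming T bit (0 or 1) */
    uint32_t c1 = tmp < rn;          /* carry out of rn + rm */
    uint32_t c2 = res < tmp;         /* carry out of adding T */
    *t = c1 | c2;                    /* at most one of them can be set */
    return res;
}
```

The two partial carries cannot both be set: if rn + rm wraps, the partial sum is at most 0xfffffffe, so adding T cannot wrap again.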
OK... though I would really be interested in the root cause of this. Thank you for your patience.
I downgraded to GCC 7.4 and Binutils 2.31.1, and it fixed the crashing issue I had with Picodrive (still fully statically linked, as I suspected that was not the issue). Building with GCC 9.1 would repeatedly make Picodrive crash. I will try to debug it with Valgrind to see if it could be an issue with your code, but at least I have a (better) workaround for now.
I think there are some opportunities to optimize the mips backend:
- mips32r2 stuff may be used if appropriate (I've already documented where I think this to be viable).
May I suggest the MXU instruction set as well? It's an instruction set for the VPU coprocessor found on Ingenic SoCs like the JZ4760B (which the LDK/RS-97/RG-300/PAP K3/Gameta have) as well as its successors. The Dingoo A320 also supports MXU, but only revision 1. JZ4755 and above support MXU revision 2.
Senquack made a header that allows you to use it, since binutils does not support it:
(Header to include in your project for MXU set) https://github.com/senquack/mxu1_as_macros (A test example below for checking the MXU ) https://github.com/senquack/mxu_pcercuei_test
He implemented the GTE in MXU/MIPS32 assembly for PCSX4ALL (which will hopefully be out by the end of this year), and he told me he got a performance improvement of around 10-20% (he said it could be even higher).
But given that it's seldom documented, well, no urge I suppose lol...
As for MIPS32r2 stuff, well outside of the GCW0 and the upcoming RG-350 it is fairly uncommon...
Not sure about your other suggestions though but they sound good.
Also, Tempo without frameskipping runs at around 32-60 FPS.
Hmm, I can see the potential for a GPU, but it's probably not worth using a vector extension in the DRC, since it strictly operates on scalar values. Regarding the speed difference between the Caanoo and the LDK, I think this is due to the heavy ARM assembler optimisation, which isn't available for any other target. I might be saying something well known, but using Google perftools on the target helped me a lot. I can explain how I did this on a Caanoo if necessary.
OK... though I would really be interested in the root cause of this. Thank you for your patience.
I may accidentally have found one culprit for the low speed you observed: the rgb565_to_uyvy function in platform/common/plat_sdl.c. It has a pixel loop of about 50 MIPS insns which is executed for every pixel displayed. At a 60 Hz frame rate that makes 320×240×60 = 4,608,000 pixels per second... and a whopping ~240,000,000 insns per second. On a JZ4760B at 740 MHz, that's about a third of the available performance.
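The estimate above written out, assuming roughly one instruction retired per clock cycle (an idealisation; the real IPC on the JZ4760B will differ) and about 50 to 52 insns per pixel (52 reproduces the ~240 million figure; exactly 50 gives about 2.3 × 10^8, still roughly a third):

$$
\begin{aligned}
320 \times 240 \times 60 &= 4\,608\,000 \ \text{pixels/s} \\
4\,608\,000 \times 52 &\approx 2.4 \times 10^{8} \ \text{insns/s} \\
\frac{2.4 \times 10^{8}}{7.4 \times 10^{8}\ \text{cycles/s}} &\approx 0.32
\end{aligned}
$$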
I have a patch reducing this to just above 20 insns, at the expense of having a huge precalculated array with the yuv values. However, I'm not so sure this will really be faster since it trashes the data cache for sure. Would you be willing to check it out for me on real hardware before I commit it?
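A minimal sketch of the table-driven approach, for illustration only. It assumes BT.601 studio-range coefficients, a little-endian target (as mipsel is), and U/V averaged over each pixel pair. The names `yuv_lut`, `init_yuv_lut` and `rgb565_to_uyvy_lut` are hypothetical; this is not the actual patch. It also shows where the cache concern comes from: one 32-bit entry per RGB565 value is 65536 × 4 bytes = 256 KiB, far larger than a typical L1 data cache.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical LUT: one packed entry per RGB565 value,
 * Y in bits 0-7, U in bits 8-15, V in bits 16-23. */
static uint32_t yuv_lut[65536];

static void init_yuv_lut(void) {
  for (uint32_t c = 0; c < 65536; c++) {
    int r = (c >> 11) & 0x1f, g = (c >> 5) & 0x3f, b = c & 0x1f;
    /* expand 5/6-bit channels to 8 bits */
    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
    /* BT.601 studio range; the +128<<8 bias keeps U/V sums non-negative
     * so the right shift is well defined */
    int y = ((66 * r + 129 * g + 25 * b + 128) >> 8) + 16;
    int u = (-38 * r - 74 * g + 112 * b + 128 + (128 << 8)) >> 8;
    int v = (112 * r - 94 * g - 18 * b + 128 + (128 << 8)) >> 8;
    yuv_lut[c] = (uint32_t)y | ((uint32_t)u << 8) | ((uint32_t)v << 16);
  }
}

/* Convert `pixels` RGB565 pixels (must be even) to UYVY,
 * two pixels per 32-bit word: bytes U Y0 V Y1 in memory on little-endian. */
static void rgb565_to_uyvy_lut(uint32_t *dst, const uint16_t *src, size_t pixels) {
  for (size_t i = 0; i < pixels; i += 2) {
    uint32_t p0 = yuv_lut[src[i]], p1 = yuv_lut[src[i + 1]];
    uint32_t y0 = p0 & 0xff, y1 = p1 & 0xff;
    uint32_t u = (((p0 >> 8) & 0xff) + ((p1 >> 8) & 0xff)) >> 1;
    uint32_t v = (((p0 >> 16) & 0xff) + ((p1 >> 16) & 0xff)) >> 1;
    *dst++ = u | (y0 << 8) | (v << 16) | (y1 << 24);
  }
}
```

The per-pixel work drops to one table load plus a few shifts, masks and adds; whether that beats the arithmetic version depends on how often the table misses in the data cache, which is exactly the open question.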
Yes, the YUV related code is very slow, which is why our RS-97 fork of it avoids it and just directly draws to the screen in RGB565 mode. I will be able to test either way.
Very interesting stuff, guys!!! I'm using Picodrive as a core in RetroArch for the PlayStation 2. @irixxxx the PlayStation 2 has a MIPS64 processor; you mentioned something about improvements for MIPS, will they affect the PlayStation 2 as well? Finally, you said you don't have a way to test the MIPS improvements; if the PlayStation 2 is a valid target, I can help you with that.
Additionally, I have a fork of Picodrive which adds a PS2 platform (it's outdated, I would need to rebase it).
I would like to check how fast 32X is now on the PS2 xDD
As long as the PS2 processor supports the MIPS32r1 ISA (which I think it should), it should work fine. Just don't forget to enable the DRC in the makefile.
Don't expect too much, though. I reckon it might be on par with a JZ4740. @gameblabla has tested it on a JZ4760 and got a so-so result, with below 20 fps in After Burner, a rather demanding game with respect to CPU usage. Other games should run noticeably better.
The 8-bit output mode is working at least on gp2x and generic. However, iirc 32X always has RGB565 output since that is its native format, so no cigar here.
I've committed some changes. They involve 3 functions central to frame creation: FinalizeLine555 (manual loop unrolling), rgb565_to_uyvy (more aggressive precalculation in a large array), and the do_line_pp, do_line_dc macros (splitting loops to simplify tests). On x86 I see some 5-10% higher frame rates depending on the ROM (after turning off the frame limiter). The first 2 functions account for the larger part of this; the third change is less effective by comparison. I also see this effect in the gcw0 "simulator" (after all, it runs on the same x86 CPU). I would much appreciate it if you could try this out on MIPS hardware, comparing the results with and without my last commit.
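The two loop transformations named above can be sketched in C roughly as follows. This is a hypothetical stand-in that maps 8-bit palette indices to 16-bit pixels, not picodrive's actual FinalizeLine555 or do_line_pp/do_line_dc code; the function names and the "index 0 is transparent" rule are assumptions. The second part uses loop unswitching, which is one plausible reading of "splitting loops to simplify tests".

```c
#include <stdint.h>
#include <stddef.h>

/* Plain version: per pixel one load, one palette lookup, one store,
 * plus a loop-counter test and branch. */
static void finalize_line_plain(uint16_t *dst, const uint8_t *src,
                                const uint16_t *pal, size_t n) {
  for (size_t i = 0; i < n; i++)
    dst[i] = pal[src[i]];
}

/* Manually unrolled by 4: the loop test and branch are paid once per
 * 4 pixels. A scalar tail handles widths that are not a multiple of 4. */
static void finalize_line_unrolled(uint16_t *dst, const uint8_t *src,
                                   const uint16_t *pal, size_t n) {
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    dst[i + 0] = pal[src[i + 0]];
    dst[i + 1] = pal[src[i + 1]];
    dst[i + 2] = pal[src[i + 2]];
    dst[i + 3] = pal[src[i + 3]];
  }
  for (; i < n; i++)
    dst[i] = pal[src[i]];
}

/* Loop unswitching: the `opaque` flag is tested once, outside the loops,
 * instead of on every pixel, so each loop body carries fewer tests. */
static void draw_line_split(uint16_t *dst, const uint8_t *src,
                            const uint16_t *pal, size_t n, int opaque) {
  if (opaque) {
    for (size_t i = 0; i < n; i++)
      dst[i] = pal[src[i]];
  } else {
    for (size_t i = 0; i < n; i++)
      if (src[i] != 0) /* hypothetical: index 0 is transparent */
        dst[i] = pal[src[i]];
  }
}
```

Both transformations trade a little code size for fewer branches per pixel, which matters most on in-order cores like those in the caanoo or the Ingenic SoCs.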
Just saying, but it was never using the YUV-related code in the first place... lol. So it really wasn't related. Should I still give it a try?
Right... it does so only on platforms using YUV in the first place. However, FWIW, you might try it anyway - it optimizes some drawing functions, and the last commit picks some low-hanging fruit for a small code size reduction. It might still help a little bit. It's an extra frame on a caanoo, give or take.
I also played around a bit with -flto on ARM with gcc 4.7 (can't use anything newer), but I cannot find any configuration where it produces faster code on the caanoo - it's more like some % slower. Does it produce faster code for you on the MIPS platforms?
If you could spare the time, I would appreciate a run of the latest version on real mips hardware. A performance report would also be very nice, just to see if the changes aren't only good for ARMv4 in a caanoo.
I'll give it a try on my Retrostone with my CFW (ARMv7+NEON of course). I did find an issue, though: compiling with -fprofile-generate fails when mkoffsets.sh runs, with errors like "undefined reference to free" etc. Removing -fprofile-generate allows it to compile.
Would it be possible to compile Picodrive with PGO without it affecting mkoffsets? PGO can give a 10% speedup on a low-end device like the RS-97/LDK.
EDIT: This is strange. Sometimes Picodrive freezes at random when playing After Burner. Again, this is on my Retrostone with the ARM32 backend. This did not happen with the older version (without your commits).