psx-sdk-rs
LLVM can't target the Playstation CPU (MIPS-I)
I've been struggling for the past few days with extremely erratic behaviour from my PlayStation code. Simple code would work correctly, but when things get more complicated (in particular function calls and BIOS calls) the program would appear to behave semi-randomly (bus errors, freezes, functions executing several times in a row when they shouldn't, etc.).
After some intense hair-pulling I figured out the cause of the problem: the MIPS-I architecture has load delay slots. This means that when the CPU loads a value from memory (through a lw instruction, for instance) the next instruction runs before the value is put in the target register, so you have to wait one cycle (using a nop or some independent instruction) before you can use the value.
The problem is that LLVM doesn't support MIPS-I at the moment, and MIPS-II doesn't have this constraint. So, for instance, when the compiler generates the function epilogue that pops the value of ra (the return address) from the stack and jumps to it, we end up with something like:
lw ra,20(sp)
jr ra
Unfortunately this code doesn't work on the PlayStation's R3000A-compatible CPU: the lw doesn't have time to finish before the jr executes (the jr sits in the load delay slot), so the jump takes whatever value was in ra before the lw instruction. That explains the very erratic behaviour.
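With a nop (or some other unrelated instruction) placed in the load delay slot, the same epilogue would be fine, i.e. something along these lines:
lw ra,20(sp)
nop
jr ra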
Unfortunately, as far as I can tell, LLVM doesn't support MIPS-I at all: if I attempt to put "cpu": "mips1" in target.json I get this error:
LLVM ERROR: Code generation for MIPS-I is not implemented
There's no real workaround for now. The solution would be to add MIPS-I support to LLVM, but that's of course a pretty ambitious project and my plate is full at the moment.
As long as this issue is not fixed there's no point working on this SDK: nothing but the most trivial code will run as intended on the console.
If it's mostly the same, why not copy whatever MIPS-II does and change that behavior? Would be much easier than writing it all from scratch.
I'm not at all familiar with LLVM so I didn't pursue this further. I assume it can't be too complicated but I really have no idea.
There is a patch available at https://github.com/impiaaa/llvm-project/commit/f2ecffebaa3200703cbf2c60a1d642a5c5254b01 but according to the developer it could use additional testing.
That's great news, although I must say that I wouldn't even know where to start to build a rustc that uses a custom LLVM, so I'm probably going to wait until it (hopefully) ends up in a stable release.
A few weeks ago I saw that there was a GCC backend for rustc in the works: https://github.com/sapir/gcc-rust
That might be another way to get MIPS-I support in Rust.
I wanted to play around with the Zig programming language (see https://github.com/XaviDCR92/psx-zig), whose compiler is also based on LLVM, so I merged the patch into the release/10.x branch of https://github.com/XaviDCR92/llvm-project. The patch seems to work correctly, so I encourage you to resume your work on psx-sdk-rs. Well-known issues with Zig's packed structs are preventing further development on the Zig side, but I think Rust should be fine now.
I recently got rustc working with LLVM 11 patched to support load delay slots, if there's still interest in this. My fork has instructions in the README along with @impiaaa's LLVM patch (with a tiny fix) and a patch to build target.json and psx.ld into rustc. I checked that both rustc and llc by itself handle load delay slots correctly, though the tests weren't very extensive and I've only run programs on ePSXe.
I think it would've been convenient to use the JSON target, but it seems building the target into rustc is required, since libcore won't compile unless the Atomic* API is disabled (by setting max_atomic_width = Some(0) in the target spec). This is because MIPS I doesn't have a sync instruction, unlike MIPS II. Either way, rustc has to be compiled using the patched LLVM, so it wouldn't have saved much time in building the compiler anyway.
Also I threw in a stripped-down copy of @simias's Makefile for the linker and merged libbios into a submodule of libpsx to make the SDK more user-friendly.
@ayrtonm, the delay slot patch seems to work well for me too (I also had to apply the tiny fix to avoid a compile-time error).
However, I am having a curious issue that makes my Zig/C application crash by jumping to an invalid address. The root cause is that $ra gets restored from 52($sp), but $sp holds an invalid value at that point, and thus undefined behaviour occurs.
After some investigation, it looks like bad generated code from the compiler side (I can't tell so far whether it's caused by either the LLVM backend or the Zig frontend). Consider the following lines of C code from PSXSDK:
for(a = 0; a < l; a+=2)
GPU_DATA_PORT = image[a]|(image[a+1]<<16);
Among other low-level instructions, the following branch instruction is generated:
80012eb8: 14200017 bnez at,80012f18 <LoadImage+0x1ac>
And this is what address 80012f18 contains (I've added some lines for context):
80012f00: 03c0e825 move sp,s8
80012f04: 8fbe0000 lw s8,0(sp)
80012f08: 8fbf0004 lw ra,4(sp)
80012f0c: 00000000 nop
80012f10: 03e00008 jr ra
80012f14: 27bd0008 addiu sp,sp,8
while(!(GPU_CONTROL_PORT & (1<<0x1c)));
80012f18: 0000000d break
80012f1c <GsSetDrawEnv>:
{
80012f1c: 27bdfff8 addiu sp,sp,-8
Therefore, 80012f18 is most likely not the address the program should be jumping to. Instead of restoring $sp to its original value and jumping to $ra, break is executed and GsSetDrawEnv is accidentally entered, causing undefined behaviour.
The compiler patch has added 18 nop instructions to LoadImage. I am not sure whether it's related or not, but 80012f18h - (18d * 4) = 80012ed0h, which happens to be the start of the loop:
for(a = 0; a < l; a+=2)
80012ed0: 24e80002 addiu t0,a3,2
Would it be possible that the LLVM backend is miscalculating destination addresses in branch instructions such as bnez, since the patch is (apparently silently) adding several nops between instructions? I have not investigated other functions yet, but I must admit the application has already shown erratic behaviour several times now, and this might explain it.
I haven't touched this code in forever but I'm glad to see that there's progress!
@XaviDCR92 Have you tried to see the difference the various optimization levels make? Regardless of the miscalculated branch addresses it would be odd to end up with a bunch of useless NOPs in optimized code. What I'm getting at here is that a bunch of leftover NOPs with optimizations enabled seems like it would be a good sign that something messes up in the compiler.
@simias Zig provides a --release-small flag roughly equivalent to gcc's -Os, but my application would not work at all with these optimizations enabled. I still have to find the reason for this.
OTOH, these apparently useless nop instructions must be there because of the load delay slot problem on MIPS I that you reported in your first message. There is no workaround other than replacing them with useful instructions that do not depend on the previous instruction, but that's not a serious issue.
Surprisingly though, I have found a possible reason why the code above was inadvertently jumping to a break instruction: there is a buffer overflow caused by this loop when called from this function, as pal is only 2 elements long and LoadImage is attempting to access pal[0 .. 15].
gcc would not issue any diagnostic message for this, but zig instead adds runtime bounds checks when building in debug mode, thus executing the break instruction once an out-of-bounds access occurs. As a quick workaround, I expanded pal from 2 to 16 elements and now GsLoadFont executes successfully.
More worryingly, I have also found other places where zig generates wrong code, with break instructions here and there, e.g. GsSetDrawEnv:
80012f1c <GsSetDrawEnv>:
{
80012f1c: 27bdfff8 addiu sp,sp,-8
80012f20: afbf0004 sw ra,4(sp)
80012f24: afbe0000 sw s8,0(sp)
80012f28: 03a0f025 move s8,sp
draw_mode_packet = (0xe1<<24)|(drawenv->draw_on_display>=1)<<10|
80012f2c: 0000000d break
80012f30 <gpu_data_ctrl>:
{
80012f30: 27bdfff8 addiu sp,sp,-8
80012f34: afbf0004 sw ra,4(sp)
Compared to the implementation generated by mipsel-unknown-elf-gcc:
80034bc0 <GsSetDrawEnv>:
{
80034bc0: 27bdffe8 addiu sp,sp,-24
80034bc4: afbf0014 sw ra,20(sp)
draw_mode_packet = (0xe1<<24)|(drawenv->draw_on_display>=1)<<10|
80034bc8: 90820001 lbu v0,1(a0)
80034bcc: 00000000 nop
80034bd0: 1440003a bnez v0,80034cbc <GsSetDrawEnv+0xfc>
80034bd4: 00802825 move a1,a0
80034bd8: 3c02e100 lui v0,0xe100
(drawenv->dither>=1)<<9;
80034bdc: 90a30000 lbu v1,0(a1)
gpu_data_ctrl(0xe3, (drawenv->y<<10)|drawenv->x);
80034be0: 84a40004 lh a0,4(a1)
(drawenv->dither>=1)<<9;
80034be4: 0003182b sltu v1,zero,v1
80034be8: 00031a40 sll v1,v1,0x9
draw_mode_packet = (0xe1<<24)|(drawenv->draw_on_display>=1)<<10|
80034bec: 00431025 or v0,v0,v1
80034bf0: 3c038008 lui v1,0x8008
80034bf4: ac625d78 sw v0,23928(v1)
GPU_CONTROL_PORT = 0x01000000;
80034bf8: 3c060100 lui a2,0x100
80034bfc: 3c031f80 lui v1,0x1f80
80034c00: ac661814 sw a2,6164(v1)
gpu_data_ctrl(0xe3, (drawenv->y<<10)|drawenv->x);
80034c04: 84a80002 lh t0,2(a1)
...
In the meantime, I'm improving my fork of PCSX-r so that pcsxr notifies me whenever it executes a break instruction while debugging with mipsel-unknown-elf-gdb.
After playing around with Rust on the PlayStation for a while, I haven't been able to reproduce @XaviDCR92's issue with out-of-place break instructions, and my demos are behaving as expected on emulators.
I haven't used zig extensively, but I think the issue may be runtime checks or UB on shift overflow. For example, drawenv->draw_on_display is a u8 and I'm not sure if it gets implicitly cast to bool after the >= or if it just becomes a 0 or 1 as a u8. Either way, it then gets shifted more than 8 bits, which might be a problem depending on how strict zig's type system is.
I also tried compiling some C with zig (0.6.0) myself and found that unsigned int z = 0xe1 << 24 caused me to hit an illegal instruction while unsigned int z = (unsigned int)0xe1 << 24 didn't.
int f(unsigned char x, unsigned char y) {
2014e0: 55 push %rbp
2014e1: 48 89 e5 mov %rsp,%rbp
unsigned int z = (0xe1 << 24);
2014e4: 0f 0b ud2
int f(unsigned char x, unsigned char y) {
201560: 55 push %rbp
201561: 48 89 e5 mov %rsp,%rbp
unsigned int z = ((unsigned int)0xe1 << 24);
return z;
201564: b8 00 00 00 e1 mov $0xe1000000,%eax
201569: 5d pop %rbp
20156a: c3 retq
20156b: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
It seems to me that zig uses the smallest type possible to represent integer literals, but I couldn't find any specific documentation on that. Granted that was all on x86 not MIPS, but I think the issue might be in that general area.
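For what it's worth, the standard C promotion rules might already explain the difference (I haven't checked zig's frontend, so treat this as an assumption): 0xe1 has type int, so 0xe1 << 24 overflows a 32-bit signed int, which is undefined behaviour that a checked build may lower to a trap like the ud2 above, while casting to unsigned first makes the shift well-defined. The relational operator in drawenv->draw_on_display>=1 also yields an int, so the <<10 term shouldn't be the problem. A minimal sketch of the two shifts:
/* hypothetical reduction of the two lines compared above */
unsigned int shift_bad(void) {
    return 0xe1 << 24;                 /* 0xe1 is an int: signed overflow, undefined behaviour */
}
unsigned int shift_good(void) {
    return (unsigned int)0xe1 << 24;   /* unsigned shift, well-defined: 0xe1000000 */
}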
Not sure how relevant it is to you (does PSX have an FPU?), but floating point comparisons also have a delay slot https://github.com/rust-lang/rust/issues/91442#issuecomment-985186799
Nah, the PSX doesn't have a floating point unit. It supports fixed point, but that's all custom coprocessor opcodes via inline assembly, so LLVM's not an issue there.
Seems initial support for load delay slots is now part of the latest LLVM release if I understand that commit correctly. Is this issue still relevant?
It's not in upstream rust yet, but I was going to cherry-pick that commit onto the rustc branch and open a PR to add the psx target.
Also, filling load delay slots in code generation is only one part of full MIPS-I support. Remaining tasks include:
- load delay slots in .set reorder
- the sync instruction
- 64-bit coprocessor transfers
- trap-on-condition instructions
Yeah, those would definitely be good to upstream for MIPS-I, but the psx target's still fairly usable without them. I think the only really critical issue is the sync instruction, since DMA is borderline unusable without it.
This target's now usable with just a target JSON on nightly, so building the rust compiler isn't necessary anymore. I updated the crate's instructions in case anyone wants to try it out.
I opened a PR adding a built-in target to rustc so hopefully the target JSON will soon be unnecessary.
That's amazing work!
Now I have to find a way to get my test setup working again...