TransferBoy

Performance

Open joeldipops opened this issue 7 years ago • 9 comments

Unless I can get performance under control, the whole project is bunk.

joeldipops, Oct 25 '18

Ok, implementing the CPU's instruction dispatch as a binary search was a little faster, but didn't really work. I now understand jump tables a little better, so I will try implementing that instead.
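For reference, a jump table for opcode dispatch might look roughly like the sketch below: a 256-entry array of function pointers indexed by the opcode byte, so each instruction costs one indexed load and an indirect call instead of a chain of comparisons. The `GbCpu` struct, the handler names and `init_op_table` are hypothetical, only three opcodes are filled in, and flag updates are left out.

```c
#include <stdint.h>

/* Hypothetical, minimal Game Boy CPU state for illustration only */
typedef struct {
    uint8_t a, b;
    uint16_t pc;
    uint8_t mem[0x10000];
} GbCpu;

typedef void (*OpHandler)(GbCpu *cpu);

/* 0x00 NOP */
static void op_nop(GbCpu *cpu) { (void)cpu; }

/* 0x3C INC A (flag updates omitted to keep the sketch short) */
static void op_inc_a(GbCpu *cpu) { cpu->a++; }

/* 0x78 LD A,B */
static void op_ld_a_b(GbCpu *cpu) { cpu->a = cpu->b; }

/* Placeholder for every opcode this sketch does not implement */
static void op_unimplemented(GbCpu *cpu) { (void)cpu; }

/* One entry per possible opcode byte */
static OpHandler op_table[256];

static void init_op_table(void) {
    for (int i = 0; i < 256; i++)
        op_table[i] = op_unimplemented;
    op_table[0x00] = op_nop;
    op_table[0x3C] = op_inc_a;
    op_table[0x78] = op_ld_a_b;
}

/* Fetch the opcode, advance PC, then jump straight to its handler */
static void cpu_step(GbCpu *cpu) {
    uint8_t opcode = cpu->mem[cpu->pc++];
    op_table[opcode](cpu);
}
```

A dense `switch` over the opcode byte is another way to get the same effect, since GCC will usually compile it into a jump table on its own.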

joeldipops, Jan 14 '19

I switched from libdragon's RDP functions to its `graphics_draw_box` function, and that made a noticeable improvement, probably more so in cen64 than on the hardware. I think a better understanding of N64 rendering should get me most of the way there.

joeldipops, Feb 02 '19

I also got an improvement by filling the screen once with one of the colours as a background colour and skipping the pixels of that colour. However, the gain depends heavily on whether the screen is predominantly that colour. I'll try determining which colour is in the majority and using that one for more consistent results, but it's not a long-term solution.
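A minimal sketch of the majority-colour idea, assuming the Game Boy frame has already been decoded into one 2-bit shade (0-3) per pixel. `majority_shade` and `render_with_background` are hypothetical names, and the `memset` and per-pixel writes only stand in for a full-screen fill and individual box draws on the N64 side.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GB_WIDTH 160
#define GB_HEIGHT 144
#define GB_PIXELS (GB_WIDTH * GB_HEIGHT)

/* Return the shade (0-3) used by the most pixels in the frame */
static uint8_t majority_shade(const uint8_t *frame) {
    size_t counts[4] = {0};
    for (size_t i = 0; i < GB_PIXELS; i++)
        counts[frame[i] & 3]++;
    uint8_t best = 0;
    for (uint8_t s = 1; s < 4; s++)
        if (counts[s] > counts[best])
            best = s;
    return best;
}

/* Fill `screen` with the majority shade once, then write only the pixels
   that differ from it. Returns the number of individual pixel writes. */
static size_t render_with_background(const uint8_t *frame, uint8_t *screen) {
    uint8_t bg = majority_shade(frame);
    memset(screen, bg, GB_PIXELS);      /* stands in for one full-screen fill */
    size_t writes = 0;
    for (size_t i = 0; i < GB_PIXELS; i++) {
        uint8_t shade = frame[i] & 3;
        if (shade != bg) {
            screen[i] = shade;          /* stands in for one per-pixel draw */
            writes++;
        }
    }
    return writes;
}
```

The returned count is the number of individual draws still needed, which is exactly what picking the majority shade tries to minimise.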

I ultimately think the issue is either that libdragon doesn't let me take full control of hardware rendering, or that I just don't know how to do so.

joeldipops, Feb 04 '19

Realised I only need to redraw the pixels that actually changed since the last frame (or the frame before that, since there are two frame buffers?), so that might improve things.
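With double buffering, the buffer about to be drawn into last held the frame from two frames ago, so one way to answer the question above is to keep a separate shadow copy per framebuffer and diff against that. This is a hypothetical sketch (`DirtyTracker` and `collect_dirty` are made-up names) that assumes the frame is a plain byte array with one shade per pixel.

```c
#include <stddef.h>
#include <stdint.h>

#define GB_PIXELS (160 * 144)
#define NUM_BUFFERS 2

/* shadow[i] remembers what was last drawn into hardware framebuffer i */
typedef struct {
    uint8_t shadow[NUM_BUFFERS][GB_PIXELS];
    int valid[NUM_BUFFERS];
} DirtyTracker;

/* Compare the new frame with what buffer `buf` currently holds, store the
   indices of pixels that must be redrawn in `dirty`, and update the shadow.
   Returns the number of dirty pixels. */
static size_t collect_dirty(DirtyTracker *t, int buf, const uint8_t *frame,
                            uint32_t *dirty) {
    size_t n = 0;
    for (uint32_t i = 0; i < GB_PIXELS; i++) {
        if (!t->valid[buf] || t->shadow[buf][i] != frame[i]) {
            dirty[n++] = i;
            t->shadow[buf][i] = frame[i];
        }
    }
    t->valid[buf] = 1;  /* first use of a buffer redraws everything */
    return n;
}
```

A pixel that changed in one buffer still has to be redrawn in the other buffer on the following frame, which is why a single "previous frame" copy would not be enough.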

joeldipops, Apr 09 '19

RSP microcode can be run with the `read_ucode`, `load_ucode` and `run_code` functions from libdragon's `rsp.h`.

joeldipops, Apr 16 '19

~~Perhaps the GB registers can be mapped directly to N64 registers with the `register` keyword. Not likely to make a difference (and it may make things worse), but worth trying.~~

~~The union of GB registers is 64 bits, so that's pretty nice.~~

Nope, the `register` keyword doesn't work that way.

joeldipops, May 15 '19

Cool, fixing the PC register to general-purpose register $20 gave a small but noticeable improvement. I'd like to do this with all the registers, but my union idea didn't work: MIPS GCC can't fix a struct to a register for some reason, even if it's the right size. It's not a good idea to give /every/ GB register its own N64 register, but I'll do it for A, and maybe HL, or see if I can use assembly to do the work the struct normally would.
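Pinning a variable like this relies on GCC's global register variable extension. The sketch below is an assumption about how such a declaration might look, not the project's actual code: `gb_pc`, `gb_rom` and `fetch_byte` are hypothetical, and the `#else` branch only exists so the example still compiles on non-MIPS hosts.

```c
#include <stdint.h>

#if defined(__GNUC__) && defined(__mips__)
/* GCC extension: reserve MIPS $20 (s4) for the emulated PC in every
   function compiled with this declaration in scope */
register uint32_t gb_pc asm("$20");
#else
/* Fallback for other targets: an ordinary global, so the sketch compiles */
static uint32_t gb_pc;
#endif

/* Hypothetical pointer to the loaded ROM / address space */
static const uint8_t *gb_rom;

/* Fetch the next byte at PC; with the binding, PC itself never touches RAM */
static uint8_t fetch_byte(void) {
    uint8_t b = gb_rom[gb_pc & 0xFFFF];
    gb_pc = (gb_pc + 1) & 0xFFFF;  /* wrap like the 16-bit GB program counter */
    return b;
}
```

GCC's documentation recommends choosing a register that is normally saved and restored across calls for this, so library routines don't clobber it; $20 (s4) is callee-saved in the usual MIPS calling convention.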

The biggest roadblock here is how to handle 2-player. Conditions to determine which register to use would be worse than just reading from RAM, but if it came to it, I could have two separate code paths that differ only by registers. Yuck. I need to optimise in many other ways before attempting that stupidity.
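One way to get two code paths that differ only by registers, without writing everything twice by hand, is to write each instruction body once as a macro and stamp it out per player. In this hypothetical sketch `p1_a` and `p2_a` are ordinary globals and `DEFINE_ADD_A_IMM` is an invented macro; on the N64 each global could instead be pinned to a different register.

```c
#include <stdint.h>

/* Each player's A register; these could be two different pinned registers */
static uint8_t p1_a, p2_a;

/* The instruction body is written once; each instantiation hard-codes its
   own register, so there is no runtime "which player?" branch */
#define DEFINE_ADD_A_IMM(suffix, REG_A)            \
    static void add_a_imm_##suffix(uint8_t imm) {  \
        REG_A = (uint8_t)(REG_A + imm);            \
    }

DEFINE_ADD_A_IMM(p1, p1_a)
DEFINE_ADD_A_IMM(p2, p2_a)
```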


Turns out that you /can/ keep multiple 16-bit values in a single 64-bit register, but the masking and shifting (`&`, `|`, `>>` and so on) needed to get at each value makes this sadly slower than just reading from memory. I was under the impression that MIPS could reference part of a register (Load Upper Immediate), which would make the masking unnecessary, so it's possible the compiler just doesn't understand what I'm trying to do and asm could fix it. I'm getting doubtful, though.
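To make that cost concrete, here is a sketch of packing the four 16-bit register pairs into one `uint64_t`. The layout (AF in the low 16 bits, then BC, DE and HL) and the helper names are assumptions for illustration.

```c
#include <stdint.h>

/* Bit offset of each register pair inside the packed 64-bit value */
enum { PAIR_AF = 0, PAIR_BC = 16, PAIR_DE = 32, PAIR_HL = 48 };

/* Reading a pair needs a shift plus truncation to 16 bits */
static inline uint16_t get_pair(uint64_t regs, int shift) {
    return (uint16_t)(regs >> shift);
}

/* Writing a pair needs a mask to clear the old bits, a shift and an OR */
static inline uint64_t set_pair(uint64_t regs, int shift, uint16_t value) {
    regs &= ~((uint64_t)0xFFFF << shift);
    return regs | ((uint64_t)value << shift);
}
```

Every access goes through these extra instructions, whereas a pair kept in memory needs only a single halfword load (`lhu`) or store (`sh`).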

I think I will still try to put A in a register. This will make the PUSH AF and POP AF instructions slower, but everything else should be snappier. I'll look into the union stuff after that, and hopefully after I finish Tpakio.
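PUSH AF and POP AF get slower because the AF pair has to be reassembled from, and split back into, the separately held A and F. A hypothetical sketch, with `gb_a` standing in for the register-bound variable and `GbState` an invented struct; it follows the Game Boy conventions that the high byte (A) is pushed first and that the low nibble of F always reads as zero.

```c
#include <stdint.h>

/* Hypothetical state: F, SP and memory stay in a struct in RAM */
typedef struct {
    uint8_t f;
    uint16_t sp;
    uint8_t mem[0x10000];
} GbState;

/* Stands in for A bound to a register */
static uint8_t gb_a;

/* PUSH AF: store A and F as two separate bytes instead of one 16-bit copy */
static void push_af(GbState *s) {
    s->mem[--s->sp] = gb_a;          /* high byte: A */
    s->mem[--s->sp] = s->f & 0xF0;   /* low byte: F, low nibble always 0 */
}

/* POP AF: split the popped pair back into F and A */
static void pop_af(GbState *s) {
    s->f = s->mem[s->sp++] & 0xF0;
    gb_a = s->mem[s->sp++];
}
```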


The `lwl`, `lwr`, `swl` and `swr` instructions look like they were designed for this purpose.

joeldipops, May 15 '19

Ok in cen64: ~~I bound A to a register, but it's slower. If I comment out the binding and just leave it as a global var, it's faster than if bound.~~

~~But if PC bound to a register it's faster than otherwise.~~

~~Obviously need to test on the real thing, but it's cool that cen64 does change depending on this and I think it will probably match.~~

Ok, I just had a bug that was slowing things down. Binding A does improve things after all.


~~Having tested on the N64, I don't understand what's going on. If I bind PC to a register, performance improves slightly, but if I bind A to a register, it drops significantly, even if I put PC back the way it was so there's only one bound register. I'll go and learn assembly and try once more with that, but not until after I've got Tpakio working. Unless some random genius stumbles across this and can explain why using a register is slower than not using one...~~

With my derp fixed above, binding A and PC works out about the same as just binding PC, as far as I can tell. But I think my EverDrive has been playing up recently, so that could be related.

I also had an idea that some of the weirder performance issues might be related to writing to cart RAM, so I'll look into that.

joeldipops, May 17 '19

Discovering the compiler's optimisation flags got me a bit closer to where I needed to be. Next I'm going to look at fiddling with the MMU a bit.
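If "the MMU" here means the emulator's memory-mapping layer, one common optimisation is a page table: split the 64 KiB Game Boy address space into fixed-size pages, give each page a direct host pointer, and fall back to slower handling only for things like I/O. This is a hypothetical sketch (`Mmu`, `mmu_map`, `mmu_read` and the 4 KiB page size are all assumptions), and the `0xFF` fallback is just a placeholder for real I/O handling.

```c
#include <stddef.h>
#include <stdint.h>

#define GB_PAGE_SHIFT 12
#define GB_PAGE_SIZE (1u << GB_PAGE_SHIFT)
#define GB_PAGE_COUNT (0x10000u >> GB_PAGE_SHIFT)

/* Read-side page table: NULL means "not plain memory, take the slow path" */
typedef struct {
    const uint8_t *read_pages[GB_PAGE_COUNT];
} Mmu;

/* Map `len` bytes of host memory at GB address `base`
   (both page-aligned, and base + len must not exceed 0x10000) */
static void mmu_map(Mmu *mmu, uint16_t base, const uint8_t *host, size_t len) {
    for (size_t off = 0; off < len; off += GB_PAGE_SIZE)
        mmu->read_pages[(base + off) >> GB_PAGE_SHIFT] = host + off;
}

/* Fast path: one table lookup instead of a chain of address-range checks */
static uint8_t mmu_read(const Mmu *mmu, uint16_t addr) {
    const uint8_t *page = mmu->read_pages[addr >> GB_PAGE_SHIFT];
    if (page != NULL)
        return page[addr & (GB_PAGE_SIZE - 1)];
    return 0xFF;  /* placeholder: a real emulator would handle I/O here */
}
```

Bank switching then becomes a matter of remapping the affected pages, for example pointing pages 0x4 to 0x7 at a different 16 KiB ROM bank.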

joeldipops, Jun 17 '19