GBADotnet
Improve prefetch unit until it passes all mgba timing tests
1234/2020 passing as of today, with timers working fairly well. The IWRAM tests pass every time, so the instructions themselves are about right. The remaining failures are likely down to the prefetch unit, plus other issues that I don't know about yet.
Fixes to make DMA share wait states with the CPU have massively broken the DMA tests
Needs investigation
I've fiddled around with this a bunch tonight and got to 1341 (I've had it higher, but probably by accident). Specifically, the standard NOP tests now all time correctly with prefetch enabled (where they shouldn't activate it because there are no spare bus cycles).
The LDRH tests come out two cycles high on the prefetch-activated versions, fairly consistently across the board.
The key learning here that fixes prefetch is that prefetching is always sequential (obviously) and pays no attention to the state of the SEQ signal from the CPU if a prefetch is already occurring. Not at all surprising but needed fixing nonetheless.
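A minimal sketch of that rule, in Python rather than the emulator's own code. The function name and the cycle counts (5 for a non-sequential ROM access, 3 for a sequential one) are illustrative assumptions, not values taken from GBADotnet:

```python
def rom_fetch_cycles(cpu_seq: bool, prefetch_active: bool,
                     n_cycles: int = 5, s_cycles: int = 3) -> int:
    """Cycles charged for the next ROM fetch (hypothetical helper)."""
    if prefetch_active:
        # A prefetch already in flight carries on sequentially,
        # whatever the CPU's SEQ signal currently says.
        return s_cycles
    # No prefetch in flight: the CPU's SEQ signal decides.
    return s_cycles if cpu_seq else n_cycles


# CPU signals a non-sequential access while a prefetch is running:
# the fetch is still charged as sequential.
print(rom_fetch_cycles(cpu_seq=False, prefetch_active=True))
# Same signal with the prefetch unit idle: charged as non-sequential.
print(rom_fetch_cycles(cpu_seq=False, prefetch_active=False))
```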
The remaining issues can be grouped as:
- Thumb multiplication (and follow-on Thumb BIOS timings) - suspect a straightforward bug in the Thumb MUL use of SEQ.
- DMA - screw DMA timings
- LDMIA across boundary of ROM - edge case which I would have expected to fall out but definitely doesn't
that's an example of DMA timings
The multiplication failures imply that the timing is off entirely, nothing to do with prefetch, but only for Thumb. Which is odd, since ARM and Thumb use the same code!
Multiplication issues are resolved, they were caused by incorrect operand ordering for masks.
BIOS calls from Thumb are not resolved though, so presumably there's something in there about switching from Thumb to ARM and back.
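On the multiplication fix: as I understand it, the ARM7TDMI multiplier terminates early depending on how many of the upper bytes of the multiplier operand are all zeros or all ones, so checking the masks against the wrong operand gives the wrong cycle count. A rough Python sketch of that check, assuming these are the masks in question (the function name is made up for illustration and is not the emulator's actual code):

```python
def mul_internal_cycles(multiplier: int) -> int:
    """Internal cycles (1-4) for a MUL, based on the multiplier operand only."""
    multiplier &= 0xFFFFFFFF
    # Try the widest mask first: the more upper bits that are all zeros
    # or all ones, the earlier the multiplier can terminate.
    for cycles, mask in ((1, 0xFFFFFF00), (2, 0xFFFF0000), (3, 0xFF000000)):
        top = multiplier & mask
        if top == 0 or top == mask:
            return cycles
    return 4


# Checking the masks against the wrong operand changes the timing:
a, b = 0x12, 0x12345678
print(mul_internal_cycles(a), mul_internal_cycles(b))
```

Here the two operands give 1 and 4 internal cycles respectively, which is the kind of discrepancy an operand mix-up would produce.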
Fix bios timings by clearing the prefetch unit when the pipeline is cleared
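Roughly what that change amounts to, as a Python sketch. The class names, fields, and the 8-entry capacity are assumptions for illustration only and do not mirror the GBADotnet classes:

```python
class PrefetchBuffer:
    """Toy prefetch buffer; capacity is an assumed value."""

    def __init__(self, capacity: int = 8):
        self.capacity = capacity
        self.count = 0        # entries currently buffered
        self.active = False   # is a prefetch run in progress?

    def prefetch_one(self) -> None:
        if self.count < self.capacity:
            self.count += 1
            self.active = True

    def flush(self) -> None:
        # Drop everything: the buffered entries belong to the old
        # instruction stream and must not be served after a branch.
        self.count = 0
        self.active = False


class Cpu:
    def __init__(self, prefetch: PrefetchBuffer):
        self.prefetch = prefetch
        self.pipeline = ["fetch", "decode"]

    def flush_pipeline(self) -> None:
        # Clearing the pipeline (branch, SWI, mode switch) also clears
        # the prefetch unit, so the next fetch starts from scratch.
        self.pipeline.clear()
        self.prefetch.flush()


cpu = Cpu(PrefetchBuffer())
cpu.prefetch.prefetch_one()
cpu.flush_pipeline()
print(cpu.prefetch.count, cpu.prefetch.active)
```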
118 tests left to go. I haven't counted, but I think the two cases left are:
- LDMIA across ROM boundary (maybe just a case of checking when SEQ is set during LDMIA? Or possibly what order the LDMIA loads happen in although I'd have hoped that was already right)
- DMA - likely two issues here:
  - Should take 2 cycles in certain cases (particularly around IWRAM?) - egregious timing differences!
  - At various times I take fewer cycles than expected, but it's hard to pin down exactly when.
The large number of "2 cycle" failures are because the CPU should have managed to read the timer value before getting blocked by DMA, since the code is in IWRAM and so takes very little time to run.
Trouble is that I thought that I'd nailed down the exact blocking behaviour for other tests!
Rough plan to solve the DMA/CPU issue is to properly emulate the bus owner at any given cycle. That means a new flag on the bus, with DMA setting/unsetting it depending on what state it's in. Not doing it now because it's late, but at worst it's a slightly indirect way of writing the same code I have now. Hopefully it fixes this without breaking any other tests!
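A toy version of the bus-owner idea, sketched in Python. Everything here is hypothetical: the names, the one-access-per-cycle CPU, and the DMA start delay are simplifications to show the shape of the flag, not real GBA timings:

```python
from enum import Enum


class BusOwner(Enum):
    CPU = "cpu"
    DMA = "dma"


class Bus:
    def __init__(self) -> None:
        self.owner = BusOwner.CPU  # the new flag


class Dma:
    def __init__(self, bus: Bus, start_delay: int, units: int) -> None:
        self.bus = bus
        self.delay = start_delay   # assumed cycles before DMA takes the bus
        self.remaining = units     # transfer units left to move

    def step(self) -> None:
        if self.delay > 0:
            self.delay -= 1        # still starting up, CPU keeps the bus
        elif self.remaining > 0:
            self.bus.owner = BusOwner.DMA
            self.remaining -= 1
        else:
            self.bus.owner = BusOwner.CPU  # finished, hand the bus back


def run_cpu(bus: Bus, dma: Dma, accesses: int) -> list:
    """Return the cycle at which each CPU access completed."""
    completed = []
    cycle = 0
    while len(completed) < accesses:
        dma.step()
        if bus.owner is BusOwner.CPU:
            completed.append(cycle)  # CPU only progresses when it owns the bus
        cycle += 1
    return completed


bus = Bus()
print(run_cpu(bus, Dma(bus, start_delay=2, units=3), accesses=4))
```

In this run the CPU gets two accesses in before the DMA takes over, stalls for the three transfer cycles, then resumes: the same "reads the timer before being blocked" situation as the failing tests.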
Hot damn was that ever really tedious to figure out.