bass icon indicating copy to clipboard operation
bass copied to clipboard

surprising behavior when assembling instructions dependent on operand bitlength

Open notwa opened this issue 2 years ago • 8 comments

on certain CPUs, like the NES's clone of the MOS 6502, there are redundant instructions for reading and writing the zero page in memory, i.e. any address less than 256. these instructions typically save 1 byte and 1 cycle over their longer counterparts. when assembling these instructions, bass struggles with which opcode to use, sometimes even incorrectly truncating their operands.

consider the assembly code:

arch nes.cpu
output "dummy.bin", create

// the same number in different bases:
constant foo = 1234
constant bar = $4D2

sta foo // 8D D2 04, correct
sta bar // 8D D2 04, correct

so far so good: bass uses the "longer" opcodes for high memory addresses. let's say foo and bar might be the X and Y coordinates of onscreen objects in a game, and we'd like to take offsets in memory for known objects, as this kinda scenario is typically where I run into this issue.

sta foo+1 // 8D D3 04, incorrectly 85 D3 on v19
sta bar+1 // 8D D3 04, incorrectly 85 D3 on v19

(v18 = master branch, commit a847dfc. v19 = devel branch, commit 5617511) bass v19 is truncating the address and dropping the upper byte entirely! bass v18 appears to be doing the right thing, but that's just a fluke — I'll show that later. for now, let's look at how bass treats literals in expressions instead of constants.

// different behavior for different bases when inlined:
sta 1234 // 8D D2 04, correct
//sta $4D2 // 8D D2 04, correct (does not assemble on v18)
sta $04D2 // 8D D2 04, correct
sta 1234+1 // 8D D3 04, correct (but see the next section)
sta $4D2+1 // 8D D3 04, incorrectly 85 D3 on v19

those last two lines had me deeply confused until I stared at the Table::bitlength function in table.cpp (in v19). the inconsistency here is due to atoi and hexLength/binLength acting differently when they encounter the + symbol: atoi stops parsing and ultimately results in the bitlength of the number up to that point. the other methods simply exit and return 0. knowing how this works, we can develop more surprising examples:

// bases are parsed differently:
// Table::bitlength returns 0 on v18, and 8 on v19:
sta 255+2 // 8D 01 01, incorrectly 85 01 on v19
// Table::bitlength returns 0 on both versions:
sta $FF+2 // 8D 01 01, incorrectly 85 01 on v19
// Table::bitlength returns 0 on v18, and 9 on v19:
sta 260-5 // 85 FF, incorrectly 8D FF 00 on both versions
// Table::bitlength returns 0 on both versions:
sta $104-5 // 85 FF, incorrectly 8D FF 00 on v18

now, why does v18 seem to do the right thing more often? well, it's a fluke. in v18, nes.cpu.arch contains this:

sta *16        ;$8d =a
sta *08        ;$85 =a

and v19 contains this:

sta *08        ;$85 =a
sta *16        ;$8d =a

bass prefers whichever entry comes first. in v18, when the bitlength is 0, the first one is always chosen. 0 "fits into" 16, and the $85 entry is never considered. this means that v18 should always produce valid code, but not always the code we expect.

finally, let's look at a subroutine from the disassembly of SMB 1 that's been floating around online (and converted to bass syntax):

constant Block_X_Speed     = $60
constant Block_PageLoc     = $76
constant Block_X_Position  = $8F
constant Block_Y_Speed     = $A8
constant Block_Y_Position  = $D7
constant Block_Orig_XPos   = $03F1
constant Block_Y_MoveForce = $043C
// [snip]
SpawnBrickChunks:
lda Block_X_Position,x
sta Block_Orig_XPos,x
lda #$F0
sta Block_X_Speed,x
sta Block_X_Speed+2,x
lda #$FA
sta Block_Y_Speed,x
lda #$FC
sta Block_Y_Speed+2,x
lda #$00
sta Block_Y_MoveForce,x
sta Block_Y_MoveForce+2,x
lda Block_PageLoc,x
sta Block_PageLoc+2,x
lda Block_X_Position,x
sta Block_X_Position+2,x
lda Block_Y_Position,x
clc
adc #$08
sta Block_Y_Position+2,x
lda #$FA
sta Block_Y_Speed,x
rts

as it is, neither bass v18 nor bass v19 produce matching code from the original game. the simplest workaround for both versions is to add < and > characters to each operand (not the constants) to override their bitlengths to be 8 and 16 respectively, but this still requires some mental bookkeeping, as well as Find+Replace if I decide to move an address in or out of zero-page.

PS. thank you to everyone for maintaining bass over the years. I hope my write-up doesn't come across as overly critical; I'm just being verbose in hopes of saving anyone else the confusion. plus, there are a lot of potential test-cases here.

notwa avatar Jul 01 '22 15:07 notwa

Thanks for your issue & great explanation, yep I have wanted to fix this to work correctly/sanely for many years, I only just got round to making progress for this in the devel branch for v19 recently. As you have a good understanding of it I would love it if you could help out to fix it! otherwise I will try to make it work correctly myself so it is fixed for v19. Thanks for looking into this & thanks for your kind words =D

PeterLemon avatar Jul 02 '22 01:07 PeterLemon

with a bit of hacking, I've gotten that SMB example (and most of SMB) to assemble correctly. my changes are essentially:

  • remove const from Table::bitLength function signature (required for using evaluate, but see later note)
  • only return the measured length when it's nonzero (i.e. no error)
  • new decLength function for computing bitlength of decimal-encoded integers (however, my implementation is hideous)
  • run evaluate when all else fails, then take the log2 and all that stuff.

note: assuming evaluate cannot actually modify the state of the assembler (I don't know bass well enough to tell if this is true), I suggest that each argument is only evaluated once, and then cached in a vector for all subsequent uses. this would go in Table::assemble and allow Table::bitLength to remain a const method. then, the switch-case on Format::Type would cast and modify the cached values appropriately, instead of calling evaluate individually/directly.

however, there is still a problem. in the SMB disassembly, there is this code:

constant SwimTileRepOffset = PlayerGraphicsTable + $9E

// [snip]

PlayerGraphicsTable:
//big player table

// [snip]

lda     Sprite_Tilenumber+24,y   // check tile number of seventh/eighth sprite
cmp     SwimTileRepOffset        // against tile number in player graphics table
beq     ExPGH                    // if spr7/spr8 tile number = value, branch to leave

this now fails very subtly. every jump to any label after that code will be off by 1, and no warning/error is emitted. the problem is that bass doesn't yet know the address of PlayerGraphicsTable when executing the queryPhase. instead, bass opts to substitute it with pc() as a dummy value [1] [2]. this only kinda works. here, the value of SwimTileRepOffset is 0 + $9E during queryPhase and $6E17 + $9E during writePhase. then, bass incorrectly opts to use the 8-bit instruction during queryPhase and the 16-bit instruction during writePhase, creating the discrepancy in subsequent label addresses.

I've thought about a few workarounds. in assembly, either:

  • use cmp >SwimTileRepOffset or cmp.w SwimTileRepOffset to override bitLength (I'm doing this for now)
  • just inline PlayerGraphicsTable + $9E or use a define, since it's properly defined at the point it's actually used

on the C++ side, either:

  • treat constants similar to defines and evaluate them as lazily as possible (I'm not sure if this would really work)
  • prevent this from happening by asserting that all unknowns (pc()s) are resolved at the time of use (requires additional bookkeeping, definitely breaks people's existing code)
  • monitor all constants and do multiple queryPhase executions until they stop changing or something breaks (no idea about this one)
  • implement some sort of solver for all PC-dependent numbers, potentially with multiple solutions (overkill for this example, requires additional bookkeeping, could kill performance in macros like this)

I think the last option is the "correct" one, since it would work even if PlayerGraphicsTable is defined at the very end of the file, but it would be an undertaking. maybe it's better to just let the user figure it out.

I'm curious to know which choice would be most in the spirit of bass. I'm somewhat concerned about breaking existing code that used to assemble "fine" in previous versions of bass, but I don't know if that's a concern to the project itself. I've definitely heard about people holding onto very old versions of bass, so maybe it doesn't matter? ┐(´ω`)┌

also, I'm looking for more elegant alternatives to my decLength function, since my implementation is incredibly over-engineered.

notwa avatar Jul 05 '22 19:07 notwa

Really wonderful work, I am blown away!! I have wanted to get things stable like this for so long, & I am so happy with your progress & insight into what needs to be fixed. As for the spirit of bass, Near would have wanted this to become possible, as many new arches need this stability to function correctly, so I would not worry too much about breaking some things that were written for V14. The label jump stuff you have indicated in the last part is another thing I really wanted to fix so I am happy with any route you can find to work around this. I am eager to push any pull requests you make, & feel you are becoming a major contributor to the team. Thanks so much for all your hard work & helping bass become the assembler it needs to be =D

PeterLemon avatar Jul 05 '22 21:07 PeterLemon

I went and implemented that caching behavior described in my last post. I'm not sure how much it helps in realistic scenarios, but it should be a little faster and much more hygienic than what I had before. it miiiiiight still be possible to trigger unwanted side-effects from operands that are ultimately unused, depending on the active arch file. however, it seems like you would have to craft a new architecture specifically to trigger that behavior — uh, I'm overthinking it…

anyway, I think my patch-1 branch will become a pull request soon, since the other issue I described with pc() should probably be tackled separately (it affects more than just zero-page instructions).

notwa avatar Jul 08 '22 08:07 notwa

no wait, I just realized the code I wrote was really dumb. the problem is that args is potentially tokenized differently for each opcode, and I'm trying to cache indices into args regardless of the strings those indices actually refer to. a hash table would be much better, or simply what I had prior to that commit.

I think if I want this to be foolproof, I'll need some way of making temporary changes to the global assembler state. this might be necessary for tackling the pc() issue as well, so…

notwa avatar Jul 08 '22 08:07 notwa

Thanks for the update on your work & looking into this issue, really appreciate the work you are putting into this =D

PeterLemon avatar Jul 09 '22 13:07 PeterLemon

I think I've got something working, but the code has gotten a little unwieldly, so it'll still take a little time to boil it down. plus, there are still a bunch of edge cases I need to consider.

this amounted to combining my third idea (doing multiple passes) with some bookkeeping (like my second idea) to help prevent (not foolproof) these changes from causing more surprising behavior down the road.

notwa avatar Jul 15 '22 22:07 notwa

very nice, thanks for keeping me updated :)

PeterLemon avatar Jul 16 '22 04:07 PeterLemon