sjasmplus
T-state counting
T-state counting similar to zmac. Example of usage:
code:   ld a,5
        ld (hl),a
        inc a
        inc hl
        ld (hl),a
cost    equ t($)-t(code)
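For reference, a `t()` difference like this would compute the sum of the per-instruction T-states between the two addresses. Here is a minimal Python sketch. It assumes plain Z80 timings with no platform delays, and its hand-written cost table covers only these instructions:

```python
# Standard Z80 T-state costs for the instructions in the example above
# (hand-written subset, not a complete opcode table).
T_STATES = {
    "ld a,5": 7,
    "ld (hl),a": 7,
    "inc a": 4,
    "inc hl": 6,
}


def block_cost(instructions):
    """Sum the T-states of a straight-line block (no branches)."""
    return sum(T_STATES[ins] for ins in instructions)


code = ["ld a,5", "ld (hl),a", "inc a", "inc hl", "ld (hl),a"]
cost = block_cost(code)
print(cost)  # 31
```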
One of the first things I did when I restarted writing assembly code in 2013 was to create a T-state counting library (it was very basic, but it could handle a task like the one above). Unfortunately, I have since come to recognize that a feature like this does not belong in an assembler. More precisely, it would be mostly useless in an assembler, especially one that attempts to be multiplatform, like sjasmplus. The reasons are simple:
- How do we deal with conditional execution? For any code that branches, "the execution time" is not uniquely defined.
- Depending on the platform, the number of T-states for specific instructions can vary. E.g. ZX Spectrum clones with M1 delays effectively round up the execution time of instructions with an odd number of T-states to the next even number. On the Amstrad CPC, execution times are all rounded up to the nearest multiple of 4 T-states (I do not know the details, so apologies if I am not stating this fully correctly). My point is that if your assembler promises to compute something like this, you open a real can of worms, because people on different platforms would require different timing profiles.
- Depending on the platform, the number of T-states for a specific instruction can also depend on exactly when it executes (I am thinking of ULA contention delays on the ZX Spectrum, but I am sure there are other situations too).
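The first two points can be made concrete: the same code yields different numbers depending on the branch outcome and on the timing profile. This rough Python sketch assumes the snippet above followed by a hypothetical `jr nz,label`. It uses standard Z80 timings and two simplified rounding profiles modeled on the description above. These are not exact hardware models, and real CPC tables differ for some opcodes:

```python
import math

# (mnemonic, T-states if no jump, T-states if the jump is taken); standard Z80.
SNIPPET = [
    ("ld a,5", 7, 7),
    ("ld (hl),a", 7, 7),
    ("inc a", 4, 4),
    ("inc hl", 6, 6),
    ("ld (hl),a", 7, 7),
    ("jr nz,label", 7, 12),  # hypothetical conditional jump
]


def plain(t):
    return t


def round_even(t):
    # Simplified model of clones with M1 delays: odd counts rounded up to even.
    return t + (t % 2)


def round_to_4(t):
    # Naive per-instruction rounding up to a multiple of 4 T-states (CPC-like).
    return 4 * math.ceil(t / 4)


def cost_range(snippet, profile):
    """(min, max) T-states of the block, since the jump may or may not be taken."""
    lo = sum(min(profile(nt), profile(tk)) for _, nt, tk in snippet)
    hi = sum(max(profile(nt), profile(tk)) for _, nt, tk in snippet)
    return lo, hi


for name, profile in [("Z80", plain), ("even", round_even), ("x4", round_to_4)]:
    print(name, cost_range(SNIPPET, profile))
# Z80 (38, 43)
# even (42, 46)
# x4 (44, 48)
```

Three profiles combined with two branch outcomes already give six different "costs" for six instructions, and contention has not been considered yet.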
All in all, if you want accurate timings, this is a job for a good emulator for your platform. Some assemblers can invoke emulators as part of their workflow (I know of at least one). Supporting that would be a very substantial commitment and redesign, which is difficult to justify here.
I recently wrote a piece of code like that, but the durations are given in "NOPs" for the Amstrad CPC. By updating the arrays in the source, you could get T-states, I suppose:
By the way, any advice on this code is welcome ;-)
;==============================================================================
; TIMINGS_TICKER
;------------------------------------------------------------------------------
; Measures the duration of a snippet of code
; Conditional jumps are counted as if they were false (i.e. no jump)
;------------------------------------------------------------------------------
;------------------------------------------------------------------------------
; INPUT:
; Start of the code to measure
; End of the code to measure
;------------------------------------------------------------------------------
; OUTPUT :
; A define TICKER is set (or overwritten), containing the duration of the
; code, in NOPs
;==============================================================================
    macro TIMINGS_TICKER start,stop
    LUA PASS3
nops={3,2,2,1,1,2,1,1,3,2,2,1,1,2,1,3,3,2,2,1,1,2,1,3,3,2,2,1,1,2,1,2,3,5,2,1,1,2,1,2,3,5,2,1,1,2,1,2,3,4,2,3,3,3,1,2,3,4,2,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,2,2,2,2,2,2,1,2,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,1,1,1,1,1,1,2,1,2,3,3,3,3,4,2,4,2,3,3,0,3,5,2,4,2,3,3,3,3,4,2,4,2,1,3,3,3,0,2,4,2,3,3,6,3,4,2,4,2,1,3,1,3,0,2,4,2,3,3,1,3,4,2,4,2,2,3,1,3,0,2,4}
nops[0]=1
bytes={3,1,1,1,1,2,1,1,1,1,1,1,1,2,1,2,3,1,1,1,1,2,1,2,1,1,1,1,1,2,1,2,3,3,1,1,1,2,1,2,1,3,1,1,1,2,1,2,3,3,1,1,1,2,1,2,1,3,1,1,1,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,3,3,3,1,2,1,1,1,3,0,3,3,2,1,1,1,3,2,3,1,2,1,1,1,3,2,3,0,2,1,1,1,3,1,3,1,2,1,1,1,3,1,3,0,2,1,1,1,3,1,3,1,2,1,1,1,3,1,3,0,2,1}
bytes[0]=1
ednops={2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,4,4,4,6,2,2,2,3,4,4,4,6,2,4,2,3,4,4,4,6,2,2,2,3,4,4,4,6,2,4,2,3,4,4,4,6,2,4,2,5,4,4,4,6,2,4,2,5,4,4,4,6,2,4,2,4,4,4,6,2,4,2,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2}
ednops[0]=2
edbytes={2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,4,2,2,2,2,2,2,2,4,2,2,2,2,2,2,2,4,2,2,2,2,2,2,2,4,2,2,2,2,2,2,2,4,2,2,2,2,2,2,2,4,2,2,2,2,2,2,2,4,2,2,2,2,2,2,4,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2}
edbytes[0]=2
timings_debug=_c("TIMINGS_DEBUG")
start=_c("start") % 65536
stop=_c("stop") % 65536
if(stop<start) then
    stop=stop+65536
end
ptr=start
cptnops=0
while(ptr<stop) do
    byte=sj.get_byte(ptr)
    if(nops[byte]==0) then
        if(byte==0x0ED) then
            edbyte=sj.get_byte(ptr+1)
            cptnops=cptnops+ednops[edbyte]
            ptr=ptr+edbytes[edbyte]
        else
            sj.warning(string.format("TIMINGS_TICKER : This extended opcode is not yet managed : #%02x #%02x",byte,sj.get_byte(ptr+1)))
            break -- ptr is not advanced on this path, so the loop would never end
        end
    else
        cptnops=cptnops+nops[byte]
        ptr=ptr+bytes[byte]
    end
end
-- an early break leaves ptr below the end address, so only an overshoot is reported here
if(ptr>stop) then
    sj.warning("TIMINGS_TICKER : Stop value doesn't point to the byte following an opcode")
end
sj.insert_define("TICKER",cptnops)
    ENDLUA
    endm
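The script's algorithm can be modeled outside the assembler for readers less familiar with the sjasmplus Lua API. It looks up duration and length by opcode, uses a second table after the #ED prefix, and warns on anything else. This Python sketch uses tiny hand-filled tables, not the full tables above. They cover CPC NOPs for `nop`, `ld a,n`, `ld (hl),a`, `inc a`, `inc hl` and `neg` only. The sketch stops on an unmanaged prefix, since without a known instruction length the pointer cannot advance:

```python
# Hypothetical subset of the CPC tables: duration in NOPs and length in bytes.
NOPS = {0x00: 1, 0x3E: 2, 0x77: 2, 0x3C: 1, 0x23: 2}  # nop, ld a,n, ld (hl),a, inc a, inc hl
LENGTHS = {0x00: 1, 0x3E: 2, 0x77: 1, 0x3C: 1, 0x23: 1}
ED_NOPS = {0x44: 2}     # neg (#ED #44)
ED_LENGTHS = {0x44: 2}


def count_nops(code, start, stop):
    """Return (duration in NOPs, list of warnings) for code[start:stop]."""
    warnings = []
    ptr, total = start, 0
    while ptr < stop:
        op = code[ptr]
        if op == 0xED and ptr + 1 < len(code) and code[ptr + 1] in ED_NOPS:
            sub = code[ptr + 1]
            total += ED_NOPS[sub]
            ptr += ED_LENGTHS[sub]
        elif op in NOPS:
            total += NOPS[op]
            ptr += LENGTHS[op]
        else:
            # Unknown opcode or CB/DD/FD prefix: without a length the pointer
            # cannot advance, so report it and stop instead of looping forever.
            warnings.append(f"not managed: #{op:02X}")
            break
    if ptr > stop:
        # The last instruction ran past `stop`, so `stop` was not on a boundary.
        warnings.append("stop does not point to the byte following an opcode")
    return total, warnings


# ld a,2 followed by 62 nops, as in the DUP example further down
print(count_nops(bytes([0x3E, 0x02]) + bytes(62), 0, 2))  # (2, [])
```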
There's a problem with this code: you can't use this value later in the code without raising warnings. I mean:
startA:
    ld a,2
stopA:
    TIMINGS_TICKER startA,stopA
    dup (64-TICKER) ; This code should ensure my routine lasts for exactly 64 NOPs
        nop
    edup
This code produces the right opcodes (#3E #02 followed by 62 #00) but raises a lot of "warning: Label has different value in pass 3".
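The warnings follow from the macro's `LUA PASS3`. TICKER only gets its real value in pass 3, so the DUP count, and every address after it, can differ from pass 2. The toy Python model below shows this. It assumes ORG #8000, a hypothetical label `after` following the DUP, and that TICKER evaluates to 0 before pass 3. All three are assumptions for illustration only:

```python
def assemble(ticker, org=0x8000):
    """Addresses of labels in the DUP example for a given value of TICKER."""
    routine_len = 2                 # ld a,2 = #3E #02
    padding = 64 - ticker           # dup (64-TICKER) : nop : edup
    return {
        "stopA": org + routine_len,
        "after": org + routine_len + padding,  # hypothetical label after the DUP
    }


pass2 = assemble(ticker=0)   # assumption: Lua has not run yet, TICKER seen as 0
pass3 = assemble(ticker=2)   # ld a,2 takes 2 NOPs on the CPC
changed = [name for name in pass2 if pass2[name] != pass3[name]]
print(changed)  # ['after']
```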
The Lua script looks quite good. From a quick read, it seems incomplete only with regard to IX/IY and bit instructions (the DD/FD and CB prefixes). I would probably define it as a function during PASS1 and then just call it in PASS3 whenever needed. That avoids defining and processing all of this again for each block.
Unfortunately, the problem with later use is a hard limit of the sjasmplus design. The code being assembled, and thus all addresses, should be the same in the second and third pass, so reading the code in the third pass and adjusting it based on that breaks this principle. If you really know what you are doing and you don't mind the warnings, you can still sometimes get what you asked for in the binary. The "correct" way, though, would be to assemble the parts independently. First assemble the inner part, which will be aligned, to a binary blob, then INCBIN that from a new asm file in a new assembling process... oh wait, INCBIN still reads the actual bytes only in pass 3... hmm. So you would have to read the file using Lua io in each pass (or in the first pass) and calculate the T-states. The DUP padding would then be fixed, with the same result in every pass.
Or produce the inner code in a first assembly and, during its third pass, generate the timing info and export it to a small include file. A second assembly can then just INCBIN + INCLUDE these two files and use the values without running the timing counting at all. That's probably the most "sjasmplus" way, and it fits the assembler architecture well.
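The two-stage idea only needs the timing as a plain number in a file. The text suggests exporting it from the first assembly's pass 3. As a variation, a small external script run between the two assemblies could do the counting. In this Python sketch, the file names, the `PADDING` symbol and the tiny opcode tables are all hypothetical:

```python
from pathlib import Path

# Hypothetical subset of CPC timings (NOPs) and lengths; extend as needed.
NOPS = {0x00: 1, 0x3E: 2, 0x77: 2, 0x3C: 1, 0x23: 2}
LENGTHS = {0x00: 1, 0x3E: 2, 0x77: 1, 0x3C: 1, 0x23: 1}


def blob_nops(blob):
    """Duration in NOPs of a straight-line binary blob."""
    ptr = total = 0
    while ptr < len(blob):
        op = blob[ptr]
        if op not in NOPS:
            raise ValueError(f"opcode #{op:02X} at offset {ptr} is not supported")
        total += NOPS[op]
        ptr += LENGTHS[op]
    return total


def write_timing_include(bin_path, inc_path, budget=64):
    """Count the blob produced by the first assembly and export the results
    as EQUs for the second assembly. Returns the measured duration."""
    ticker = blob_nops(Path(bin_path).read_bytes())
    if ticker > budget:
        raise ValueError(f"routine takes {ticker} NOPs, budget is {budget}")
    Path(inc_path).write_text(f"TICKER equ {ticker}\nPADDING equ {budget - ticker}\n")
    return ticker
```

The second source would then INCLUDE the generated file, INCBIN the blob and use `PADDING` in its DUP, so every pass sees the same values.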
In other words, this type of task is not a good fit for sjasmplus, and there's no simple fix or improvement to get there.
If you don't want to go as far as splitting the assembly in two, you can avoid some of those warnings by hard-padding the DUP block. Put something like ORG after it (or an "anti-DUP" with the remaining NOPs, to have 64 in total). Then, no matter how long the DUP is, it is followed only by a bit of code that does the important stuff without defining new labels, probably jumping to the next code at its end. The ORG/anti-DUP makes sure that each pass continues at the same address after this piece of code, whatever the timing results are. That avoids changes between pass 2 and pass 3 addresses.
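The point of the ORG/anti-DUP trick is that the address after the padded region no longer depends on TICKER. This Python sketch shows the arithmetic. It assumes a hypothetical layout: the 2-byte routine, the DUP padding, a 3-byte `jp` out of the block, and an anti-DUP of TICKER filler bytes:

```python
def layout(ticker, start=0x8000, routine_len=2, jump_len=3, budget=64):
    """Return (address of the jp, address of the next label) for one pass."""
    pad = budget - ticker                 # dup (64-TICKER) : nop
    jump_at = start + routine_len + pad   # jp to the next code (no label here)
    anti_pad = ticker                     # anti-DUP: pad + anti_pad == budget
    next_label = jump_at + jump_len + anti_pad
    return jump_at, next_label


# Pass 2 (TICKER assumed 0, not known yet) vs pass 3 (TICKER = 2)
print([hex(a) for a in layout(0)])  # ['0x8042', '0x8045']
print([hex(a) for a in layout(2)])  # ['0x8040', '0x8045']
```

Only the unlabeled `jp` moves between passes. The next label stays put, so it no longer triggers the "different value" warning.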
I think this nicely illustrates why it would be tricky to add this to sjasmplus and satisfy all possible use cases, so I'm still not even considering it. I'm somewhat open to the idea of printing T-states into the listing file (under some option), but I'm not planning to do that in the near future. It would not help use cases like this anyway; it would just make manual counting a bit simpler. In my ideal world, I would rather see more tools doing their part of the work. One example is z88dk ticks, which emulates the code and counts T-states including loops (but it does not simulate ZX contention, only Z80 timings). Then some IDE could integrate all this and more: using AI to generate possible unit tests for newly written code, using something like ticks to measure its performance, running the unit tests and producing hints about the results while you type, etc. Sadly, there's no such dream IDE right now, and it's unlikely I will ever write it. :)
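Emulation handles what a static sum cannot, namely how often a loop actually runs. Below is a toy Python illustration, not how z88dk ticks is implemented. It counts standard Z80 timings for `ld b,n` (7 T-states) followed by `loop: djnz loop` (13 when it jumps, 8 when it falls through):

```python
def run_djnz_loop(n):
    """Emulate 'ld b,n / loop: djnz loop' and count T-states (B=0 means 256 passes)."""
    t = 7                    # ld b,n
    b = n & 0xFF
    while True:
        b = (b - 1) & 0xFF   # djnz decrements B first
        if b != 0:
            t += 13          # jump taken, loop again
        else:
            t += 8           # B reached 0, fall through
            break
    return t


print(run_djnz_loop(10))  # 132
```

The result matches the closed form 13*n + 2, but a static counter could only produce it if it somehow knew n.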