sjasmplus

Feature suggestion: Pseudo-op to know the "size" of a label

Open sdsnatcher opened this issue 6 years ago • 10 comments

There are many places where we need to know how many bytes are used under a label, for example when we LDIR routines or need to loop over a string.

Normally, we have to add extra labels to mark where the end is, and subtract endlabel-LABEL to find the size. But this ends up in the creation of a plethora of nearly useless labels that only clutter the symfiles and the debuggers.
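For reference, the end-label pattern described above usually looks something like the following sketch. The names PRINTMSG, MSG, MSG_END and MSG_LEN are made up for illustration; `$` is the assembler's current-address symbol.

```asm
PRINTMSG:
	ld	hl,MSG
	ld	bc,MSG_END - MSG	; length via the extra end label (5 here)
	ret

MSG:	db	"HELLO"
MSG_END:				; exists only to mark where MSG stops
MSG_LEN	equ	$ - MSG			; alternative: capture the length right after the data
```

Either way, one extra symbol per measured block ends up in the symbol file, which is the clutter this suggestion is trying to avoid.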

It would be much easier if we had a pseudo-op like SIZEOF(label) that would count how many bytes that label has until the next label at the same or higher level is found.

For example:

FOO:
	ld	a,3
	call	BAR
	jr	c,.skip
	ld	a,1
	ret

.skip:	ld	a,2
	ret

BAR:
	cp	5
	ret

INSTRAM:
	ld	hl,FOO
	ld	de,MYRAM
	ld	bc,SIZEOF(FOO)
	ldir
	ret

In the example above, SIZEOF(FOO) would return 13, which allows an easy LDIR to another location without hassle.
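The figure of 13 can be checked by tallying the Z80 instruction lengths by hand. The sketch below repeats the routine with byte counts in comments and expresses the same value with today's syntax; the ASSERT line is only an illustration of what the proposed SIZEOF(FOO) would be expected to equal.

```asm
FOO:
	ld	a,3		; 2 bytes
	call	BAR		; 3 bytes
	jr	c,.skip		; 2 bytes
	ld	a,1		; 2 bytes
	ret			; 1 byte
.skip:	ld	a,2		; 2 bytes
	ret			; 1 byte
BAR:				; next label at FOO's level: 2+3+2+2+1+2+1 = 13
	cp	5
	ret

	ASSERT BAR - FOO == 13	; the value SIZEOF(FOO) would have under this proposal
```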

It would be useful for strings too:

CHKCHARS:
	ld	a,(MYCHAR)
	ld	hl,.charlist1
	ld	bc,SIZEOF(.charlist1)
	cpir
	ld	a,1
	ret	z
	ld	a,(MYCHAR)
	ld	hl,.charlist2
	ld	bc,SIZEOF(.charlist2)
	cpir
	ld	a,2
	ret	z
	ld	a,(MYCHAR)
	ld	hl,.charlist3
	ld	bc,SIZEOF(.charlist3)
	cpir
	ld	a,3
	ret

.charlist1:	db	"ABCD"
.charlist2: 	db	"EFG"
.charlist3:	db	"HIHKLMNO"


BAR:
	call	PRTCHAR
	or	a
	ret	z
	ld	a,9
	ret

sdsnatcher, Dec 19 '18 21:12

"But this ends up in the creation of a plethora of nearly useless labels that only clutter the symfiles and the debuggers."

vs

"It would be much easier if we had a pseudo-op like SIZEOF(label) that would count how many bytes that label has until the next label at the same or higher level is found."

These two statements contradict each other: in the first example SIZEOF(FOO) == (BAR-FOO), and you can't get rid of the BAR label, because SIZEOF(..) relies on it if defined like this.

To unclutter the symfile, it would make more sense for SIZEOF not to rely on the next label. How about adding a marker :: in the source at the point where you want the "end of label" to happen, i.e.:

FOO:
	ld	a,3
	call	BAR
	jr	c,.skip
	ld	a,1
	ret

.skip:	ld	a,2
	ret

BAR:
	cp	5
	ret
        ::

INSTRAM:
	ld	hl,FOO
	ld	de,MYRAM
	ld	bc,SIZEOF(FOO)
	ldir
	ret

Then SIZEOF(FOO) would be equal to INSTRAM-FOO and SIZEOF(BAR) would be INSTRAM-BAR (i.e. the :: would work for all previous labels up to another :: marker).

In the current version, :: parses as the : instruction delimiter twice, effectively creating an "empty instruction" (no error or warning, and no change in the binary output). The marker can also share a line with an instruction, so in the example above it could be written as ret :: if one prefers that.

For the strings this may then look like:

.charlist1:	db	"AB", 27, 1   ; multi-line string with extra chars
                        db	"CD"    ::
.charlist2: 	db	"EFG"  ::
.charlist3:	db	"HIHKLMNO"  ::

Another option is to make SIZEOF measure only up to the end of the line. That would make it useless for code copying; only single-line strings/defb blocks would be meaningful.

Defined as (next_label - label), it doesn't appeal to me personally enough that I would want to work on it. What's your opinion on such a modification?

ped7g, Mar 14 '19 10:03

Feedback from Busy: if defined by ::, it will not work in a "nested" way, while the original proposal does, thanks to the label depth, i.e.

helloworld:
.hello db "hello"
.world db "world"

would have 10 == sizeof(helloworld) && 5 == sizeof(.hello) && 5 == sizeof(.world)
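For comparison, the same three sizes can be written with current syntax roughly as follows. This is only a sketch: helloworld_end is a hypothetical extra label, and the local labels are referenced by their full names because they are used after a new global label has started.

```asm
helloworld:
.hello	db	"hello"
.world	db	"world"
helloworld_end:

	ASSERT helloworld.world - helloworld.hello == 5	; sizeof(.hello)
	ASSERT helloworld_end - helloworld.world == 5	; sizeof(.world)
	ASSERT helloworld_end - helloworld == 10	; sizeof(helloworld)
```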

I personally don't mind a non-nested version of sizeof, but this is surely an interesting point.

ped7g, Mar 14 '19 11:03

And one more question. If the definition is (from the original post)

"would count how many bytes that label has until the next label at the same or higher level is found"

does "next" mean next in source order, or next by memory address? I.e. what should sizeof do with this:

        org     $8000
lab1:   db      1, 2, 3, 4, 5, 6, 7, 8, 9, 10
lab2:   ret
lab1X:  equ     $8004
        ld      a,SIZEOF(lab1) ; is this 4 or 10?

EDIT: it must be the "source" way, i.e. ld a,10. The "address" way could trigger by accident when somebody writes initHitPoints equ $8004 in a different part of the source without realizing it also affects sizeof(lab1), so this question is resolved.

ped7g, Aug 25 '19 09:08

Hi, one thought from my side.

In general I like the idea of a size operator. It can be quite handy for strings or LDIR and I also used the pattern: size = end_label-start_label a lot.

But I also use another pattern quite often to check for boundary overwrites:

lab1:
  db 1, 2, 3, 4, 5
  db 0 ; WPMEM

Admittedly, this is very specific to my own usage: z80-debug (a vscode extension) looks for certain keywords in the comment part of the list file, e.g. "WPMEM". (Please look for "WPMEM" in https://github.com/maziac/z80-debug/blob/master/documentation/Usage.md)

For each "WPMEM" it finds, it automatically adds a watchpoint that stops execution whenever that memory location is read or written.

This way I "waste" one byte of memory for the benefit of easily finding any out-of-bounds access to label "lab1".

Long story short: For this:

lab1:
  db 1, 2, 3, 4, 5
  db 0 ; WPMEM
nextlabel:

and the original proposal, "size" would be one byte too big, as it would also include my "wasted" guard byte.

So I would support the proposal with the double "::".

maziac, Aug 26 '19 09:08

The :: can also be extended with :.: to "end" counting at the local-label level...

But overall this feels to me like it is getting too complicated. I'm really afraid that in assembly these things are too high-level: they add extra syntax complexity for a very small benefit while writing the code, and never fit all use cases well.

Then again, SIZEOF(label) sounds less error-prone when you, for example, rearrange several strings: the classic way, you must also fix all the length calculations (nextlabel-previouslabel), while with SIZEOF(..) you can move the definitions up and down without worry. So I'm not strictly against it; I just have a difficult time seeing which behaviour will work best and prevent further confusion and unexpected problems. (On the earlier source/memory question, the "obvious" answer is source-only, as you can get a symbol with a particular value from a math expression without realizing it points between regular memory labels... but it took me a whole day to realize the address-based approach is completely broken :) ... well, better late than never.)

ped7g, Aug 26 '19 16:08

But I think this is actually getting somewhere... the :: and :.: can extend the original proposal: you can simply use only labels if you don't like those extra operators, and in normal asm sources the chance that somebody has :: by accident is basically zero (and anybody who does should clean up their source code, or use comments for stylistic ascii-art reasons ... :)

I.e. for source:

L1:     db  "abc"
.locL1  db  "d"
L2:
.locL1: ds  10 :.:
        db  0       ; WPMEM
.locL2: ds  5 ::
        db  0       ; WPMEM

the results would be:

4 == SIZEOF(L1)
1 == SIZEOF(L1.locL1)
16 == SIZEOF(L2)
10 == SIZEOF(L2.locL1)
5 == SIZEOF(L2.locL2)

Seems a bit hairy to explain, but if somebody wants to code in Assembly, they have probably seen worse already... ???

edit: for completeness of the design, a note: module/endmodule will work as :: automatically. Modules are about encapsulation, so things like sizeof shouldn't leak across them; that doesn't make sense (to me at least). edit2: sizeof(module_name) does sound quite interesting, though, but that's even above the :: level.

edit3: org/disp will probably work as :: too (org highly likely, as it allows going backwards; disp maybe not, because it can be used to prepare code that will later be relocated by some ldir, where such a sizeof may be useful ... this needs more research and use cases to confirm).

ped7g, Aug 26 '19 16:08

Makes sense to me. Would be a nice feature if it works as you explained. The example would very well fit my coding style.

maziac, Aug 27 '19 10:08

A few more notes about a possible implementation: source-based deduction should probably also apply to macro expansions and includes, treating them as a non-label instruction (ignoring any local or global labels defined by the macro or inside the included file).

But if org is used inside the included file, it should probably invalidate any "counting" label from the including file, i.e.:

Label:
    include "other_code_with_org.i.asm" ::
    ld bc,SIZEOF(Label)  ; <-- error, can't compute size of Label
Label2:
    include "trivial_code_with_labels.i.asm" ::
    ld bc,SIZEOF(Label2) ; OK, measuring total size of "trivial code", ignoring the labels inside it
    ; the labels inside the include have their own sizeof, ending at EOF at latest
Label3:
    ten_byte_macro_defining_global_label_after_five_bytes ::
    ld bc,SIZEOF(Label3) ; OK, bc = 10 (ignoring the global label defined inside macro)

con: this makes it impossible to work with "counting" labels inside a macro, like writing a macro for "::", but I can't imagine a very meaningful use of that; keeping everything "source-based"/"visual" seems more natural to me at the moment.

Q: what about conditional assembly?

Label:
    db "abc"  ; 3 bytes
    IF (false) :: ENDIF
    ; ^ that syntax should work, because "::" is also instruction delimiter,
    ; not just sizeof "counting" stopper, i.e. ENDIF will be found and assembled.
    db "d" ; 4th byte
    IF (false)
Label2:   ; this one will not assemble into regular labels
    ENDIF
    db "e" ; 5th byte
    IF (true) :: ENDIF
    ld bc,SIZEOF(Label) ; 3, 4 or 5?

From an implementation point of view, the result "5" is probably the most logical and the easiest to obtain. From my human point of view it seems OK and logical too, although a bit tricky to read, but that seems like natural complexity stemming from the use of conditional assembly.

ped7g, Aug 28 '19 08:08

IMHO, the :: extension is useful, but it should be optional. It will be used only when the programmer wants to artificially shorten the label size for some specific reason (like debugging).

The programmer can even add some IFDEFs to only have the :: added when needed, like

lab1:
	db 1, 2, 3, 4, 5
 IFDEF DEBUG
	::
	db 0 ; WPMEM
 ENDIF
nextlabel:

sdsnatcher, Aug 30 '19 15:08

Would be fine with me, as well.

maziac, Aug 31 '19 06:08