rgbds icon indicating copy to clipboard operation
rgbds copied to clipboard

Traverse the charmap trie to reconstruct mappings instead of saving them

Open Rangi42 opened this issue 1 year ago • 5 comments

There's a fundamental issue with this approach that makes me not want to use it: it loses the order in which charmaps were defined, instead outputting them in the ASCII/UTF-8 byte order. (Even the mapped-value order would be more meaningful than that.)

Besides that, I still need to measure the performance benefit of not caching the mappings at definition time. I suspect that it was low; and the performance hit when actually making the dump file (admittedly a rare action) is visible.

Rangi42 avatar Aug 22 '24 17:08 Rangi42

I suppose we could maintain the output order: Instead of calling charFunc right away when we find a terminal node, instead hashmap the node index to the mapping string; then after traversing the whole trie, iterate over this hashmap. Of course, that would be even more of a performance hit when writing a state dump file, but if it saves significant performance during regular assembly, it would be worthwhile.

Rangi42 avatar Aug 22 '24 17:08 Rangi42

Time to build pokecrystal (average of the middle 3 of 5 runs): 4.82s before this PR, 4.81s after it.

time ./rgbasm -s char:- sample.asm: 0.005s before this PR, 0.006s after it.

Seems immaterial either way. @ISSOtm What do you think?

sample.asm:

charmap "a", 1
charmap "b", 2
charmap "c", 3
charmap "d", 4
charmap "ab", 5
charmap "abc", 6
charmap "abcdef", 7
charmap "bcdefg", 8
charmap "cdefgh", 9

charmap "\0", 1
charmap "\0\0", 2
charmap "\0\0\0", 3

charmap "...", 1
charmap "<PLAYER>", 2
charmap "<PK>", 3
charmap "<LINE>", 4
charmap "<WATASHI>", 5

charmap "☎", 1
charmap "円", 2
charmap "カ", 3

newcharmap ascii
def PRINTABLE_ASCII equs " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz\{|}~"
for i, strlen("{PRINTABLE_ASCII}")
	charmap strsub("{PRINTABLE_ASCII}", i + 1, 1), i + $20
endr
charmap "\t", $09
charmap "\n", $0a
charmap "\r", $0d

charmap "{PRINTABLE_ASCII}", 1, 2, 3

Rangi42 avatar Aug 22 '24 18:08 Rangi42

If the runtime is on par, how about the memory consumption?

ISSOtm avatar Aug 22 '24 18:08 ISSOtm

The maximum resident size (/usr/bin/time -v) was 30.55 KB before this PR, and is 30.52 KB after it.

It really seems not to matter, so, just a question of which aesthetic we prefer. :P

Rangi42 avatar Aug 22 '24 19:08 Rangi42

traverse being a recursive function is technically a problem, since it can overflow the stack if your strings are long enough:

def s equs "x"
for n, 16
redef s equs "{s}{s}"
endr
charmap "{s}", 42

Edit: This is fixed by traversing iteratively instead. (Still have to check again if the performance is affected.)

Rangi42 avatar Aug 22 '24 20:08 Rangi42