rgbds
[Feature proposal] Generalized charmap feature
Considering the various proposals related to the charmap feature (#97, both in #256, and a few comments I've heard occasionally), I figured I'd draft up an idea that would cover all of those use cases and also support the advanced character-to-bytestream mapping I've seen in other games. Note: this is significantly more complex than the current system. However, it's backwards-compatible and all the features are opt-in by design, meaning that it wouldn't interfere with current projects, which could thus start using whatever features they need.
Charmap declarations would have up to five arguments, like so:
charmap string, value, length, new state, current state
All arguments other than the first two are optional. All arguments other than the first one are numeric.
The string argument would have the same meaning as it currently does. The value argument would have the same semantics, but without being limited to one byte; it would be an integer like any other (which means 32-bit under the current implementation of integers).
The length argument indicates how many bits of value are actually output; it can take values between 0 and the number of bits in an integer (i.e., 32), and defaults to 8 when omitted. The lower length bits of value should be emitted as a replacement for the string; for the default value of 8, that means outputting the low byte of value (i.e., the current behavior).
It would be acceptable to support only multiples of 8 for the length argument. Specifying the length in bits is still worthwhile, because it leaves room for future bit-granular extensions that stay backwards-compatible, even if they aren't supported at first. (If you're wondering about the utility of emitting bitstrings, consider the Huffman text compression we implement in Prism, where each character results in 3-21 bits. This is currently handled by a build utility that transforms some strings in .asm files into compressed data, but it could be handled perfectly well by a charmap implementation such as the one I'm describing here.) Note that a length of 0 is explicitly allowed, in which case value is ignored and nothing is output.
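To make the length semantics concrete, here is a minimal Python sketch of an output buffer that keeps only the low length bits of each value. It is an illustration, not RGBASM code: the BitWriter class is hypothetical, and the MSB-first bit order and the zero padding of the final byte are assumptions, since the proposal doesn't specify either.

```python
class BitWriter:
    """Hypothetical bit-granular output buffer (not part of RGBASM)."""

    def __init__(self):
        self.bits = []

    def emit(self, value, length=8):
        # length defaults to 8, which matches the current one-byte behavior.
        if not 0 <= length <= 32:
            raise ValueError("length must be between 0 and 32")
        # Append the low `length` bits of value, most significant first
        # (assumed order). A length of 0 appends nothing and ignores value.
        for shift in range(length - 1, -1, -1):
            self.bits.append((value >> shift) & 1)

    def to_bytes(self):
        # Pad the last byte with zero bits (assumed padding rule).
        padded = self.bits + [0] * (-len(self.bits) % 8)
        out = bytearray()
        for i in range(0, len(padded), 8):
            byte = 0
            for bit in padded[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)


writer = BitWriter()
writer.emit(0x141)         # default length 8: only the low byte $41 is kept
writer.emit(0xFF, 0)       # length 0: nothing is output
writer.emit(0x1014, 16)    # 16 bits: two bytes, $10 then $14
print(writer.to_bytes())   # b'A\x10\x14'
```

With lengths restricted to multiples of 8, the same rule reduces to emitting length / 8 whole bytes per match.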
$00 bytes that appear in the output (for instance, as a result of a length being too high) shouldn't terminate the string; the current documented behavior causing this can only be considered a defect.
Regarding the last two arguments for charmap, let me introduce the concept of a shift state. The concept is almost as old as binary encoding itself (the 117-year-old Baudot-Murray code uses it), and it extends the numberspace of an encoding without using wider values.
For the implementation I'm suggesting, shift states would be identified by integers. These integers wouldn't have any special meanings, with the exception of 0 representing the default state. In other words, every string begins in shift state 0.
Any charmap declaration which doesn't omit the fourth parameter (the new state parameter) would trigger a change of shift state to the value indicated in the fourth parameter. Any charmap declaration which doesn't omit the fifth parameter (the current state parameter) would only apply to the shift state indicated in that parameter, being ignored (as if it didn't exist) in other shift states. This would allow easy inline character map switching, far more powerful than the multiple character set idea suggested in #256.
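The two rules above can be sketched in a few lines of Python. The Entry type and the helper names applies and next_state are hypothetical, chosen only for illustration; None stands in for an omitted argument.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """One hypothetical charmap declaration; None means the argument was omitted."""
    string: str
    value: int
    length: int = 8
    new_state: Optional[int] = None
    current_state: Optional[int] = None


def applies(entry, state):
    # An entry without a current state applies in every shift state;
    # otherwise it is ignored unless the states match.
    return entry.current_state is None or entry.current_state == state


def next_state(entry, state):
    # An entry without a new state leaves the shift state unchanged.
    return state if entry.new_state is None else entry.new_state


shift_in = Entry("<", 0, 0, new_state=1)           # switches to state 1, emits nothing
special = Entry("x", 0x80, 8, current_state=1)     # only visible in state 1

print(applies(special, 0))        # False
print(applies(special, 1))        # True
print(next_state(shift_in, 0))    # 1
print(next_state(special, 1))     # 1
```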
For a simple example of how this would work, assume that some potential game used a character set based on ASCII, but including the following control characters:
- Font color: black ($10), blue ($11), green ($12), red ($13)
- Font type: normal ($14), bold ($15), italic ($16)
- Miscellaneous: new line ($0a), clear textbox ($0c), beep ($07)
For the example, let's assume that the game in question doesn't use braces, meaning we can use those characters as formatting controls. The control characters above could be represented easily as follows:
charmap "{", 0, 0, 1 ;braces are used to introduce formatting control characters
charmap "}", 0, 0, 0 ;and thus shouldn't be emitted themselves, but rather shift state
; ignore whitespace and commas within format declarations
charmap " ", 0, 0, 1, 1
charmap ",", 0, 0, 1, 1
; declare actual format specifiers
charmap "black", $10, 8, 1, 1
charmap "blue", $11, 8, 1, 1
charmap "green", $12, 8, 1, 1
charmap "red", $13, 8, 1, 1
charmap "regular", $14, 8, 1, 1
charmap "bold", $15, 8, 1, 1
charmap "italic", $16, 8, 1, 1
charmap "reset", $1014, 16, 1, 1 ;reset formatting (regular, black text)
charmap "clear", $0c, 8, 1, 1
charmap "newline", $0a, 8, 1, 1
charmap "beep", $07, 8, 1, 1
; note that shift state 0 contains no declarations and thus defaults to ASCII for all characters
Under such declarations, we could write a string like: "This is regular text, not bold, and {red}this is red text.{newline, blue}This is blue text, {italic}not green.{newline, reset}And here it {beep}beeps.{newline, newline}Make sure to use {red, bold}red and bold text{reset} for warnings and important stuff."
While I can't add color formatting to GitHub issues, the result should be rather obvious. Note that the word "bold" in the beginning doesn't make text bold, because it's in shift state 0 (the default), and the charmap "bold", $15, 8, 1, 1 line only applies in shift state 1. (It would be ideal to be able to disable the default ASCII charmap for shift state 1 in this case, but I'll leave that for a future proposal if this is ever implemented.)
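As a sanity check of the example, here is a rough Python simulation of how the declarations above could be applied to a string. Several details are assumptions rather than part of the proposal: the encode function is hypothetical, the longest applicable string wins when several match, multi-byte values are emitted most significant byte first, only lengths that are multiples of 8 are handled, and unmapped characters fall back to ASCII in every shift state.

```python
# Each entry: (string, value, length, new_state, current_state); None = omitted.
CHARMAP = [
    ("{", 0, 0, 1, None),
    ("}", 0, 0, 0, None),
    (" ", 0, 0, 1, 1),
    (",", 0, 0, 1, 1),
    ("black", 0x10, 8, 1, 1),
    ("blue", 0x11, 8, 1, 1),
    ("green", 0x12, 8, 1, 1),
    ("red", 0x13, 8, 1, 1),
    ("regular", 0x14, 8, 1, 1),
    ("bold", 0x15, 8, 1, 1),
    ("italic", 0x16, 8, 1, 1),
    ("reset", 0x1014, 16, 1, 1),
    ("clear", 0x0C, 8, 1, 1),
    ("newline", 0x0A, 8, 1, 1),
    ("beep", 0x07, 8, 1, 1),
]


def encode(text, charmap=CHARMAP):
    state = 0                      # every string starts in shift state 0
    out = bytearray()
    pos = 0
    while pos < len(text):
        best = None
        for entry in charmap:
            string, _, _, _, current_state = entry
            if current_state is not None and current_state != state:
                continue           # entry is invisible in this shift state
            if text.startswith(string, pos) and (
                best is None or len(string) > len(best[0])
            ):
                best = entry       # assumed rule: longest match wins
        if best is None:
            out += text[pos].encode("ascii")   # default ASCII mapping
            pos += 1
            continue
        string, value, length, new_state, _ = best
        length = 8 if length is None else length
        if length % 8:
            raise ValueError("this sketch only handles multiples of 8")
        # Keep the low `length` bits; a length of 0 yields no bytes at all.
        out += (value & ((1 << length) - 1)).to_bytes(length // 8, "big")
        if new_state is not None:
            state = new_state
        pos += len(string)
    return bytes(out)


print(encode("not bold, {red}red{newline, reset}ok{beep}"))
# b'not bold, \x13red\n\x10\x14ok\x07'
```

The word "bold" and the comma outside the braces come out as plain ASCII, while the same strings inside braces are consumed by the state 1 declarations.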
I hope this is not too complex to be understood. Feel free to ask any questions you might have. And as always, I don't expect something large like this to be implemented in the short term; it's just a proposal for the future.
I just want to mention this because I don't think anyone has realized this yet, but if you define a charmap outside of any sections, it's a global charmap applied to everything. If you define a charmap within a SECTION, it applies that charmap to just that section. So while technically there's no "remove global charmap for this section" function, you can just charmap " ", $20 (space is $20, right?) and none of the other global maps will apply to that section. I guess that is kinda sloppy though.
Oh, I didn't know that. That's something that has to be documented; it's good to know it can be done.
Can this be considered superseded by #403?
Not really: shift states are inline, and thus serve a different purpose. But there's no urgency here; it's just a "nice to have" for some day.
At least I believe there is some overlap, right?
There is.
If so, could the first post be edited to remove the overlap and, hopefully, fit better with the newly-added charmap syntax?
Given the complexity of this, I'm not sure it's a good idea to implement it within RGBASM. Further, with the introduction of CHARLEN and CHARSUB (#787), and perhaps using #97, it should be possible to instead write a macro pack that processes shift states as described.
Once such a macro pack is written, depending on its complexity and performance, we'll consider making it built-in, similar to #98.
Sounds good?
(That would not preclude adding simpler wide charmap support from #97, such as charmap "ě", $01, $1b or charmap "ě", $011b, 2.)