customasm
Endianness directive?
It's not currently possible to specify the endianness of the target architecture, which could lead to confusion for data types larger than its #bits size.
Hmm, do you have an idea for when this would apply, to be exact? Like, in instruction outputs or data directives? I've been trying to avoid this and treating all values as big-endian, letting the user decide how to output bytes by doing the conversion themselves (like you can see in the example 6502 CPU file).
There are some cases for which I couldn't think of a sensible result when doing automatic little-endian conversions, for example a #d20 0x12345 data directive on an 8-bit CPU, or even instruction outputs, which often build their values from parts smaller than a byte (like ld {x} -> 0x4 @ x[19:0]).
I think it's more important for things like the #str directive, which kinda need to be simple, and having to reverse the string beforehand would be annoying.
And, in general @hlorenzi, it'd be more convenient.
How would that work with strings? Right now, they're being encoded in UTF-8, which should be endianness-neutral.
That is a good point, UTF-8 is endianness-neutral. The real issue I have is that keeping track of endianness is annoying and, more importantly, doing something like #d32 label will save the label pointer in big-endian even for a little-endian system. (I just spent the last 6 hours playing with the assembler, by the way. It's great! Only minor complaint is the lack of traditional macros, but I'll live.)
I could see this being useful for the #d directive.
An example with the 6502: the addresses 0xFFFA to 0xFFFF hold vectors that point to certain parts of the code, for resets and interrupts. The CPU is little-endian, so the addresses have to be stored low byte first. Currently, this has to be done like this:
#addr 0xFFFA
#d8 NMI_HANDLER[7:0] ; NMI Vector
#d8 NMI_HANDLER[15:8]
#d8 INIT[7:0] ; Reset Vector
#d8 INIT[15:8]
#d8 IRQ_HANDLER[7:0] ; IRQ/BRK Vector
#d8 IRQ_HANDLER[15:8]
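As an illustration outside customasm, here is a minimal Python sketch of the byte order those six #d8 lines produce. The handler addresses are made up for the example and are not taken from any real program.

```python
# Hypothetical handler addresses, chosen only for illustration.
NMI_HANDLER = 0x8123
INIT = 0x8000
IRQ_HANDLER = 0x8456


def vector_bytes(address: int) -> bytes:
    """Return a 16-bit address as two bytes, low byte first."""
    return address.to_bytes(2, byteorder="little")


# The six bytes that belong at 0xFFFA..0xFFFF, in the same order
# as the #d8 lines above: NMI, reset, IRQ/BRK.
vector_table = b"".join(
    vector_bytes(address) for address in (NMI_HANDLER, INIT, IRQ_HANDLER)
)

print(vector_table.hex(" "))  # 23 81 00 80 56 84
```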
maybe "#dl" for Little Endian, and "#d" for Big Endian.
I mean technically you could just implement those yourself with the CPU file.
#dl {val} -> val[7:0] @ val[15:8]
I wasn't able to call the command "#dl" as the assembler seems to assume that tokens starting with a # are directives. So I got an "unknown directive" error. But defining
ld16 {val} -> val[7:0] @ val[15:8]
ld32 {val} -> val[7:0] @ val[15:8] @ val[23:16] @ val[31:24]
did the job for me.
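What the ld16 and ld32 rules compute can be sketched in Python as a byte reversal of a fixed-width value. The helper name swap_bytes is hypothetical and exists only for this example; it is not part of customasm.

```python
def swap_bytes(value: int, bits: int) -> int:
    """Reverse the byte order of an unsigned value that is `bits` wide."""
    if bits % 8 != 0:
        # A 20-bit value, for instance, has no whole-byte reversal.
        raise ValueError("width must be a whole number of bytes")
    if not 0 <= value < (1 << bits):
        raise ValueError("value does not fit in the given width")
    # Serialize big-endian, then read the same bytes back as little-endian.
    return int.from_bytes(value.to_bytes(bits // 8, "big"), "little")


print(hex(swap_bytes(0x1234, 16)))      # 0x3412, like val[7:0] @ val[15:8]
print(hex(swap_bytes(0x12345678, 32)))  # 0x78563412
```

The width check mirrors the concern raised earlier about #d20 0x12345: a value whose size is not a multiple of 8 bits has no obvious little-endian form.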
Endianness needs to be its own directive. We should not have to do stuff like {value} => value[7:0] @ value[15:8] on every single opcode, variant, variable, etc.
At the end of the day, an endianness directive is a pretty important addition, especially now with the ability to work on more complex architectures with v0.11. Having to use a workaround is error-prone and just plain annoying.
I have no problem with the endian-ness of the text disassembly modes. However, my 32-bit vCPU is little-endian, requiring me to swap every 4 bytes of the binary output file. For now, I've just added a byte-swapped 32-bit output type to my local build of CustomASM, but it would be great to get that as a real feature...mine is just hacked in for my use, rather than an actual feature...
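The post-processing step described here, swapping every 4 bytes of the binary output, can be sketched in Python as follows. This is only an illustration of the idea, not the code from that local build; the function name swap_words is hypothetical.

```python
def swap_words(data: bytes, word_size: int = 4) -> bytes:
    """Reverse the byte order inside each `word_size`-byte word of `data`."""
    if len(data) % word_size != 0:
        raise ValueError("data length must be a multiple of the word size")
    return b"".join(
        data[offset:offset + word_size][::-1]
        for offset in range(0, len(data), word_size)
    )


big_endian_output = bytes.fromhex("12345678 90abcdef")
print(swap_words(big_endian_output).hex(" "))  # 78 56 34 12 ef cd ab 90
```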
Related to this issue, I've just added an le() function on v0.11.8, which does little-endian encoding of its argument. It might help in some of the cases discussed here! It's detailed in the changelog!
Can you add support for le() on #d32 directives?
I'm still quite unsure about what kind of behavior a "global endianness directive" should have 😅
It's not very clear to me what types of expressions it should be applied to, and at what point in the process. Only for data directives? But then we don't solve TChapman500's issue.
Perhaps if the le() function has not managed to alleviate this issue, we should open the discussion again.
@hello01-debug Do you mean something like a #d32le directive? I imagine it would apply le() automatically to every argument, like ProxyPlayerHD's suggestion.
What about a command line argument like little-endian or big-endian? Most people are probably using x86-64 processors, which use little-endian. Most programs will therefore use little-endian by default, as it's the natural endianness for the processor. But some programs, in spite of this fact, will go against the native processor's endianness, using big-endian instead. A command line argument would allow files to be generated that accommodate this fact.
For when a file must be stored with a specific endianness, a #big_endian or #little_endian directive on the first line should solve the issue.
Absolutely, those are certainly the way to go! But I'm still unsure about the actual behavior on the code. Applying it to values from data directives seems ok, but when it comes to instructions (as you suggested), it gets a little more confusing.
You mention not wanting to do {value} => value[7:0] @ value[15:8] for every instruction. That's already been reduced to {value} => le(value) as of the current version.
Even so, let's imagine little-endian mode is in effect. What does {value} => 0xab @ value`16 evaluate to? Is the little-endian conversion applied to the (possibly unsized) argument, before being supplied to the expression body? Or is it only applied by the `16 slice? It can't be applied to the whole expression, since that includes endianness-independent things like the opcode, and it would mess up the concatenation order.
If it were applied directly to the argument, then something like {value: u16} => 0xab @ (value + 1)`16 wouldn't work? Because you'd be doing addition on a little-endian number, while the arithmetic operators always expect big-endian numbers. So I'm not sure what's the best way to go about it. What are your thoughts?
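The arithmetic problem can be made concrete with a small Python sketch. It models the two possible orderings (convert the argument first, or convert the finished result) with a hypothetical swap16 helper; it does not show how customasm itself evaluates anything.

```python
def swap16(value: int) -> int:
    """Swap the two bytes of a 16-bit value."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


value = 0x12FF  # example argument; the +1 carries into the high byte

# Arithmetic first, byte swap last: 0x12FF + 1 = 0x1300, emitted as 00 13.
swap_after = swap16((value + 1) & 0xFFFF)

# Byte swap first, arithmetic on the swapped number: 0xFF12 + 1 = 0xFF13,
# emitted as FF 13, which is not the little-endian form of 0x1300.
swap_before = (swap16(value) + 1) & 0xFFFF

print(hex(swap_after))   # 0x13
print(hex(swap_before))  # 0xff13
```

The two results differ whenever the addition carries across a byte boundary, which is why the conversion has to happen after the arithmetic.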
You mention not wanting to do {value} => value[7:0] @ value[15:8] for every instruction. That's already been reduced to {value} => le(value) as of the current version.
I did not realize that we could do that.
If it were applied directly to the argument, then something like {value: u16} => 0xab @ (value + 1)`16 wouldn't work? Because you'd be doing addition on a little-endian number, while the arithmetic operators always expect big-endian numbers. So I'm not sure what's the best way to go about it. What are your thoughts?
Good point. Perhaps the user could do something like le(0x42 @ rY`4 @ rX`4) @ le(value`16).