
Endianness directive?

Open moonheart08 opened this issue 5 years ago • 16 comments

It's not currently possible to specify the endianness of the target architecture, which could lead to confusion for data types larger than its #bits size.

moonheart08 avatar Jan 28 '20 14:01 moonheart08

Hmm, do you have an idea for when this would apply, to be exact? Like, in instruction outputs or data directives? I've been trying to avoid this and treating all values as big-endian, letting the user decide how to output bytes by doing the conversion themselves (like you can see in the example 6502 CPU file).

There are some cases for which I couldn't think of a sensible result when doing automatic little-endian conversions, for example doing a #d20 0x12345 data directive in an 8-bit CPU, or even in instruction outputs which often build their values from parts smaller than a byte (like ld {x} -> 0x4 @ x[19:0]).

hlorenzi avatar Jan 28 '20 15:01 hlorenzi
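
To make the 20-bit case above concrete, here is a minimal Python sketch. It only models byte ordering and is not how customasm works internally; the helper name `to_little_endian` and its policy of rejecting widths that are not whole bytes are assumptions made for illustration.

```python
def to_little_endian(value: int, bits: int) -> bytes:
    """Return `value` as a `bits`-wide little-endian byte string.

    Hypothetical helper: widths that are not a whole number of bytes are
    rejected, because there is no single obvious byte order for them.
    """
    if bits % 8 != 0:
        raise ValueError(f"{bits}-bit value does not split into whole bytes")
    if not 0 <= value < (1 << bits):
        raise ValueError(f"value does not fit in {bits} bits")
    return value.to_bytes(bits // 8, "little")


# A 16-bit value has an unambiguous little-endian form.
print(to_little_endian(0x1234, 16).hex())  # 3412

# A 20-bit value leaves half a byte over, so this sketch refuses it.
try:
    to_little_endian(0x12345, 20)
except ValueError as err:
    print("rejected:", err)
```

Rejecting is only one possible policy; padding to 24 bits would be another, but then the position of the padding bits has to be chosen, which is the ambiguity described above.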

I think it's more important for things like the #str directive, which kinda need to be simple, and having to reverse the string beforehand would be annoying.

moonheart08 avatar Jan 28 '20 15:01 moonheart08

And, in general @hlorenzi, it'd be more convenient.

moonheart08 avatar Jan 28 '20 20:01 moonheart08

How would that work with strings? Right now, they're being encoded in UTF-8, which should be endianness-neutral.

hlorenzi avatar Jan 28 '20 20:01 hlorenzi

That is a good point; UTF-8 is endianness-neutral. The real issue I have is that keeping track of endianness is annoying, and more importantly, doing something like

#d32 label

will store the label pointer in big-endian order even when targeting a little-endian system. (I just spent the last 6 hours playing with the assembler, by the way. It's great! My only minor complaint is the lack of traditional macros, but I'll live.)

moonheart08 avatar Jan 28 '20 20:01 moonheart08

I could see this being useful for the #d directive.

An example with the 6502: addresses 0xFFFA to 0xFFFF hold vectors that point to certain parts of the code, for resets and interrupts. The CPU is little-endian, so each address has to be stored low byte first. Currently, this has to be done like this:

#addr 0xFFFA
#d8 NMI_HANDLER[7:0]		; NMI Vector
#d8 NMI_HANDLER[15:8]
#d8 INIT[7:0]			; Reset Vector
#d8 INIT[15:8]
#d8 IRQ_HANDLER[7:0]		; IRQ/BRK Vector
#d8 IRQ_HANDLER[15:8]

Maybe "#dl" for little-endian and "#d" for big-endian.

I mean technically you could just implement those yourself with the CPU file.

#dl {val} -> val[7:0] @ val[15:8]

ProxyPlayerHD avatar May 24 '20 07:05 ProxyPlayerHD
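
The byte layout that the six #d8 lines above produce can be sketched in Python as follows. This is a model of the expected output only, not customasm code; the handler addresses are hypothetical values picked for the example.

```python
import struct

# Hypothetical handler addresses, for illustration only.
NMI_HANDLER = 0x8123
INIT = 0x8000
IRQ_HANDLER = 0x8456


def vector_table(nmi: int, reset: int, irq: int) -> bytes:
    """Pack the three 16-bit vectors low byte first ('<' means little-endian)."""
    return struct.pack("<HHH", nmi, reset, irq)


# These six bytes would occupy 0xFFFA through 0xFFFF.
table = vector_table(NMI_HANDLER, INIT, IRQ_HANDLER)
print(table.hex(" "))  # 23 81 00 80 56 84
```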

I wasn't able to call the command "#dl" as the assembler seems to assume that tokens starting with a # are directives. So I got an "unknown directive" error. But defining

 ld16 {val} -> val[7:0] @ val[15:8]
 ld32 {val} -> val[7:0] @ val[15:8] @ val[23:16] @ val[31:24]

did the job for me.

DerULF1 avatar Jul 02 '20 19:07 DerULF1

Endianness needs to be its own directive. We should not have to do stuff like {value} => value[7:0] @ value[15:8] on every single opcode, variant, variable, etc.

TChapman500 avatar Oct 14 '20 22:10 TChapman500

At the end of the day, an endianness directive is a pretty important addition, especially now that v0.11 makes it possible to work on more complex architectures. Having to use a workaround is error-prone and just plain annoying.

moonheart08 avatar Oct 14 '20 23:10 moonheart08

I have no problem with the endianness of the text disassembly modes. However, my 32-bit vCPU is little-endian, which requires me to swap every 4 bytes of the binary output file. For now, I've added a byte-swapped 32-bit output type to my local build of CustomASM, but it's just hacked in for my own use; it would be great to get it as a real feature.

skicattx avatar Mar 12 '21 21:03 skicattx
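
For anyone needing the same workaround without a patched build, swapping every 4 bytes can also be done as a post-processing step on the binary output. The Python sketch below is one way to do it; `swap_words` is a hypothetical helper, and it assumes the output length is a multiple of the word size.

```python
def swap_words(data: bytes, word_size: int = 4) -> bytes:
    """Reverse the byte order inside each `word_size`-byte word of `data`."""
    if len(data) % word_size != 0:
        raise ValueError("length must be a multiple of the word size")
    out = bytearray()
    for offset in range(0, len(data), word_size):
        # Slice out one word and append it back to front.
        out += data[offset:offset + word_size][::-1]
    return bytes(out)


# Two big-endian 32-bit words, as the assembler would emit them.
big = bytes.fromhex("12345678 deadbeef")
print(swap_words(big).hex(" "))  # 78 56 34 12 ef be ad de
```

Note that a blanket swap like this also reverses strings and any data that is not 32 bits wide, so it only suits output made up entirely of 32-bit words.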

Related to this issue, I've just added an le() function on v0.11.8, which does little-endian encoding of its argument. It might help in some of the cases discussed here! It's detailed in the changelog!

hlorenzi avatar May 02 '21 02:05 hlorenzi

Can you add support for le() on #d32 directives?

google0101-ryan avatar Mar 18 '22 15:03 google0101-ryan

I'm still quite unsure about what kind of behavior a "global endianness directive" should have 😅 It's not very clear to me what types of expressions it should be applied to, and at what point in the process. Only for data directives? But then we don't solve TChapman500's issue. Perhaps if the le() function has not managed to alleviate this issue, we should open the discussion again.

@hello01-debug Do you mean something like a #d32le directive? Where I imagine it applies le() automatically to every argument, like ProxyPlayerHD's suggestion.

hlorenzi avatar Apr 06 '22 21:04 hlorenzi

What about a command line argument like little-endian or big-endian. Most people are probably using x86-64 processors, which use little-endian. Most programs will therefore use little-endian by default, as it's the natural endian for the processor. But some programs, in spite of this fact, will go against the native processor's endianness, using big-endian instead. A command line argument would allow files to be generated that accommodate this fact.

For when a file must be stored with a specific endianness, a #big_endian and #little_endian directive on the first line should solve the issue.

TChapman500 avatar Apr 07 '22 00:04 TChapman500

> What about a command line argument like little-endian or big-endian. Most people are probably using x86-64 processors, which use little-endian. Most programs will therefore use little-endian by default, as it's the natural endian for the processor. But some programs, in spite of this fact, will go against the native processor's endianness, using big-endian instead. A command line argument would allow files to be generated that accommodate this fact.
>
> For when a file must be stored with a specific endianness, a #big_endian and #little_endian directive on the first line should solve the issue.

Absolutely, those are certainly the way to go! But I'm still unsure about the actual behavior on the code. Applying it to values from data directives seems ok, but when it comes to instructions (as you suggested), it gets a little more confusing.

You mention not wanting to do {value} => value[7:0] @ value[15:8] for every instruction. That's already been reduced to {value} => le(value) as of the current version.

Even so, let's imagine little-endian mode is in effect. What does {value} => 0xab @ value`16 evaluate to? Is the little-endian conversion applied to the (possibly unsized) argument, before being supplied to the expression body? Or is it only applied by the `16 slice? It can't be applied to the whole expression, since that includes endianness-independent things like the opcode, and it would mess up the concatenation order.

If it were applied directly to the argument, then something like {value: u16} => 0xab @ (value + 1)`16 wouldn't work? Because you'd be doing addition on a little-endian number, while the arithmetic operators always expect big-endian numbers. So I'm not sure what's the best way to go about it. What are your thoughts?

hlorenzi avatar Apr 11 '22 21:04 hlorenzi
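
The ordering problem in the (value + 1) example above can be shown with a small Python model. It is a sketch of the two possible evaluation orders, not customasm's evaluator; the function names `encode_late` and `encode_early` are made up for this illustration.

```python
def swap16(value: int) -> int:
    """Exchange the two bytes of a 16-bit value."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


def encode_late(value: int) -> bytes:
    """Do the arithmetic first, then convert to little-endian at output time."""
    result = (value + 1) & 0xFFFF
    return bytes([0xAB]) + result.to_bytes(2, "little")


def encode_early(value: int) -> bytes:
    """Swap the argument first, then add: the carry lands in the wrong byte."""
    swapped = swap16(value)
    result = (swapped + 1) & 0xFFFF
    # The value is already in swapped form, so it is emitted as stored.
    return bytes([0xAB]) + result.to_bytes(2, "big")


value = 0x12FF  # adding 1 should carry into the high byte, giving 0x1300
print(encode_late(value).hex(" "))   # ab 00 13
print(encode_early(value).hex(" "))  # ab ff 13
```

Only the first form yields the little-endian encoding of 0x1300, which suggests the conversion has to happen after the arithmetic rather than on the incoming argument.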

> You mention not wanting to do {value} => value[7:0] @ value[15:8] for every instruction. That's already been reduced to {value} => le(value) as of the current version.

I did not realize that we could do that.

> If it were applied directly to the argument, then something like {value: u16} => 0xab @ (value + 1)`16 wouldn't work? Because you'd be doing addition on a little-endian number, while the arithmetic operators always expect big-endian numbers. So I'm not sure what's the best way to go about it. What are your thoughts?

Good point. Perhaps the user could do something like le(0x42 @ rY`4 @ rX`4) @ le(value`16).

TChapman500 avatar Apr 12 '22 00:04 TChapman500
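
As a rough check of what le(0x42 @ rY`4 @ rX`4) @ le(value`16) would emit, here is a Python model of sized concatenation and byte reversal. It assumes le() simply reverses the bytes of a value whose width is a whole number of bytes; the `concat`, `le`, and `encode` helpers and the register numbers are hypothetical.

```python
def concat(*fields):
    """Concatenate (value, bits) fields, leftmost field most significant."""
    total, width = 0, 0
    for value, bits in fields:
        total = (total << bits) | (value & ((1 << bits) - 1))
        width += bits
    return total, width


def le(field):
    """Reverse the byte order of a (value, bits) field; bits must be a multiple of 8."""
    value, bits = field
    if bits % 8 != 0:
        raise ValueError("le() needs a whole number of bytes")
    swapped = int.from_bytes(value.to_bytes(bits // 8, "little"), "big")
    return swapped, bits


def encode(r_y: int, r_x: int, value: int) -> bytes:
    """Model of le(0x42 @ rY`4 @ rX`4) @ le(value`16)."""
    head = le(concat((0x42, 8), (r_y, 4), (r_x, 4)))
    tail = le((value & 0xFFFF, 16))
    total, width = concat(head, tail)
    return total.to_bytes(width // 8, "big")


print(encode(0x3, 0x7, 0x1234).hex(" "))  # 37 42 34 12
```

Under these assumptions the opcode byte 0x42 ends up second in the output, which is what a little-endian 16-bit instruction word would want but may surprise anyone expecting the opcode first.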