HxD_DasmDataInspectorPlugin icon indicating copy to clipboard operation
HxD_DasmDataInspectorPlugin copied to clipboard

Ignored bits only ignored when not set

Open exodustx0 opened this issue 2 years ago • 5 comments

I'm writing an ISA definition which has instructions that ignore some bits. How I'm writing it now: 50,0000,FF1F,H,U,,,,"JFS ?"

The log file states a successful parsing of the instruction, yet this instruction always disassembles to "invalid". That seems like a bug to me.

I do have a temporary workaround: 50,0000,FF1F,H,U,00E0,,,"JFS ?"

This explicitly specifies the bits that are ignored (meaning that there's currently no way to define an instruction with two operands and ignored bits). This works, but should ideally not be necessary. Could the plugin be updated to handle this case automatically?

exodustx0 avatar Jun 21 '22 16:06 exodustx0

Hi. Thanks for identifying and reporting the issue you have come across.

Can you please give more information on the JFS instruction definition?

In trying to recreate the problem I have tried your original definition: 50,0000,FF1F,H,U,,,,"JFS ?"

With this, a 3 byte sequence of 50 00 00 appears to correctly? generate: JFS $0000

The difficulty I'm having trying to understand the problem is that each OperandMask is intended to extract a single Operand value only. Based on the ISA's that I have experienced so far, I have expected that a single Operand value would comprise a single consequative sequence of masked bits.
i.e. In your example case, with a 16 bit OperandBytes ("0000"), then a mask specifying a single Operand Value (bit sequence) of between 1 and 16 bits would be expected. On this basis, by providing a mask of "FF1F", a single extracted Operand value would be 16 bits long, with 3 bits of the extracted 16 bit value being masked-out (to always be zero).

Are you instead expecting that the single mask "FF1F" should equate to two seperate Operand bit sequences? i.e. An 8 bit Operand value (masked by FF00) and a seperated 5 bit Operand Value (masked by 001F).

As noted, the code will currently produce a single full 16 bit Operand, with the integral zero mask bits simply reflecting three zero bit value within the resulting 16 bit Operand. Hence why my recreation test delivered: JFS $0000

If equated as two seperate Operand bit sequences, do these two seperate Operand values have two seperate purposes? Or, are they supposed to be concatenated into a single 13 bit value (with the three zeroed bits removed), although it doesn't appear logical to have a single Operand "value" spread across multiple seperated bit sequences?

More information on the ISA involved and the full JFS instruction definition will likely answer this more clearly for me.

DigicoolThings avatar Jun 21 '22 21:06 DigicoolThings

In trying to recreate the problem I have tried your original definition: 50,0000,FF1F,H,U,,,,"JFS ?"

With this, a 3 byte sequence of 50 00 00 appears to correctly? generate: JFS $0000

Ah, yeah I hadn't realised that it works for some situations. It specifically gives "invalid" when any of the to-be-ignored bits are set, so 50 44 46 for example gives "invalid". My apologies for the improper testing.

Can you please give more information on the JFS instruction definition?

It's from the 8-bit, little endian Intel 8008 ISA. All jump and call instructions take a 14-bit address, meaning that the two most significant bits of the 2nd operand byte are ignored; the 8008 manual specifically labels these bits as "don't care". As such, when I specify a mask of FF3F (yep, my bad, hadn't spotted that error yet; my original comment assumed a 13-bit value) and no other mask for 00C0, I expect those unused bits to be ignored, regardless of whether they're set or not.

exodustx0 avatar Jun 22 '22 01:06 exodustx0

Hi,

Okay, I have now replicated what you are seeing.

I have now used the definition: 50,0000,FF3F,H,U,,,,"JFS ?" With: 50 00 46 To obtain an "invalid" response. :)

I also apologise that I'd forgotten to ask about endianess when responding previously.

Based on 8008 and Little endian, I now understand that with OperandMask "FF3F" (in byte order), we are referring to a single 14 bit Little endian Operand.

Having reviewed my code I can now see why this is happening...

Significantly: The code assumes that any bits that are not part of the FirstOperand or SecondOperand, are static bits making-up the full instruction match.

It was implemented in this way based on the various processor ISA's I had studied.

The implementation: "that any bits that are not part of the FirstOperand or SecondOperand, are static bits making-up the full instruction match", simplifies the CSV definitions. A new field would then be needed to specify an OpcodeBytesMask, the OpcodeBytes & OperandBytes would likely then best be combined into a single InstructionBytes field, and the OperandMask(s) would both need to then apply to the full instruction bytes. As I hadn't forseen the need for this (from the ISA's I'd studied), I didn't implement this additional complexity.

As a example, here are a block of four unique instructions definitions from the Motorola 6809 definition: 10A3,00,1F,H,S,,,,"CMPD ?,X" 10A3,20,1F,H,S,,,,"CMPD ?,Y" 10A3,40,1F,H,S,,,,"CMPD ?,U" 10A3,60,1F,H,S,,,,"CMPD ?,S"

In this case the actual instruction Opcodes actually extend into the top 2 bits of the 3rd byte. The 3rd byte also containing a 5 bit signed Operand.
Hence the additional most significant 3 bits of the 3rd byte (which are excluded from the Operand, by the Mask), are not in fact "don't care", but they are in fact part of the Opcode.

If I were to allow for the possibility of non-zero "don't care" bits, then the definition would most likely need to be expanded to something like: 10A300,FFFFC0,00001F,H,S,,,,,"CMPD ?,X" 10A320,FFFFC0,00001F,H,S,,,,,"CMPD ?,Y" 10A340,FFFFC0,00001F,H,S,,,,,"CMPD ?,U" 10A360,FFFFCO,00001F,H,S,,,,,"CMPD ?,S"

ie: InstructionBytes,OpcodeMask,FirstOperandMask,FirstOperandHexDec,FirstOperandSignedUnsigned,SecondOperandMask,SecondOperandHexDec,SecondOperandSignedUnsigned,AssemblyString

So, given that I hadn't come across the need for non-zero "don't care" bit handling... This ultimately raises the question of: Why the 2 bits in question would ever be Assembled (for an 8008) as non-zero values, in the real world? i.e. I understand the 8008 JFS Opcode is "50" followed by 2 bytes containing the max 14 bit destination jump address. Why would this ever specify a 15 bit or 16 bit value?

DigicoolThings avatar Jun 22 '22 03:06 DigicoolThings

Why the 2 bits in question would ever be Assembled (for an 8008) as non-zero values, in the real world? i.e. I understand the 8008 JFS Opcode is "50" followed by 2 bytes containing the max 14 bit destination jump address. Why would this ever specify a 15 bit or 16 bit value?

I'm not intricately familiar with real-world usage of the 8008, but your question did remind me of memory mirroring. Not to say that that's a thing that exists with the 8008, but as a real-world example that I do know:

The Nintendo SNES has a 24-bit address space. Depending on what kind of memory mapper a game cartridge has, certain regions of memory will mirror other regions in memory (if a cartridge has a so-called "LoROM" memory mapper, $00:8000 will mirror $80:8000). This is not a perfect example, as one can't truly state that any bit can meaningfully always be ignored (only the top halves of banks $00-7D are mirrors for banks $80-FD, as banks $7E-7F are always WRAM). But I know I've heard of applications where perfect memory mirroring exists.

Also, for a more easy real-world example, if someone were to write 8008 code to some memory and didn't write the "don't care" bits, those bits would be what they were before, which could be anything.

But I'd like to take a step back still. Essentially, the reason I'd want to see this change has nothing to do with would or would-nots. Rather, I see it such that the 8008 has a well-defined flexibility in how it handles processing operands, and the plugin currently can't follow suit. Any which way, this means that the plugin currently can't represent some completely valid states.


If I were to allow for the possibility of non-zero "don't care" bits, then the definition would most likely need to be expanded to something like: 10A300,FFFFC0,00001F,H,S,,,,,"CMPD ?,X" 10A320,FFFFC0,00001F,H,S,,,,,"CMPD ?,Y" 10A340,FFFFC0,00001F,H,S,,,,,"CMPD ?,U" 10A360,FFFFCO,00001F,H,S,,,,,"CMPD ?,S"

ie: InstructionBytes,OpcodeMask,FirstOperandMask,FirstOperandHexDec,FirstOperandSignedUnsigned,SecondOperandMask,SecondOperandHexDec,SecondOperandSignedUnsigned,AssemblyString

Sounds sensible to me, though if we're talking format changes anyway, I might as well mention that at this point I wonder whether CSV is the best format to store definitions in, especially with the optional nature of a bunch of the keys. With a format like JSON or YAML, one could specify exactly what's needed and leave the rest out, and at that the definitions could also store most definition information that is currently stored in the INI config but would, in my opinion, make more sense stored with the definition.

exodustx0 avatar Jun 23 '22 01:06 exodustx0

I agree that this should be changed. My own findings also identified some 6809 "don't care" Opcode bits, which also currently will work only in the case where they are zero! Although I don't know of a case where these bits would ever be Assembled non-zero, the fact that currently "don't care" only works when the bits are zero is indeed fundamentally not logical.

i.e. I am not happy with the current implementation not handling the case of non-zero "don't care" bits. At the end of the day, "don't care" bits should be ignored irrespective of their value (they are "don't care"!).

Unfortunately, this would involve a substantial rewrite of the core processing logic, as well as some further research to ensure endianess is still being correctly handled by the new InstructionBytes approach.

I will add this work to my to-do task list. But, given my current in-progress projects, this work is unlikely to be allocated any time soon.

Regarding CSV format, I do agree that JSON or YAML (both of which I have worked with), would certainly have some advantage in terms of self-contained definitions (including configuration information). These formats would also have enabled me to retain backward compatibility with the introduction of new keys (e.g. InstructionBytes & OpcodeMask). However, at the time, I made the decision to utilise CSV on the basis of minimising CPU Definition file size, maximising parse speed, and simplifying layman editing of the definition text files. Notably, larger CPU definitions (like the current 6809 definition), would be many times larger as JSON etc. and would add significantly to the current HxD application opening time as the CPU Definitions are initialised. Although, this is certainly something I could consider revisiting / performance testing, at a time when I am looking at the above core re-write.

I'll leave this issue open at this time, pending my workload allowing me to investigate further.

DigicoolThings avatar Jun 26 '22 04:06 DigicoolThings